moustache: Browser Automation for Coding Agents

May 27, 2026 · 5 min read

A small browser automation CLI for coding agents, built as a learning vehicle for understanding how agents should see and use the web.

I built moustache, a small browser automation CLI for coding agents.

It is a single Go binary that drives Chrome or Chromium through the Chrome DevTools Protocol (CDP). You can open a page, take an accessibility snapshot, click an element, fill a field, wait for text, read state, capture a screenshot, and keep doing that across commands without starting a fresh browser every time.

The short version:

moustache open example.com
moustache snapshot -i
moustache click @e1
moustache fill @e3 "test@example.com"
moustache get text @e1
moustache screenshot
moustache close

It is not meant to be a new browser testing framework. It is meant to be a small, predictable tool that an agent can call from a terminal.
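To show what "call from a terminal" can look like from inside a harness, here is a minimal Go sketch that shells out to the same command sequence. The subcommands and refs are copied from the list above. The helper names (quickTour, run) are hypothetical and not part of moustache, and a real harness would read refs from the snapshot output rather than hardcode them.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// quickTour returns the argument lists for the command sequence shown above.
// The subcommands and refs are copied from the post; url and email are inputs.
func quickTour(url, email string) [][]string {
	return [][]string{
		{"open", url},
		{"snapshot", "-i"},
		{"click", "@e1"},
		{"fill", "@e3", email},
		{"get", "text", "@e1"},
		{"screenshot"},
		{"close"},
	}
}

// run executes each step with the given binary and stops at the first failure,
// including the failing command's combined output in the returned error.
func run(bin string, steps [][]string) error {
	for _, args := range steps {
		out, err := exec.Command(bin, args...).CombinedOutput()
		if err != nil {
			return fmt.Errorf("%s %s: %w\n%s", bin, strings.Join(args, " "), err, out)
		}
		fmt.Printf("$ %s %s\n%s\n", bin, strings.Join(args, " "), out)
	}
	return nil
}

func main() {
	steps := quickTour("example.com", "test@example.com")
	bin, err := exec.LookPath("moustache")
	if err != nil {
		// Binary not installed: print the plan instead of running it.
		for _, args := range steps {
			fmt.Println("moustache", strings.Join(args, " "))
		}
		return
	}
	if err := run(bin, steps); err != nil {
		fmt.Println("error:", err)
	}
}
```

Because every step is just a process invocation, the same shape works from any language that can spawn a command.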

What it does

Drives Chrome / Chromium. moustache talks to the browser through CDP.
Runs as one static binary. No Node runtime, no Python environment, no package-manager ceremony.
Keeps a daemon alive. The first command starts a small background daemon; later commands reuse the same browser session.
Uses element refs. snapshot produces refs like @e1, @e2, and @e3, so an agent can act on what it just saw instead of guessing selectors.
Supports semantic locators. Find by role, text, label, placeholder, alt text, title, test id, or plain CSS / XPath.
Returns boring output. Plain text by default, structured --json when a harness wants machine-readable results.
Works across sessions. --session lets multiple agents or tasks keep isolated browser state.
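The element-ref idea can be illustrated from the harness side. The sketch below pulls @eN tokens out of snapshot text so an agent only acts on refs it has actually seen. It assumes the refs appear literally in the plain-text output. The sample snapshot lines are invented for illustration and are not moustache's real output format, and extractRefs is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// refPattern matches tokens like @e1 or @e42 that start the text or follow a
// non-word character, so the "@e9" inside "test@e9.example" is not picked up.
var refPattern = regexp.MustCompile(`(?:^|\W)(@e\d+)\b`)

// extractRefs returns the distinct refs found in snapshot text, in first-seen order.
func extractRefs(snapshot string) []string {
	seen := make(map[string]bool)
	var refs []string
	for _, m := range refPattern.FindAllStringSubmatch(snapshot, -1) {
		ref := m[1]
		if !seen[ref] {
			seen[ref] = true
			refs = append(refs, ref)
		}
	}
	return refs
}

func main() {
	// Hypothetical snapshot text; the real format may differ.
	snapshot := "link \"Docs\" @e1\n" +
		"textbox \"Email\" @e3\n" +
		"text \"Contact test@e9.example\"\n" +
		"button \"Submit\" @e2\n"
	fmt.Println(extractRefs(snapshot)) // prints [@e1 @e3 @e2]
}
```

A harness could use the returned list to reject any click or fill whose ref did not appear in the most recent snapshot.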

The design goal is not glamour. It is boring reliability.

Why I built it

This started inside my broader work on agent tooling.

I have been building and using coding agents heavily: Claude Code, Codex, OpenCode, pi, and my own terminal harness, zot. The more I used these tools, the more obvious it became that the browser is still awkward territory for agents.

A coding agent can edit files, run tests, inspect logs, and call APIs. But when it needs to verify a UI, debug a login flow, check a button, or read what a real page rendered, things get clumsy. Screenshots are useful, but imprecise. Raw HTML is useful, but often too far away from what a user experiences. Playwright is excellent, but it is a whole framework, and agents do not always need a test suite. Sometimes they need a handle.

I wanted something smaller:

Open browser.
Ask page what interactive things exist.
Get stable refs.
Click, type, wait, read.
Repeat.

That loop felt like the missing primitive.

moustache as a learning vehicle

The honest reason is that moustache is a learning vehicle.

I wanted to understand browser automation at the level below the friendly APIs: how the Chrome DevTools Protocol behaves, how persistent browser contexts should be managed, how much state a CLI can safely hide in a daemon, and what kind of output an LLM can consume without getting confused.

Building it forced a bunch of concrete questions:

What should an agent see when it asks for the page?
Is the accessibility tree a better default than the DOM?
How stable can element references be across commands?
When should the browser stay alive, and when should it die?
What should be plain text, and what should be JSON?
How small can the command surface be before it becomes frustrating?
What belongs in the CLI versus the harness that calls it?

Those questions are hard to answer by reading docs. The answers become obvious only when you build the tool, wire it into an agent, and watch where the model makes mistakes.

That is the odyssey, really: I keep building smaller pieces of the agent stack because each one teaches me something different. zot taught me about harnesses, streaming, tools, prompts, cancellation, and the feel of an interactive runtime. moustache is the browser piece: how an agent should look at and manipulate a real UI.

None of these projects started as grand platforms. They started because I wanted to understand the material by touching it.

Why Go again?

moustache is written in Go for the same practical reason as zot: the install story matters.

A browser automation helper for agents should be easy to drop into any harness. One binary is a good shape for that. Download it, put it on PATH, call it from whatever agent you use.

Go also fits the daemon model nicely. There is a CLI process, a background process, sockets, timeouts, browser lifecycle management, concurrent command handling, and lots of boring IO. Go is good at that kind of boring.
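As a rough illustration of that shape, here is a minimal sketch of a short-lived client talking to a long-lived process over a Unix domain socket, with deadlines on both sides. It is not moustache's actual implementation: the line-based protocol, the "ok:" reply, and the function names are all assumptions made for the example, and the stand-in daemon only echoes the command where a real one would hold the browser session.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// serve answers one newline-terminated command per connection until the
// listener is closed. Each connection is handled in its own goroutine.
func serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go func(c net.Conn) {
			defer c.Close()
			c.SetDeadline(time.Now().Add(5 * time.Second))
			line, err := bufio.NewReader(c).ReadString('\n')
			if err != nil {
				return
			}
			fmt.Fprintf(c, "ok: %s\n", strings.TrimSpace(line))
		}(conn)
	}
}

// send plays the CLI role: dial, write one command, read one reply, exit.
func send(socketPath, command string, timeout time.Duration) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, timeout)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(timeout))
	if _, err := fmt.Fprintln(conn, command); err != nil {
		return "", err
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(reply), nil
}

// demo starts a throwaway daemon on a temp socket and sends it one command.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "daemon-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "d.sock")
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serve(ln)
	return send(sock, "click @e1", time.Second)
}

func main() {
	reply, err := demo()
	fmt.Println(reply, err)
}
```

The deadlines matter more than they look: a hung browser should surface as a timeout in the CLI, not as an agent waiting forever on a silent socket.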

Could this have been TypeScript? Absolutely. Most browser automation gravity is there, and Playwright is wonderful. But moustache is not trying to compete with Playwright. It is trying to be a tiny command-line bridge between an agent and a real browser.

For that, Go felt right.

What I learned

A few things stood out quickly:

The accessibility tree is a great agent interface. It strips away huge amounts of DOM noise and leaves something closer to what the agent actually needs: roles, names, and relationships.
Persistent sessions matter. Starting a browser for every command makes the workflow feel broken. A small daemon changes the experience completely.
Refs beat selector guessing. Let the agent inspect first, then act on a concrete @eN ref. It is less magical and more reliable.
Plain text is still underrated. Not every tool output needs to be a giant JSON object. Humans and models both benefit from small, deterministic text.
Small tools compose well. moustache does not need to know anything about zot, Claude Code, Codex, or any other harness. If a tool can shell out, it can use moustache.
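The first and third points can be made concrete with a toy model. The sketch below walks a small accessibility-style tree, prints one line per node, and hands out @eN refs only to interactive roles. The Node type, the set of interactive roles, and the output layout are assumptions chosen for illustration; they are not moustache's real data structures or snapshot format.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified accessibility node: a role, an
// accessible name, and children.
type Node struct {
	Role     string
	Name     string
	Children []*Node
}

// interactive lists the roles that receive a ref in this sketch.
var interactive = map[string]bool{
	"button": true, "link": true, "textbox": true, "checkbox": true,
}

// render walks the tree depth-first, returning an indented text view and a
// map from each assigned ref back to its node.
func render(root *Node) (string, map[string]*Node) {
	var b strings.Builder
	refs := make(map[string]*Node)
	next := 1
	var walk func(n *Node, depth int)
	walk = func(n *Node, depth int) {
		fmt.Fprintf(&b, "%s%s %q", strings.Repeat("  ", depth), n.Role, n.Name)
		if interactive[n.Role] {
			ref := fmt.Sprintf("@e%d", next)
			next++
			refs[ref] = n
			b.WriteString(" " + ref)
		}
		b.WriteByte('\n')
		for _, c := range n.Children {
			walk(c, depth+1)
		}
	}
	walk(root, 0)
	return b.String(), refs
}

// samplePage builds a tiny sign-in form for the demo.
func samplePage() *Node {
	return &Node{Role: "main", Name: "Sign in", Children: []*Node{
		{Role: "heading", Name: "Sign in"},
		{Role: "textbox", Name: "Email"},
		{Role: "button", Name: "Continue"},
	}}
}

func main() {
	text, refs := render(samplePage())
	fmt.Print(text)
	fmt.Println("@e2 is:", refs["@e2"].Role, refs["@e2"].Name)
}
```

Four short lines describe the whole form, and the only things the agent can act on are the two nodes that carry a ref.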

The most important lesson is that agent tooling benefits from narrow, sharp interfaces. A tool does not have to be large to change what an agent can do. It just has to expose the right primitive.

Try it

moustache is open source on GitHub:

github.com/patriceckhart/moustache

Install it, point it at Chrome or Chromium, and let your agent drive a real browser without dragging in a whole framework.