Can Coding Agents Do QA With a Browser?

Britannio Jarrett, November 27, 2025

One of the best lessons I've learned from working with AI coding agents for web development is that they are more effective when given the right tools to complete a task. When I work, I have a browser open, Chrome DevTools is one shortcut away, and I can see dev server logs. I certainly don't lose consciousness every time I run a blocking command outside of tmux.

As for coding agents, most of them don't have this luxury out of the box. They can't see the outcome of changes they made. And they can't tell if they've introduced a bug. I've been using Amp but my observations generally carry over to Claude Code, OpenAI Codex, and other coding agents.

The promise of browser augmentation

In principle, connecting coding agents to a browser expands the scope of work that they can reliably conduct.

Let's take fixing a bug. Even today, developers start by reproducing it manually, sharing the reproduction steps with a coding agent, then letting it implement a fix. The onus is then on you to verify if the solution worked or not - coding agents have a nasty habit of declaring victory before the battle is won. The most reliable antidote is to give coding agents the ability to check their own work, be that with tests, or with browser access.

With browser access, the agent performs this reproduce, fix, and verify loop without human intervention, and thus it has a much higher chance of presenting you with a working solution!

Agent browser access is growing in popularity. Devin, Cursor and most recently Google's Antigravity IDE are all capable of controlling a browser out of the box. Additionally, Amp has a new sub-agent in the works that saves a video playback of what occurred within the browser.

In the meantime, you'd hope that you could add playwright-mcp to Amp, Claude Code, Codex, etc, and call it a day. Sadly, it's not that simple. But here are five tips to help you get the most out of browser access.

1. Tell them what to do in your AGENTS.md

Amp won't necessarily go out of its way to use playwright-mcp unless explicitly instructed to. Fortunately, this can be solved by updating your AGENTS.md. Just add:

Use dev-manager-mcp to run web dev servers.
Use playwright to navigate web applications.

I also have multi-step prompts that mention the explicit tools to use (browser_console_messages, browser_network_requests) during specific scenarios. I save these prompts as task tags in Vibe Kanban for easy access.
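As a rough sketch, the rules can be appended from the shell. The first two lines written to the file are the ones quoted above; the last two are my own illustrative wording built around the playwright-mcp tool names mentioned here, and the function name add_browser_rules is hypothetical.

```shell
# Append browser and dev-server rules to an AGENTS.md file.
# $1 = path to the file (defaults to ./AGENTS.md).
add_browser_rules() {
  local file="${1:-AGENTS.md}"
  # Do nothing if the rules are already present, so re-running is harmless.
  if [ -f "$file" ] && grep -qF 'Use playwright to navigate web applications.' "$file"; then
    return 0
  fi
  # The last two rules below are illustrative assumptions, not official guidance.
  cat >> "$file" <<'EOF'

Use dev-manager-mcp to run web dev servers.
Use playwright to navigate web applications.
After changing UI code, call browser_console_messages and fix any new errors.
When a request fails, check browser_network_requests before editing code.
EOF
}
```

Running add_browser_rules from the root of a project updates that project's AGENTS.md; running it a second time changes nothing.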

2. Use YOLO mode

"One way to think about coding agents is that they are brute force tools for finding solutions to coding problems. If you can reduce your problem to a clear goal and a set of tools that can iterate towards that goal a coding agent can often brute force its way to an effective solution." - Simon Willison, https://simonwillison.net/2025/Sep/30/designing-agentic-loops/

Yes, I know this sounds reckless but with the right mitigations, you can go from micromanaging a single coding agent to running one or more in the background on your local machine or in a remote VM.

I find this most useful for debugging full stack applications. I use a multi-step prompt that analyses the codebase, instruments it with logs, exercises the relevant parts of the application, and uses the browser_console_messages tool from playwright-mcp and the tail tool from dev-manager-mcp to monitor application behaviour and precisely home in on the issue.

This process often involves 50+ tool calls and a lot of time. By default, I would need to approve each of these tool calls manually, but fortunately Amp gives you granular control over tool permissions. Vibe Kanban uses the --dangerously-allow-all flag to save you from manual permission tuning.

3. Manage multiple dev servers

Before we can open a web app, we need to start the dev server. However, if Amp naively does this, it will get stuck: my dev server command starts a long-running process, and Amp will patiently wait for it to exit, to no avail.

My initial solution to this was to instruct Amp to manage dev servers via tmux. This has a few unexpected benefits:

It becomes easier to keep track of dev servers that have been started, which is especially useful if you run coding agents in parallel.
Amp can periodically access dev server logs by 'tail'ing the tmux window that it created.
I can manually attach to the tmux session and view the logs if I need to intervene.
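Here is a minimal sketch of what the tmux approach can look like as shell helpers. The function names, the dev- session prefix, and the example dev command are assumptions of mine, not what Amp runs verbatim; the tmux subcommands themselves (new-session, capture-pane, kill-session) are standard.

```shell
# Derive a tmux session name from a project directory, so agents working in
# different checkouts get different sessions. "." and ":" have special meaning
# in tmux target names, so they are replaced with "_".
dev_session_name() {
  printf 'dev-%s' "$(basename "$1" | tr '.:' '__')"
}

# Start the dev server in a detached session so the caller is not blocked.
# $1 = project directory, $2 = dev command, e.g. "npm run dev" (an assumption).
dev_start() {
  tmux new-session -d -s "$(dev_session_name "$1")" -c "$1" "$2"
}

# Print the visible pane plus up to N lines of scrollback (default 50).
dev_logs() {
  tmux capture-pane -p -t "$(dev_session_name "$1")" -S "-${2:-50}"
}

# Stop the dev server by killing its session.
dev_stop() {
  tmux kill-session -t "$(dev_session_name "$1")"
}
```

Typical use would be dev_start "$PWD" "npm run dev", then dev_logs "$PWD" 100 whenever the logs are needed, and dev_stop "$PWD" at the end. One caveat: the session closes when the dev command exits, so the output of a crashed server is gone unless tmux's remain-on-exit option is set.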

I now default to dev-manager-mcp as it automatically handles unique port allocation, dev server log access via the tail tool, and it cleans up idle dev servers after a configurable timeout to prevent your machine being riddled with ghost processes.

With either of these approaches described in AGENTS.md or in a prompt, Amp is successfully able to manage dev servers.

4. Use sub-agents to manage context

The common wisdom amongst coding agent enthusiasts is to use a single thread per task, but this isn't enough once you bring browser access into the mix. A single thread risks ingesting a dizzying amount of logs, accessibility tree snapshots, and screenshots that can quickly pollute its context and cause costs to skyrocket. I recently spent $103 in a single coding agent thread that used 91 playwright-mcp invocations over 14 turns.

Sub-agents can help. Sub-agents are ephemeral agents given a single prompt. Only their final output is visible to the outermost agent. In general, lots of small and targeted agent executions are more effective than a single big execution.

A long thread is less effective than a short thread leveraging sub-agents with their own threads.
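To see why one long thread gets expensive, here is a back-of-the-envelope sketch. It assumes that every tool call re-sends all earlier tool output as input, that each call adds the same number of tokens, and it ignores prompt caching, the system prompt, output tokens, and the summaries that sub-agents return. The 5,000 tokens per call and the split into 7 sub-agents of 13 calls each are made-up figures; only the count of 91 calls comes from my thread.

```shell
# Total input tokens when every call re-sends all earlier tool output.
# $1 = number of tool calls (N), $2 = tokens each call adds to the context (S).
# Call k sends k * S tokens, so the total is S * N * (N + 1) / 2.
thread_tokens() {
  echo $(( $2 * $1 * ($1 + 1) / 2 ))
}

# The same work split across sub-agents that each start with an empty context.
# $1 = number of sub-agents, $2 = calls per sub-agent, $3 = tokens per call.
subagent_tokens() {
  echo $(( $1 * $(thread_tokens "$2" "$3") ))
}

# Example figures (hypothetical, see the assumptions above):
#   thread_tokens 91 5000       -> 20930000
#   subagent_tokens 7 13 5000   -> 3185000
```

Under these assumptions the single thread processes about 20.9 million input tokens against about 3.2 million for the sub-agent split. Real numbers will differ, but the quadratic growth of the single thread is the point.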

5. Background your agents

This final challenge isn't specific to browser access, but browser access certainly makes it worse. It takes a long time to spin up a dev server, navigate to the relevant page, interact with the UI, read logs, occasionally sleep for a fixed number of seconds, observe network requests, read screenshots, consult the oracle, make changes, and repeat.

The solution is to let the agent run in the background. Configure its browser to be headless so that it doesn't distract you with a myriad of Chrome windows. Use git worktrees to give it a clean checkout of your codebase that doesn't interfere with you or your other agent instances. With this combination, you can 'set and forget', working on something else in the meantime.
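If you are not using a tool that manages worktrees for you, a minimal sketch with plain git looks like this. The helper names and the sibling-directory naming scheme are my own assumptions; git worktree add and git worktree remove are standard git commands.

```shell
# Path for a task's worktree: a sibling of the repo named <repo>-<task>.
# $1 = repo directory, $2 = task name (hypothetical, e.g. "fix-login").
worktree_path() {
  printf '%s-%s' "$(cd "$1" && pwd)" "$2"
}

# Create a new branch named after the task and check it out in its own directory.
# The repo needs at least one commit for this to work.
task_start() {
  git -C "$1" worktree add -b "$2" "$(worktree_path "$1" "$2")"
}

# Remove the checkout once the task is done. The branch itself is kept.
# git refuses to remove a worktree with uncommitted changes unless --force is added.
task_finish() {
  git -C "$1" worktree remove "$(worktree_path "$1" "$2")"
}
```

After task_start . fix-login, start the agent from the new directory so its edits and dev server stay out of your main checkout.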

Since I use Vibe Kanban, git worktrees are automatically created and cleaned up once a task is complete. It defaults to running agents in their least restrictive mode, although this is customisable.

Are we there yet?

After overcoming these five hurdles, my coding agent:

can reliably manage dev servers via the dev-manager-mcp
has agent rules to encourage it to use a browser
is allowed to use all tools without approval
uses sub-agents to effectively manage context when using the browser
has its own isolated workspace to work for long periods without interrupting me

Have I unlocked autonomous QA? No, not yet. But as models improve we're getting there.

Using Playwright MCP to exercise features of a basic web app.

This is my MCP server configuration if you would like to replicate my setup:

{
  "dev-manager": {
    "type": "stdio",
    "command": "npx",
    "args": ["dev-manager-mcp", "stdio"],
    "env": {}
  },
  "playwright": {
    "command": "npx",
    "args": ["@playwright/mcp@latest", "--isolated", "--headless"]
  }
}

amp mcp add playwright -- npx @playwright/mcp@latest --isolated --headless
amp mcp add dev-manager -- npx dev-manager-mcp stdio

claude mcp add playwright npx @playwright/mcp@latest --isolated --headless
claude mcp add dev-manager npx dev-manager-mcp stdio

codex mcp add playwright npx "@playwright/mcp@latest" --isolated --headless
codex mcp add dev-manager npx "dev-manager-mcp" "stdio"

(Note that the dev-manager-mcp requires you to run npx -y dev-manager-mcp before starting your coding agent.)