AI QA
Write tests in English. Run them in real browsers. On PR merge, AI analyses the blast radius, generates new tests, selects a regression suite, and smoke-tests the deployment. Failures trigger Bug Boomerang automatically.
What is AI QA?
Every QA tool asks you to think like a machine. Write selectors. Script workflows. Maintain page objects. And when the UI changes — and it always changes — your tests break. Not because the feature is broken, but because a button moved three pixels or a class name got refactored.
AI QA throws all of that away. You describe what to test in plain English. An AI agent opens a real browser, navigates the app, and tells you whether it works. No selectors. No scripts. No maintenance.
But that's not even the main event.
The real pitch
Merge a PR. AI reads the code changes, analyses the blast radius, generates targeted tests, pulls relevant regression tests from your registry, waits for the deploy to propagate, and smoke-tests everything in real browsers. Failures create Linear issues that trigger Bug Boomerang — autonomous fix pipelines. From PR merge to tested deployment to auto-fixed bugs. Untouched by human hands.
The PR Automation Pipeline
This is the centrepiece. Everything else supports it.
How the Suite Selection Works
This is the part most tools skip entirely. When a PR merges, the AI doesn't just generate new tests — it pulls from your entire test registry to build a smart regression suite:
| Selection criteria | What it means |
|---|---|
| Directly affected | Tests covering features touched by the changed files |
| Regression risk | Tests for adjacent or dependent features — if checkout changed, test payment too |
| Previously failing | Known problem areas that deserve re-verification |
| Coverage breadth | Diversity over redundancy — different pages beat duplicate coverage |
The AI explains its reasoning for every selection. Not a black box — you can see why each test was included. This reasoning is saved with every automation run in the Automations tab alongside pass/fail counts, PR links, and session recordings.
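The selection criteria above can be sketched as a scoring pass over the registry. This is a minimal illustration, not the product's actual code — the `RegistryTest` shape, the weights, and the page-based breadth heuristic are all assumptions:

```typescript
// Hypothetical sketch of regression-suite selection with saved reasoning.
interface RegistryTest {
  id: string;
  domain: string;          // e.g. "auth", "billing"
  lastRunFailed: boolean;  // known problem area
  page: string;            // primary page the test exercises
}

interface Selection {
  test: RegistryTest;
  reasons: string[];       // human-readable justification, saved with the run
}

function selectSuite(
  registry: RegistryTest[],
  affectedDomains: Set<string>,
  adjacentDomains: Set<string>,
  budget: number
): Selection[] {
  const scored = registry
    .map((test) => {
      const reasons: string[] = [];
      let score = 0;
      if (affectedDomains.has(test.domain)) { score += 3; reasons.push("directly affected"); }
      if (adjacentDomains.has(test.domain)) { score += 2; reasons.push("regression risk"); }
      if (test.lastRunFailed)               { score += 1; reasons.push("previously failing"); }
      return { test, reasons, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score);

  // Coverage breadth: prefer one test per page before taking duplicates.
  const seenPages = new Set<string>();
  const picked: Selection[] = [];
  for (const s of scored) {
    if (picked.length >= budget) break;
    if (seenPages.has(s.test.page)) continue;
    seenPages.add(s.test.page);
    picked.push({ test: s.test, reasons: s.reasons });
  }
  // Fill any remaining budget with the leftover high scorers.
  for (const s of scored) {
    if (picked.length >= budget) break;
    if (!picked.some((p) => p.test.id === s.test.id)) {
      picked.push({ test: s.test, reasons: s.reasons });
    }
  }
  return picked;
}
```

The point of the sketch is the `reasons` array: every selection carries its own justification, which is what lands in the Automations tab.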
Workspaces and Domains
Tests live in projects — workspaces where teams organise tests by domain. Each project has its own test registry, test accounts, groups, and automation history.
Within a project, tests are organised into groups by domain: auth, billing, navigation, forms, etc. When the AI analyses a PR's blast radius, it maps changed files to these domains to figure out what to test.
This means your team contributes tests across domains, and the automation pipeline knows which ones to pull when code in a specific area changes.
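One plausible way to do the file-to-domain mapping is a path-prefix table. This is an illustrative sketch only — the prefixes and the mapping mechanism are assumptions, not the product's actual rules:

```typescript
// Hypothetical path-prefix rules mapping changed files to test domains.
const domainRules: Array<[prefix: string, domain: string]> = [
  ["src/app/billing/",    "billing"],
  ["src/app/auth/",       "auth"],
  ["src/components/nav/", "navigation"],
];

function domainsForChangedFiles(files: string[]): Set<string> {
  const domains = new Set<string>();
  for (const file of files) {
    for (const [prefix, domain] of domainRules) {
      if (file.startsWith(prefix)) domains.add(domain);
    }
  }
  return domains;
}
```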
Three Ways to Create Tests
By hand
A test case is three fields:
- Title: "Coach creates a nutrition plan"
- Description: "Log in as a coach, navigate to nutrition, create a new plan with 3 meals, save it"
- Expected outcome: "Plan appears in the list with correct meal count"
No code. No locators. The description is the literal instruction the browser agent follows — step-by-step actions a human would take.
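The three-field shape above, written out as a hypothetical type (the field names are illustrative):

```typescript
// A test case as described above: three plain-English fields, no locators.
interface TestCase {
  title: string;           // "Coach creates a nutrition plan"
  description: string;     // the literal step-by-step instructions the agent follows
  expectedOutcome: string; // what the agent verifies at the end of the run
}

const example: TestCase = {
  title: "Coach creates a nutrition plan",
  description: "Log in as a coach, navigate to nutrition, create a new plan with 3 meals, save it",
  expectedOutcome: "Plan appears in the list with correct meal count",
};
```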
From a PR merge
Happens automatically when a PR merges (if automation is enabled). The AI:
- Pulls changed files from the PR, filters for frontend code
- Identifies affected components and domains
- Generates 3–5 targeted test cases with step-by-step browser instructions
- Auto-publishes them to the test registry
- Combines them with existing tests selected by blast radius analysis
- Executes the full suite after a 10-minute deploy propagation delay
Everything lands in the Automations tab with full justification.
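The steps above can be sketched as one async pipeline. Every helper here is a stub — the real implementations (GitHub API calls, LLM generation, browser runs) are assumed, and the names are hypothetical:

```typescript
// Stubbed sketch of the PR-merge pipeline: filter, generate, select, wait, run.
type Test = { title: string };

const fetchChangedFiles = async (_pr: number) =>
  ["src/app/billing/invoice.tsx", "README.md"]; // stub for the GitHub API
const isFrontend = (f: string) => /\.(tsx|ts|jsx|js|css)$/.test(f);
const generateTests = async (files: string[]): Promise<Test[]> =>
  files.map((f) => ({ title: `covers ${f}` })); // stub for LLM generation
const selectRegression = async (_files: string[]): Promise<Test[]> =>
  [{ title: "existing billing regression test" }]; // stub for registry selection
const runSuite = async (suite: Test[]) =>
  suite.map((t) => ({ test: t, verdict: "PASS" as const })); // stub for browser runs

async function onPrMerged(pr: number, deployDelayMs = 0) {
  const frontend = (await fetchChangedFiles(pr)).filter(isFrontend);
  const generated = await generateTests(frontend);               // new, targeted tests
  const suite = [...generated, ...(await selectRegression(frontend))];
  await new Promise((r) => setTimeout(r, deployDelayMs));        // 10 minutes in production
  return runSuite(suite);
}
```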
From a prompt
Describe what you want covered in a sentence:
"As a coach, I can manage my client roster — adding, editing, and removing clients."
An AI exploration agent opens the app in a real browser, navigates the relevant area, observes what's there, and generates 3–10 structured test cases. You review the drafts, publish the ones you want.
Generated tests go through Jaccard similarity deduplication — high overlap with existing tests gets auto-skipped, moderate overlap gets flagged, genuinely new tests pass through. You don't end up with fifteen variations of "user can log in."
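Jaccard similarity is the size of the intersection of two word sets over the size of their union. A minimal sketch of the dedup triage — the 0.8 and 0.5 thresholds are illustrative, not the actual cutoffs:

```typescript
// Jaccard similarity over lowercase word sets.
function jaccard(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
  const tb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  const inter = [...ta].filter((w) => tb.has(w)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

// Triage a generated test title against the existing registry.
function triage(candidate: string, existing: string[]): "skip" | "flag" | "publish" {
  const best = Math.max(0, ...existing.map((e) => jaccard(candidate, e)));
  if (best >= 0.8) return "skip";   // near-duplicate: auto-skipped
  if (best >= 0.5) return "flag";   // moderate overlap: needs review
  return "publish";                 // genuinely new
}
```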
Account-Aware Parallel Execution
Real applications have authentication. Most AI QA tools either ignore this or corrupt state by sharing sessions across parallel tests.
Persistent sessions. Each test account logs in once. Cookies and session state get saved to a persistent profile per browser provider. Every future test reuses that profile — no re-authentication, no login race conditions.
Account locking. The scheduler ensures no two concurrent tests use the same account. If an account is busy, the test waits; if any account will do, the scheduler round-robins through the pool, preferring accounts with authenticated profiles.
Parallel limits. From 1 (sequential) to 250 concurrent browsers. The scheduler respects both the parallel limit and account uniqueness simultaneously. No deadlocks. No corrupted sessions.
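The locking and round-robin behaviour can be sketched as a small pool. This is a hypothetical illustration of the idea — the `Account` shape and the pool API are assumptions:

```typescript
// No two concurrent tests share an account; authenticated profiles are preferred.
interface Account { id: string; hasAuthProfile: boolean }

class AccountPool {
  private busy = new Set<string>();
  private cursor = 0;
  constructor(private accounts: Account[]) {}

  /** Returns a free account, or null if all are locked (caller waits and retries). */
  acquire(): Account | null {
    // Prefer accounts with saved auth profiles, then round-robin across the rest.
    const ordered = [...this.accounts].sort(
      (a, b) => Number(b.hasAuthProfile) - Number(a.hasAuthProfile)
    );
    for (let i = 0; i < ordered.length; i++) {
      const acct = ordered[(this.cursor + i) % ordered.length];
      if (!this.busy.has(acct.id)) {
        this.busy.add(acct.id);
        this.cursor = (this.cursor + i + 1) % ordered.length;
        return acct;
      }
    }
    return null; // all accounts locked — the test waits
  }

  release(id: string) { this.busy.delete(id); }
}
```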
Browser Providers Are Pluggable
The browser execution layer is a clean adapter pattern. Three providers ship today:
| Provider | Runtime | Notes |
|---|---|---|
| Hyperbrowser Browser-Use | Gemini-powered agent | Stealth mode + proxy support |
| Hyperbrowser HyperAgent | Alternative agent runtime | Same infrastructure |
| BrowserUse Cloud | Independent provider | Own model stack |
Each implements the same interface: take a task description, return a structured verdict. Adding a new provider means writing one adapter file. Switch providers per-project from settings — no code changes.
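The adapter contract described above, sketched hypothetically (the interface and type names are assumptions, not the real code):

```typescript
// One method in, one structured verdict out — that's the whole provider contract.
interface TestVerdict {
  status: "PASS" | "FAIL";
  reason: string;
  recordingUrl?: string;
}

interface BrowserProvider {
  readonly name: string;
  runTask(taskDescription: string): Promise<TestVerdict>;
}

// Adding a provider means writing one adapter like this:
class FakeProvider implements BrowserProvider {
  readonly name = "fake";
  async runTask(task: string): Promise<TestVerdict> {
    return { status: "PASS", reason: `simulated run of: ${task}` };
  }
}
```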
Live View and Session Recording
While tests run, you get a live execution grid — real-time SSE status updates with embedded live views of each browser session. Watch the AI navigate your app in real time.
After completion, every session has a permanent recording URL — not the ephemeral live link. Share with your team, attach to bug reports, review days later. When a test fails, click the recording, watch what happened, know immediately if it's a real bug.
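A status stream like the live grid's can be served as Server-Sent Events from a route handler. This is a minimal sketch with hard-coded events — the route shape and event payloads are assumptions:

```typescript
// SSE sketch: each update is a "data: <json>\n\n" frame on a streamed response.
export async function GET() {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    start(controller) {
      const send = (data: unknown) =>
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(data)}\n\n`));
      send({ testId: "t1", status: "running" });
      // In the real pipeline, updates stream as browser sessions progress.
      send({ testId: "t1", status: "passed" });
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```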
From Test Failure to Autonomous Fix
Test fails
A browser agent runs your test and returns FAIL with a structured reason and a session recording.
One-click bug report
AI reads the test case, failure result, and project context. Generates a structured bug report: title, severity, steps to reproduce, expected vs actual behaviour.
Linear issue created
Review the report, edit if needed, and create a Linear issue directly from the dashboard. The issue links back to the test result and recording.
Bug Boomerang picks it up
If you're running Bug Boomerang, that Linear issue triggers the autonomous fix pipeline — discovery, coding, self-correcting review loop, sandbox testing, draft PR. The bug goes from failed QA test to draft fix without a human writing a single line of code.
The full loop
PR merge → blast radius analysis → generate tests → select regression suite → execute in real browsers → test fails → bug report → Linear issue → Bug Boomerang → discovery → fix → review → sandbox test → draft PR. From deploy to fix, zero human intervention.
Automation Configuration
The automation pipeline is configured per-workspace in the Settings panel:
| Setting | What it does |
|---|---|
| Enable/disable | Master toggle for PR-triggered automation |
| Target project | Which workspace to create tests in and run against |
| Test count | Total tests per run (1–20), includes new + selected existing |
| Allowed GitHub usernames | Only PRs from these authors trigger automation (empty = all) |
| Branch patterns | Only PRs merged into matching branches trigger (supports * wildcards) |
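One way the `*` wildcard branch patterns might be matched is by compiling them to a regex. A sketch under that assumption — the actual matching rules aren't specified beyond supporting `*`:

```typescript
// Escape regex metacharacters, then turn each * into ".*" and anchor the match.
function branchMatches(pattern: string, branch: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`).test(branch);
}
```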
What Makes This Different
Intent, not selectors
Describe behaviour in English. The AI figures out how to verify it. When the UI changes, the AI adapts — because it reads the page like a human, not like a CSS selector.
Blast radius + registry
AI doesn't just generate tests — it analyses changed code, maps to domains, and pulls existing regression tests from your registry. Smart selection with full justification.
Closes the entire loop
Generate → select → execute → report → file bugs → Bug Boomerang fixes them. Most tools stop at generation. This one goes from PR merge to autonomous fix.
Real browsers, real auth
Parallel sessions with persistent auth profiles, account locking, live view, and permanent recordings. Up to 250 concurrent browsers.
Tech Stack
| Component | Technology |
|---|---|
| App | Next.js 16 (App Router) + React 19 |
| AI | Vercel AI SDK + OpenRouter (GPT-5.2, Claude, Gemini) |
| Database | Neon Postgres + Drizzle ORM |
| Auth | Clerk (domain-gated team auth) |
| UI | shadcn/ui + Tailwind CSS 4 |
| Streaming | Server-Sent Events for live execution grid |
| Security | AES-256-GCM encrypted provider keys at rest |
| Bug tracking | Linear integration (one-click issue creation) |
| Browser providers | Hyperbrowser Browser-Use, Hyperbrowser HyperAgent, BrowserUse Cloud |
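Encrypting provider keys at rest with AES-256-GCM can be done with Node's built-in `crypto` module. This is a sketch, not the product's implementation — the key handling and the `iv.tag.ciphertext` storage format are assumptions:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit IV, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store the IV and GCM auth tag alongside the ciphertext.
  return [iv, cipher.getAuthTag(), ct].map((b) => b.toString("base64")).join(".");
}

function decrypt(token: string, key: Buffer): string {
  const [iv, tag, ct] = token.split(".").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // decryption throws if the ciphertext was tampered with
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```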
How It Connects
AI QA is the testing layer in the automation pipeline:
- Bug Boomerang — failed tests create Linear issues that trigger autonomous fix pipelines. Test → bug → fix → PR.
- Xena — the autonomous operator. Same event-driven webhook architecture.
- Browser provider adapters share the same interface pattern as Bug Boomerang's sandbox testing.
References
- Hyperbrowser — AI browser infrastructure: hyperbrowser.ai
- Browser Use — Cloud browser automation: browser-use.com
- Vercel AI SDK — AI integration for Next.js: sdk.vercel.ai
- OpenRouter — Multi-provider LLM routing: openrouter.ai
- Drizzle ORM — TypeScript-first database toolkit: orm.drizzle.team
Bug Boomerang
An autonomous pipeline that takes a Linear bug ticket and turns it into a tested, reviewed draft PR — with zero human intervention. Label a ticket, get a PR.
Particle Testing Framework
A graph-based test orchestration system that decomposes every testable action into atomic particles, arranges them in a dependency graph, and runs tests in parallel against ephemeral database snapshots.