AI QA
Write tests in English. Run them in real browsers. On PR merge, AI analyses the blast radius, generates new tests, selects a regression suite, and smoke-tests the deployment. Failures trigger Bug Boomerang automatically.
What is AI QA?
Every QA tool asks you to think like a machine. Write selectors. Script workflows. Maintain page objects. And when the UI changes — and it always changes — your tests break. Not because the feature is broken, but because a button moved three pixels or a class name got refactored.
AI QA throws all of that away. You describe what to test in plain English. An AI agent opens a real browser, navigates the app, and tells you whether it works. No selectors. No scripts. No maintenance.
But that's not even the main event.
The real pitch
Merge a PR. AI reads the code changes, analyses the blast radius, generates targeted tests, pulls relevant regression tests from your registry, waits for the deploy to propagate, and smoke-tests everything in real browsers. Failures create Linear issues that trigger Bug Boomerang — autonomous fix pipelines. From PR merge to tested deployment to auto-fixed bugs. Untouched by human hands.
The PR Automation Pipeline
This is the centrepiece. Everything else supports it.
How the Suite Selection Works
This is the part most tools skip entirely. When a PR merges, the AI doesn't just generate new tests — it pulls from your entire test registry to build a smart regression suite:
| Selection criteria | What it means |
|---|---|
| Directly affected | Tests covering features touched by the changed files |
| Regression risk | Tests for adjacent or dependent features — if checkout changed, test payment too |
| Previously failing | Known problem areas that deserve re-verification |
| Coverage breadth | Diversity over redundancy — different pages beat duplicate coverage |
The AI explains its reasoning for every selection. Not a black box — you can see why each test was included. This reasoning is saved with every automation run in the Automations tab alongside pass/fail counts, PR links, and session recordings.
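The selection criteria above can be sketched as a scoring pass over the registry. This is a minimal illustration, not the product's actual code — the `RegistryTest` shape, the weights, and the page-based breadth heuristic are all assumptions:

```typescript
// Hypothetical sketch of regression-suite selection with saved reasoning.
interface RegistryTest {
  id: string;
  domain: string;          // e.g. "auth", "billing"
  lastRunFailed: boolean;  // known problem area
  page: string;            // primary page the test exercises
}

interface Selection {
  test: RegistryTest;
  reasons: string[];       // human-readable justification, saved with the run
}

function selectSuite(
  registry: RegistryTest[],
  affectedDomains: Set<string>,
  adjacentDomains: Set<string>,
  budget: number
): Selection[] {
  const scored = registry
    .map((test) => {
      const reasons: string[] = [];
      let score = 0;
      if (affectedDomains.has(test.domain)) { score += 3; reasons.push("directly affected"); }
      if (adjacentDomains.has(test.domain)) { score += 2; reasons.push("regression risk"); }
      if (test.lastRunFailed)               { score += 1; reasons.push("previously failing"); }
      return { test, reasons, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score);

  // Coverage breadth: prefer one test per page before taking duplicates.
  const seenPages = new Set<string>();
  const picked: Selection[] = [];
  for (const s of scored) {
    if (picked.length >= budget) break;
    if (seenPages.has(s.test.page)) continue;
    seenPages.add(s.test.page);
    picked.push({ test: s.test, reasons: s.reasons });
  }
  // Fill any remaining budget with the leftover high scorers.
  for (const s of scored) {
    if (picked.length >= budget) break;
    if (!picked.some((p) => p.test.id === s.test.id)) {
      picked.push({ test: s.test, reasons: s.reasons });
    }
  }
  return picked;
}
```

The point of the sketch is the `reasons` array: every selection carries its own justification, which is what lands in the Automations tab.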
Workspaces and Domains
Tests live in projects — workspaces where teams organise tests by domain. Each project has its own test registry, test accounts, groups, and automation history.
Within a project, tests are organised into groups by domain: auth, billing, navigation, forms, etc. When the AI analyses a PR's blast radius, it maps changed files to these domains to figure out what to test.
This means your team contributes tests across domains, and the automation pipeline knows which ones to pull when code in a specific area changes.
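One plausible way to do the file-to-domain mapping is a path-prefix table. This is an illustrative sketch only — the prefixes and the mapping mechanism are assumptions, not the product's actual rules:

```typescript
// Hypothetical path-prefix rules mapping changed files to test domains.
const domainRules: Array<[prefix: string, domain: string]> = [
  ["src/app/billing/",    "billing"],
  ["src/app/auth/",       "auth"],
  ["src/components/nav/", "navigation"],
];

function domainsForChangedFiles(files: string[]): Set<string> {
  const domains = new Set<string>();
  for (const file of files) {
    for (const [prefix, domain] of domainRules) {
      if (file.startsWith(prefix)) domains.add(domain);
    }
  }
  return domains;
}
```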
Three Ways to Create Tests
By hand
A test case is three fields:
- Title: "Coach creates a nutrition plan"
- Description: "Log in as a coach, navigate to nutrition, create a new plan with 3 meals, save it"
- Expected outcome: "Plan appears in the list with correct meal count"
No code. No locators. The description is the literal instruction the browser agent follows — step-by-step actions a human would take.
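The three-field shape above, written out as a hypothetical type (the field names are illustrative):

```typescript
// A test case as described above: three plain-English fields, no locators.
interface TestCase {
  title: string;           // "Coach creates a nutrition plan"
  description: string;     // the literal step-by-step instructions the agent follows
  expectedOutcome: string; // what the agent verifies at the end of the run
}

const example: TestCase = {
  title: "Coach creates a nutrition plan",
  description: "Log in as a coach, navigate to nutrition, create a new plan with 3 meals, save it",
  expectedOutcome: "Plan appears in the list with correct meal count",
};
```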
From a PR merge
Happens automatically when a PR merges (if automation is enabled). The AI:
- Pulls changed files from the PR, filters for frontend code
- Identifies affected components and domains
- Generates 3–5 targeted test cases with step-by-step browser instructions
- Auto-publishes them to the test registry
- Combines them with existing tests selected by blast radius analysis
- Executes the full suite after a 10-minute deploy propagation delay
Everything lands in the Automations tab with full justification.
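The steps above can be sketched as one async pipeline. Every helper here is a stub — the real implementations (GitHub API calls, LLM generation, browser runs) are assumed, and the names are hypothetical:

```typescript
// Stubbed sketch of the PR-merge pipeline: filter, generate, select, wait, run.
type Test = { title: string };

const fetchChangedFiles = async (_pr: number) =>
  ["src/app/billing/invoice.tsx", "README.md"]; // stub for the GitHub API
const isFrontend = (f: string) => /\.(tsx|ts|jsx|js|css)$/.test(f);
const generateTests = async (files: string[]): Promise<Test[]> =>
  files.map((f) => ({ title: `covers ${f}` })); // stub for LLM generation
const selectRegression = async (_files: string[]): Promise<Test[]> =>
  [{ title: "existing billing regression test" }]; // stub for registry selection
const runSuite = async (suite: Test[]) =>
  suite.map((t) => ({ test: t, verdict: "PASS" as const })); // stub for browser runs

async function onPrMerged(pr: number, deployDelayMs = 0) {
  const frontend = (await fetchChangedFiles(pr)).filter(isFrontend);
  const generated = await generateTests(frontend);               // new, targeted tests
  const suite = [...generated, ...(await selectRegression(frontend))];
  await new Promise((r) => setTimeout(r, deployDelayMs));        // 10 minutes in production
  return runSuite(suite);
}
```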
From a prompt
Describe what you want covered in a sentence:
"As a coach, I can manage my client roster — adding, editing, and removing clients."
An AI exploration agent opens the app in a real browser, navigates the relevant area, observes what's there, and generates 3–10 structured test cases. You review the drafts, publish the ones you want.
Generated tests go through Jaccard similarity deduplication — high overlap with existing tests gets auto-skipped, moderate overlap gets flagged, genuinely new tests pass through. You don't end up with fifteen variations of "user can log in."
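Jaccard similarity is the size of the intersection of two word sets over the size of their union. A minimal sketch of the dedup triage — the 0.8 and 0.5 thresholds are illustrative, not the actual cutoffs:

```typescript
// Jaccard similarity over lowercase word sets.
function jaccard(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
  const tb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  const inter = [...ta].filter((w) => tb.has(w)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

// Triage a generated test title against the existing registry.
function triage(candidate: string, existing: string[]): "skip" | "flag" | "publish" {
  const best = Math.max(0, ...existing.map((e) => jaccard(candidate, e)));
  if (best >= 0.8) return "skip";   // near-duplicate: auto-skipped
  if (best >= 0.5) return "flag";   // moderate overlap: needs review
  return "publish";                 // genuinely new
}
```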
Account-Aware Parallel Execution
Real applications have authentication. Most AI QA tools either ignore this or corrupt state by sharing sessions across parallel tests.
Persistent sessions. Each test account logs in once. Cookies and session state get saved to a persistent profile per browser provider. Every future test reuses that profile — no re-authentication, no login race conditions.
Account locking. The scheduler ensures no two concurrent tests use the same account. If an account is busy, the test waits; if any account will do, the scheduler round-robins through the pool, preferring accounts with authenticated profiles.
Parallel limits. From 1 (sequential) to 250 concurrent browsers. The scheduler respects both the parallel limit and account uniqueness simultaneously. No deadlocks. No corrupted sessions.
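The locking and round-robin behaviour can be sketched as a small pool. This is a hypothetical illustration of the idea — the `Account` shape and the pool API are assumptions:

```typescript
// No two concurrent tests share an account; authenticated profiles are preferred.
interface Account { id: string; hasAuthProfile: boolean }

class AccountPool {
  private busy = new Set<string>();
  private cursor = 0;
  constructor(private accounts: Account[]) {}

  /** Returns a free account, or null if all are locked (caller waits and retries). */
  acquire(): Account | null {
    // Prefer accounts with saved auth profiles, then round-robin across the rest.
    const ordered = [...this.accounts].sort(
      (a, b) => Number(b.hasAuthProfile) - Number(a.hasAuthProfile)
    );
    for (let i = 0; i < ordered.length; i++) {
      const acct = ordered[(this.cursor + i) % ordered.length];
      if (!this.busy.has(acct.id)) {
        this.busy.add(acct.id);
        this.cursor = (this.cursor + i + 1) % ordered.length;
        return acct;
      }
    }
    return null; // all accounts locked — the test waits
  }

  release(id: string) { this.busy.delete(id); }
}
```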
Browser Providers Are Pluggable
The browser execution layer is a clean adapter pattern. Three providers ship today:
| Provider | Runtime | Notes |
|---|---|---|
| Hyperbrowser Browser-Use | Gemini-powered agent | Stealth mode + proxy support |
| Hyperbrowser HyperAgent | Alternative agent runtime | Same infrastructure |
| BrowserUse Cloud | Independent provider | Own model stack |
Each implements the same interface: take a task description, return a structured verdict. Adding a new provider means writing one adapter file. Switch providers per-project from settings — no code changes.
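The adapter contract described above, sketched hypothetically (the interface and type names are assumptions, not the real code):

```typescript
// One method in, one structured verdict out — that's the whole provider contract.
interface TestVerdict {
  status: "PASS" | "FAIL";
  reason: string;
  recordingUrl?: string;
}

interface BrowserProvider {
  readonly name: string;
  runTask(taskDescription: string): Promise<TestVerdict>;
}

// Adding a provider means writing one adapter like this:
class FakeProvider implements BrowserProvider {
  readonly name = "fake";
  async runTask(task: string): Promise<TestVerdict> {
    return { status: "PASS", reason: `simulated run of: ${task}` };
  }
}
```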
Live View and Session Recording
While tests run, you get a live execution grid — real-time SSE status updates with embedded live views of each browser session. Watch the AI navigate your app in real time.
After completion, every session has a permanent recording URL — not the ephemeral live link. Share with your team, attach to bug reports, review days later. When a test fails, click the recording, watch what happened, know immediately if it's a real bug.
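A status stream like the live grid's can be served as Server-Sent Events from a route handler. This is a minimal sketch with hard-coded events — the route shape and event payloads are assumptions:

```typescript
// SSE sketch: each update is a "data: <json>\n\n" frame on a streamed response.
export async function GET() {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    start(controller) {
      const send = (data: unknown) =>
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(data)}\n\n`));
      send({ testId: "t1", status: "running" });
      // In the real pipeline, updates stream as browser sessions progress.
      send({ testId: "t1", status: "passed" });
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```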
From Test Failure to Autonomous Fix
Test fails
A browser agent runs your test and returns FAIL with a structured reason and a session recording.
One-click bug report
AI reads the test case, failure result, and project context. Generates a structured bug report: title, severity, steps to reproduce, expected vs actual behaviour.
Linear issue created
Review the report, edit if needed, and create a Linear issue directly from the dashboard. The issue links back to the test result and recording.
Bug Boomerang picks it up
If you're running Bug Boomerang, that Linear issue triggers the autonomous fix pipeline — discovery, coding, self-correcting review loop, sandbox testing, draft PR. The bug goes from failed QA test to draft fix without a human writing a single line of code.
The full loop
PR merge → blast radius analysis → generate tests → select regression suite → execute in real browsers → test fails → bug report → Linear issue → Bug Boomerang → discovery → fix → review → sandbox test → draft PR. From deploy to fix, zero human intervention.
Automation Configuration
The automation pipeline is configured per-workspace in the Settings panel:
| Setting | What it does |
|---|---|
| Enable/disable | Master toggle for PR-triggered automation |
| Target project | Which workspace to create tests in and run against |
| Test count | Total tests per run (1–20), includes new + selected existing |
| Allowed GitHub usernames | Only PRs from these authors trigger automation (empty = all) |
| Branch patterns | Only PRs merged into matching branches trigger (supports * wildcards) |
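One way the `*` wildcard branch patterns might be matched is by compiling them to a regex. A sketch under that assumption — the actual matching rules aren't specified beyond supporting `*`:

```typescript
// Escape regex metacharacters, then turn each * into ".*" and anchor the match.
function branchMatches(pattern: string, branch: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`).test(branch);
}
```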
What Makes This Different
Intent, not selectors
Describe behaviour in English. The AI figures out how to verify it. When the UI changes, the AI adapts — because it reads the page like a human, not like a CSS selector.
Blast radius + registry
AI doesn't just generate tests — it analyses changed code, maps to domains, and pulls existing regression tests from your registry. Smart selection with full justification.
Closes the entire loop
Generate → select → execute → report → file bugs → Bug Boomerang fixes them. Most tools stop at generation. This one goes from PR merge to autonomous fix.
Real browsers, real auth
Parallel sessions with persistent auth profiles, account locking, live view, and permanent recordings. Up to 250 concurrent browsers.
Tech Stack
| Component | Technology |
|---|---|
| App | Next.js 16 (App Router) + React 19 |
| AI | Vercel AI SDK + OpenRouter (GPT-5.2, Claude, Gemini) |
| Database | Neon Postgres + Drizzle ORM |
| Auth | Clerk (domain-gated team auth) |
| UI | shadcn/ui + Tailwind CSS 4 |
| Streaming | Server-Sent Events for live execution grid |
| Security | AES-256-GCM encrypted provider keys at rest |
| Bug tracking | Linear integration (one-click issue creation) |
| Browser providers | Hyperbrowser Browser-Use, Hyperbrowser HyperAgent, BrowserUse Cloud |
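Encrypting provider keys at rest with AES-256-GCM can be done with Node's built-in `crypto` module. This is a sketch, not the product's implementation — the key handling and the `iv.tag.ciphertext` storage format are assumptions:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit IV, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store the IV and GCM auth tag alongside the ciphertext.
  return [iv, cipher.getAuthTag(), ct].map((b) => b.toString("base64")).join(".");
}

function decrypt(token: string, key: Buffer): string {
  const [iv, tag, ct] = token.split(".").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // decryption throws if the ciphertext was tampered with
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```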
How It Connects
AI QA is the testing layer in the automation pipeline:
- Bug Boomerang — failed tests create Linear issues that trigger autonomous fix pipelines. Test → bug → fix → PR.
- Xena — the autonomous operator. Same event-driven webhook architecture.
- Browser provider adapters share the same interface pattern as Bug Boomerang's sandbox testing.
References
- Hyperbrowser — AI browser infrastructure: hyperbrowser.ai
- Browser Use — Cloud browser automation: browser-use.com
- Vercel AI SDK — AI integration for Next.js: sdk.vercel.ai
- OpenRouter — Multi-provider LLM routing: openrouter.ai
- Drizzle ORM — TypeScript-first database toolkit: orm.drizzle.team
Bug Boomerang
An autonomous pipeline that takes a Linear bug ticket and turns it into a tested, reviewed draft PR — with zero human intervention. Label a ticket, get a PR.
Particle Testing Framework
A graph-based test orchestration system that decomposes every testable action into atomic particles, arranges them in a dependency graph, and runs tests in parallel against ephemeral database snapshots.