
AI QA

Write tests in English. Run them in real browsers. On PR merge, AI analyses the blast radius, generates new tests, selects a regression suite, and smoke-tests the deployment. Failures trigger Bug Boomerang automatically.

What is AI QA?

Every QA tool asks you to think like a machine. Write selectors. Script workflows. Maintain page objects. And when the UI changes — and it always changes — your tests break. Not because the feature is broken, but because a button moved three pixels or a class name got refactored.

AI QA throws all of that away. You describe what to test in plain English. An AI agent opens a real browser, navigates the app, and tells you whether it works. No selectors. No scripts. No maintenance.

But that's not even the main event.

The real pitch

Merge a PR. AI reads the code changes, analyses the blast radius, generates targeted tests, pulls relevant regression tests from your registry, waits for the deploy to propagate, and smoke-tests everything in real browsers. Failures create Linear issues that trigger Bug Boomerang — autonomous fix pipelines. From PR merge to tested deployment to auto-fixed bugs. Untouched by human hands.

The PR Automation Pipeline

This is the centrepiece. Everything else supports it.

  1. PR merged: GitHub webhook fires
  2. Blast radius analysis: AI reads changed files + PR context, identifies affected domains + components
  3. Generate new tests: 3–5 targeted test cases from PR changes, auto-published to the test registry
  4. AI selects regression suite: new tests from this PR + existing tests from the registry, selected by directly affected · regression risk · previously failing · coverage breadth, with full justification saved to the Automations tab
  5. Wait 10 minutes: deploy propagation + cache clearing
  6. Execute in real browsers: parallel sessions · account locking · live view · session recordings, up to 250 concurrent browsers

  ✅ All pass: Automations tab shows results + reasoning
  ❌ Test fails: one-click bug report. AI generates title · severity · steps to reproduce · expected vs actual → Linear issue → Bug Boomerang → autonomous fix → draft PR
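The flow above can be sketched as a sequence of stages run in order. This is an illustrative sketch only: the stage names, the `PipelineContext` shape, and the stub bodies are assumptions, not the product's actual API.

```typescript
// Illustrative sketch of the PR-merge pipeline as sequential stages.
// Stage names and the context shape are assumptions, not the real API.
interface PipelineContext {
  prNumber: number;
  changedFiles: string[];
  affectedDomains: string[];
  suite: string[];
  log: string[]; // stage names, in execution order
}

type Stage = { name: string; run: (ctx: PipelineContext) => Promise<void> };

const stages: Stage[] = [
  { name: "blast-radius", run: async (ctx) => { ctx.affectedDomains = ["auth"]; } },
  { name: "generate-tests", run: async (ctx) => { ctx.suite.push("new: login still works"); } },
  { name: "select-regression-suite", run: async (ctx) => { ctx.suite.push("existing: password reset"); } },
  { name: "wait-for-deploy", run: async () => { /* sleep ~10 min in production */ } },
  { name: "execute-in-browsers", run: async () => { /* dispatch to browser providers */ } },
];

async function runPipeline(ctx: PipelineContext): Promise<PipelineContext> {
  for (const stage of stages) {
    // strictly sequential: each stage sees the results of the previous ones
    await stage.run(ctx);
    ctx.log.push(stage.name);
  }
  return ctx;
}
```

The point of the sketch is the ordering guarantee: suite selection cannot run before blast radius analysis, and execution cannot run before the deploy wait.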

How the Suite Selection Works

This is the part most tools skip entirely. When a PR merges, the AI doesn't just generate new tests — it pulls from your entire test registry to build a smart regression suite:

  • Directly affected: tests covering features touched by the changed files
  • Regression risk: tests for adjacent or dependent features — if checkout changed, test payment too
  • Previously failing: known problem areas that deserve re-verification
  • Coverage breadth: diversity over redundancy — different pages beat duplicate coverage

The AI explains its reasoning for every selection. Not a black box — you can see why each test was included. This reasoning is saved with every automation run in the Automations tab alongside pass/fail counts, PR links, and session recordings.
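The four criteria can be read as a scoring model: score each registry test, sort, then prefer page diversity among what remains. A minimal sketch, assuming illustrative weights, field names, and a toy dependency map — none of which are the product's documented internals:

```typescript
// Hypothetical scoring model for regression-suite selection.
// Weights, field names, and the ADJACENT map are assumptions.
interface RegistryTest {
  id: string;
  domains: string[];          // domains the test covers, e.g. ["billing"]
  previouslyFailing: boolean;
  page: string;               // page the test exercises, used for breadth
}

const ADJACENT: Record<string, string[]> = { checkout: ["payment"] }; // assumed dependency map

function selectSuite(
  registry: RegistryTest[],
  affectedDomains: string[],
  limit: number,
): { test: RegistryTest; reason: string }[] {
  const scored = registry.map((test) => {
    let score = 0;
    const reasons: string[] = [];
    if (test.domains.some((d) => affectedDomains.includes(d))) { score += 3; reasons.push("directly affected"); }
    const adjacent = affectedDomains.flatMap((d) => ADJACENT[d] ?? []);
    if (test.domains.some((d) => adjacent.includes(d))) { score += 2; reasons.push("regression risk"); }
    if (test.previouslyFailing) { score += 2; reasons.push("previously failing"); }
    return { test, score, reason: reasons.join(" + ") || "coverage breadth" };
  });
  scored.sort((a, b) => b.score - a.score); // stable sort keeps registry order on ties

  // coverage breadth: skip zero-score tests on pages already covered
  const seenPages = new Set<string>();
  const picked: { test: RegistryTest; reason: string }[] = [];
  for (const entry of scored) {
    if (picked.length >= limit) break;
    if (seenPages.has(entry.test.page) && entry.score === 0) continue;
    seenPages.add(entry.test.page);
    picked.push({ test: entry.test, reason: entry.reason });
  }
  return picked;
}
```

Each pick carries its reason string, mirroring the per-selection justification described above.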

Workspaces and Domains

Tests live in projects — workspaces where teams organise tests by domain. Each project has its own test registry, test accounts, groups, and automation history.

Within a project, tests are organised into groups by domain: auth, billing, navigation, forms, etc. When the AI analyses a PR's blast radius, it maps changed files to these domains to figure out what to test.

This means your team contributes tests across domains, and the automation pipeline knows which ones to pull when code in a specific area changes.
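The file-to-domain mapping can be as simple as matching changed paths against per-domain patterns. The patterns below are illustrative assumptions; the domain names come from the groups listed above:

```typescript
// Sketch of blast-radius domain mapping: path patterns to test groups.
// The regex patterns are assumptions; domain names follow the groups above.
const DOMAIN_PATTERNS: Array<[RegExp, string]> = [
  [/(^|\/)auth\//, "auth"],
  [/(^|\/)billing\//, "billing"],
  [/(^|\/)nav(igation)?\//, "navigation"],
  [/(^|\/)forms?\//, "forms"],
];

function mapFilesToDomains(changedFiles: string[]): string[] {
  const domains = new Set<string>();
  for (const file of changedFiles) {
    for (const [pattern, domain] of DOMAIN_PATTERNS) {
      if (pattern.test(file)) domains.add(domain); // one file can hit several domains
    }
  }
  return [...domains];
}
```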

Three Ways to Create Tests

By hand

A test case is three fields:

  • Title: "Coach creates a nutrition plan"
  • Description: "Log in as a coach, navigate to nutrition, create a new plan with 3 meals, save it"
  • Expected outcome: "Plan appears in the list with correct meal count"

No code. No locators. The description is the literal instruction the browser agent follows — step-by-step actions a human would take.
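As a data structure, the three fields are all there is. The interface name and the completeness check below are illustrative assumptions; the example values are taken from the fields above:

```typescript
// The three-field test case, with a simple completeness check.
// The interface name and validation rules are assumptions for illustration.
interface TestCase {
  title: string;
  description: string;      // the literal instructions the browser agent follows
  expectedOutcome: string;
}

function validateTestCase(tc: TestCase): string[] {
  const errors: string[] = [];
  if (!tc.title.trim()) errors.push("title is required");
  if (!tc.description.trim()) errors.push("description is required");
  if (!tc.expectedOutcome.trim()) errors.push("expected outcome is required");
  return errors;
}

const example: TestCase = {
  title: "Coach creates a nutrition plan",
  description: "Log in as a coach, navigate to nutrition, create a new plan with 3 meals, save it",
  expectedOutcome: "Plan appears in the list with correct meal count",
};
```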

Automatically on PR merge

When a PR merges (if automation is enabled), the AI:

  1. Pulls changed files from the PR, filters for frontend code
  2. Identifies affected components and domains
  3. Generates 3–5 targeted test cases with step-by-step browser instructions
  4. Auto-publishes them to the test registry
  5. Combines them with existing tests selected by blast radius analysis
  6. Executes the full suite after a 10-minute deploy propagation delay

Everything lands in the Automations tab with full justification.

From a prompt

Describe what you want covered in a sentence:

"As a coach, I can manage my client roster — adding, editing, and removing clients."

An AI exploration agent opens the app in a real browser, navigates the relevant area, observes what's there, and generates 3–10 structured test cases. You review the drafts, publish the ones you want.

Generated tests go through Jaccard similarity deduplication — high overlap with existing tests gets auto-skipped, moderate overlap gets flagged, genuinely new tests pass through. You don't end up with fifteen variations of "user can log in."
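Jaccard similarity is the size of the intersection of two word sets divided by the size of their union. A minimal sketch; the 0.8/0.5 cut-offs for skip/flag are assumed values, since the document doesn't state the real thresholds:

```typescript
// Jaccard similarity over word sets, with assumed thresholds for the
// skip / flag / pass decision (the real cut-offs aren't documented here).
function jaccard(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const ta = tokens(a);
  const tb = tokens(b);
  const intersection = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : intersection / union;
}

function dedupeVerdict(similarity: number): "skip" | "flag" | "pass" {
  if (similarity >= 0.8) return "skip"; // near-duplicate: auto-skipped
  if (similarity >= 0.5) return "flag"; // moderate overlap: flagged for review
  return "pass";                        // genuinely new
}
```

For example, "user can log in" vs "a user is able to log in" shares 3 of 8 distinct words (0.375), so it passes through as new rather than being collapsed into the existing login test.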

Account-Aware Parallel Execution

Real applications have authentication. Most AI QA tools either ignore this or corrupt state by sharing sessions across parallel tests.

Persistent sessions. Each test account logs in once. Cookies and session state get saved to a persistent profile per browser provider. Every future test reuses that profile — no re-authentication, no login race conditions.

Account locking. The scheduler ensures no two concurrent tests use the same account. If an account is busy, the test waits. If any account will do, the scheduler round-robins through the pool, preferring accounts with authenticated profiles.

Parallel limits. From 1 (sequential) to 250 concurrent browsers. The scheduler respects both the parallel limit and account uniqueness simultaneously. No deadlocks. No corrupted sessions.
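The locking invariant can be sketched as a small pool: an acquire that prefers authenticated profiles and returns null when everything is busy (the caller waits), and a release when the test finishes. Class and field names here are illustrative, not the product's API:

```typescript
// Minimal sketch of account locking: one account per concurrent test,
// preferring accounts that already have an authenticated profile.
// Class and field names are illustrative assumptions.
interface TestAccount { id: string; hasAuthProfile: boolean; }

class AccountPool {
  private busy = new Set<string>();
  constructor(private accounts: TestAccount[]) {}

  // Returns a free account, or null if all are busy (the test then waits).
  acquire(): TestAccount | null {
    const free = this.accounts.filter((a) => !this.busy.has(a.id));
    if (free.length === 0) return null;
    // Prefer accounts with a saved auth profile to avoid re-login.
    const pick = free.find((a) => a.hasAuthProfile) ?? free[0];
    this.busy.add(pick.id);
    return pick;
  }

  release(id: string): void {
    this.busy.delete(id);
  }
}
```

Because `acquire` only ever hands out accounts absent from the busy set, no two concurrent tests can hold the same account, regardless of the parallel limit.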

Browser Providers Are Pluggable

The browser execution layer is a clean adapter pattern. Three providers ship today:

  • Hyperbrowser Browser-Use: Gemini-powered agent. Stealth mode + proxy support.
  • Hyperbrowser HyperAgent: alternative agent runtime. Same infrastructure.
  • BrowserUse Cloud: independent provider. Own model stack.

Each implements the same interface: take a task description, return a structured verdict. Adding a new provider means writing one adapter file. Switch providers per-project from settings — no code changes.
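The shape of that interface, sketched from the description above. Only the contract (task description in, structured verdict out) comes from the text; the type and method names are assumptions:

```typescript
// Adapter interface sketched from the description: one method, one verdict.
// Names are assumptions; only the shape (task in, verdict out) is from the text.
interface Verdict {
  status: "PASS" | "FAIL";
  reason: string;
  recordingUrl?: string;
}

interface BrowserProvider {
  readonly name: string;
  runTask(task: string): Promise<Verdict>;
}

// A stub showing how little a new adapter needs to implement.
class StubProvider implements BrowserProvider {
  readonly name = "stub";
  async runTask(task: string): Promise<Verdict> {
    return { status: "PASS", reason: `completed: ${task}` };
  }
}

async function execute(provider: BrowserProvider, task: string): Promise<Verdict> {
  return provider.runTask(task); // switching providers is just switching this object
}
```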

Live View and Session Recording

While tests run, you get a live execution grid — real-time SSE status updates with embedded live views of each browser session. Watch the AI navigate your app in real time.

After completion, every session has a permanent recording URL — not the ephemeral live link. Share with your team, attach to bug reports, review days later. When a test fails, click the recording, watch what happened, know immediately if it's a real bug.
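Under the hood, an SSE stream is just newline-delimited `event:` and `data:` fields. A minimal parser for one frame, as a live grid client might consume it (the `status` event name and JSON payload shape are illustrative assumptions):

```typescript
// Minimal parser for one Server-Sent Events frame.
// Field handling follows the SSE wire format; event/payload names are assumed.
function parseSseFrame(raw: string): { event: string; data: string } {
  let event = "message"; // SSE default event name when none is given
  const data: string[] = [];
  for (const line of raw.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) data.push(line.slice(5).trim());
  }
  return { event, data: data.join("\n") }; // multiple data lines join with newlines
}
```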

From Test Failure to Autonomous Fix

Test fails

A browser agent runs your test and returns FAIL with a structured reason and a session recording.

One-click bug report

AI reads the test case, failure result, and project context. Generates a structured bug report: title, severity, steps to reproduce, expected vs actual behaviour.

Linear issue created

Review the report, edit if needed, and create a Linear issue directly from the dashboard. The issue links back to the test result and recording.

Bug Boomerang picks it up

If you're running Bug Boomerang, that Linear issue triggers the autonomous fix pipeline — discovery, coding, self-correcting review loop, sandbox testing, draft PR. The bug goes from failed QA test to draft fix without a human writing a single line of code.

The full loop

PR merge → blast radius analysis → generate tests → select regression suite → execute in real browsers → test fails → bug report → Linear issue → Bug Boomerang → discovery → fix → review → sandbox test → draft PR. From deploy to fix, zero human intervention.

Automation Configuration

The automation pipeline is configured per-workspace in the Settings panel:

  • Enable/disable: master toggle for PR-triggered automation
  • Target project: which workspace to create tests in and run against
  • Test count: total tests per run (1–20), includes new + selected existing
  • Allowed GitHub usernames: only PRs from these authors trigger automation (empty = all)
  • Branch patterns: only PRs merged into matching branches trigger (supports * wildcards)
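The `*` wildcard in branch patterns amounts to glob-to-regex conversion: escape everything else, let each `*` match any run of characters. A generic sketch, not the product's exact implementation:

```typescript
// Sketch of '*' wildcard matching for branch patterns.
// Generic glob-to-regex conversion; not the product's exact implementation.
function branchMatches(pattern: string, branch: string): boolean {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*"); // each '*' matches any run of characters
  return new RegExp(`^${escaped}$`).test(branch);
}
```

So `release/*` matches `release/1.2.0` but not `main`, and a literal pattern like `main` only matches exactly.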

What Makes This Different

Intent, not selectors

Describe behaviour in English. The AI figures out how to verify it. When the UI changes, the AI adapts — because it reads the page like a human, not like a CSS selector.

Blast radius + registry

AI doesn't just generate tests — it analyses changed code, maps to domains, and pulls existing regression tests from your registry. Smart selection with full justification.

Closes the entire loop

Generate → select → execute → report → file bugs → Bug Boomerang fixes them. Most tools stop at generation. This one goes from PR merge to autonomous fix.

Real browsers, real auth

Parallel sessions with persistent auth profiles, account locking, live view, and permanent recordings. Up to 250 concurrent browsers.

Tech Stack

  • App: Next.js 16 (App Router) + React 19
  • AI: Vercel AI SDK + OpenRouter (GPT-5.2, Claude, Gemini)
  • Database: Neon Postgres + Drizzle ORM
  • Auth: Clerk (domain-gated team auth)
  • UI: shadcn/ui + Tailwind CSS 4
  • Streaming: Server-Sent Events for live execution grid
  • Security: AES-256-GCM encrypted provider keys at rest
  • Bug tracking: Linear integration (one-click issue creation)
  • Browser providers: Hyperbrowser Browser-Use, Hyperbrowser HyperAgent, BrowserUse Cloud
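At-rest encryption of provider keys with AES-256-GCM looks roughly like the sketch below, using Node's built-in crypto module. The `iv | tag | ciphertext` layout and base64 encoding are assumptions about the storage format, not the product's documented scheme:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Sketch of at-rest encryption for provider keys with AES-256-GCM.
// The iv|tag|ciphertext layout and base64 encoding are assumed, not documented.
function encryptKey(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, the standard choice for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte integrity tag
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptKey(blob: string, key: Buffer): string {
  const buf = Buffer.from(blob, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ciphertext = buf.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // tampering with blob or tag makes final() throw
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM gives authenticated encryption: a flipped bit anywhere in the stored blob fails decryption outright instead of yielding a corrupted key.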

How It Connects

AI QA is the testing layer in the automation pipeline:

  • Bug Boomerang — failed tests create Linear issues that trigger autonomous fix pipelines. Test → bug → fix → PR.
  • Xena — the autonomous operator. Same event-driven webhook architecture.
  • Browser provider adapters share the same interface pattern as Bug Boomerang's sandbox testing.
