Bug Boomerang
An autonomous pipeline that takes a Linear bug ticket and turns it into a tested, reviewed draft PR — with zero human intervention. Label a ticket, get a PR.
What is Bug Boomerang?
A bug gets filed. A developer reads it. They investigate the codebase. They figure out the root cause. They write a plan. They code a fix. They review their own work. They set up a test environment. They test the fix. They create a PR. They post an update on the ticket.
That's two hours minimum for a trivial bug. Half a day for anything non-trivial. And most of that time isn't thinking — it's scaffolding. Cloning branches, setting up environments, running audits, writing PR descriptions. Mechanical work that eats developer hours but requires just enough context to resist automation.
Bug Boomerang takes a Linear bug ticket and turns it into a tested, reviewed draft PR — with zero human intervention. Not "assists with." Not "suggests." It does the whole thing.
Label a ticket, get a PR
A bug label gets applied to a Linear issue. Bug Boomerang picks it up, investigates the codebase, identifies the root cause, writes a fix plan, codes the solution, reviews its own work in a self-correcting loop, provisions a production-like sandbox, validates the fix in a real browser, and creates a draft PR. The developer's first touch is reviewing a PR that already works. 15–30 minutes. A few dollars in compute.
The Pipeline
The Self-Correcting Code Loop
This is where it gets interesting. A coding agent writes the fix based on the plan. Then a separate review agent examines the patch with structured output: is the fix correct? Are there issues? Each finding has a priority, a title, and a detailed explanation.
If the reviewer says it's correct, the pipeline moves on. If it finds problems, the findings get fed back to the coder as context for the next iteration. The coder doesn't see its own previous attempt in isolation — it sees what the reviewer thought was wrong and addresses it specifically.
Between rounds, automated audit scripts check for rule violations and schema consistency. If the code breaks project conventions, that feedback enters the loop too.
```
Coder writes fix (from plan + prior review feedback)
        ↓
Audit scripts run (rules + schema checks)
        ↓
Review agent examines patch (structured JSON output)
        ↓
"patch is correct" → PASS → continue to sandbox
        ↓
Findings with priorities → FAIL → feed back to coder
        ↓
Repeat (up to 5 rounds)
        ↓
5 rounds exhausted → escalate to human with full context
```

This isn't "generate code and hope." It's a feedback loop where two independent AI agents converge on a correct solution through structured critique.
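The loop above can be sketched in a few lines. `runCoder`, `runAudits`, and `runReviewer` are hypothetical stand-ins for the real agent calls (they are injected, so the control flow itself is testable); the reviewer's `{ correct, findings }` shape mirrors the structured output described above but is an assumption.

```javascript
// Sketch of the coder/reviewer convergence loop. The three agent calls are
// hypothetical injected functions, not Bug Boomerang's actual API.
async function codeReviewLoop({ runCoder, runAudits, runReviewer, maxRounds = 5 }) {
  let feedback = []; // reviewer + audit findings from the prior round
  for (let round = 1; round <= maxRounds; round++) {
    const patch = await runCoder(feedback);        // coder sees prior critique, not just its own output
    const auditFindings = await runAudits(patch);  // rule + schema checks between rounds
    const review = await runReviewer(patch);       // structured JSON: { correct, findings: [...] }
    if (review.correct && auditFindings.length === 0) {
      return { status: 'pass', patch, rounds: round };
    }
    feedback = [...auditFindings, ...review.findings];
  }
  return { status: 'escalate', feedback };         // rounds exhausted → human with full context
}
```

The key design point: feedback is cumulative input to the next coding round, so the coder addresses the reviewer's specific objections instead of regenerating blind.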
The Sandbox: A Production-Like World in 60 Seconds
For bugs that touch the UI, Bug Boomerang provisions an ephemeral test environment from scratch:
Database isolation via Neon branching
The sandbox gets a private copy of the production database using Neon's copy-on-write branching. Not a mock. Not a seed script. A real database with real data, isolated so nothing the test does can touch production. The branch gets deleted when the test is done.
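For orientation, the branch lifecycle maps onto two calls against Neon's public REST API. This is a hedged sketch based on that API's documented endpoint shapes, not Bug Boomerang's actual client; the functions only build request descriptors, leaving the HTTP call to the caller.

```javascript
// Hedged sketch of the Neon branch lifecycle (create before the test, delete
// after). Endpoint paths follow Neon's public v2 REST API; body fields are
// illustrative assumptions.
const NEON_API = 'https://console.neon.tech/api/v2';

function createBranchRequest(projectId, apiKey, branchName) {
  return {
    method: 'POST',
    url: `${NEON_API}/projects/${projectId}/branches`,
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ branch: { name: branchName } }), // copy-on-write child of the parent branch
  };
}

function deleteBranchRequest(projectId, apiKey, branchId) {
  return {
    method: 'DELETE',
    url: `${NEON_API}/projects/${projectId}/branches/${branchId}`,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}
```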
Full application stack via Vercel Sandbox
The fix branch gets deployed into a sandboxed Vercel environment — API server, web frontend, reverse proxy for same-origin auth, CDN schemas, the works. Environment variables are injected, dependencies installed, services started.
Auth health verification
Before any test runs, the sandbox validates itself: Can a user sign in? Does the session persist? Do the schemas load? Does the API respond? Only when every health check passes does testing begin.
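The gate logic is simple to sketch: run each probe in order and abort on the first failure. The check names here are illustrative; the real health checks and their order are not specified beyond the list above.

```javascript
// Minimal sketch of the sandbox self-check gate: every probe must pass before
// browser testing starts. A throwing probe counts as a failure.
async function runHealthChecks(checks) {
  for (const { name, probe } of checks) {
    const ok = await probe().catch(() => false);
    if (!ok) return { healthy: false, failed: name }; // first failure aborts the gate
  }
  return { healthy: true };
}
```

A caller would pass something like `[{ name: 'sign-in', probe: trySignIn }, { name: 'api', probe: pingApi }]`, where each probe resolves to a boolean.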
Browser testing with a real browser
A Browser Use agent opens the sandbox in a real browser. It logs in, navigates to the affected area, and tries to reproduce the original bug. Not a unit test. A real browser clicking real buttons on a real application backed by a real database.
The output is structured: PASS or FAIL, with a failure type (code issue or environment issue), a summary, and specific findings.
FAIL → loop back
If the fix failed because of a code issue, the entire pipeline restarts: discovery runs again with the browser agent's feedback as additional context. New plan, new code, new review, new test. Up to 3 full restarts.
If the environment itself is broken — escalation. Bug Boomerang doesn't waste cycles debugging infrastructure.
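The routing decision after a browser run reduces to one small function. The field names (`verdict`, `failureType`) mirror the structured output described above but are assumptions about its exact shape.

```javascript
// Sketch of the post-browser-test routing: PASS ships a draft PR, environment
// failures escalate immediately, code failures restart the pipeline up to a cap.
function routeBrowserResult(result, restartsSoFar, maxRestarts = 3) {
  if (result.verdict === 'PASS') return 'create-draft-pr';
  if (result.failureType === 'environment') return 'escalate'; // don't debug infrastructure
  return restartsSoFar < maxRestarts
    ? 'restart-pipeline'  // rediscover with the browser agent's feedback as context
    : 'escalate';         // restarts exhausted → human with full context
}
```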
This entire environment — database branch, application stack, auth validation — spins up automatically, exists for exactly as long as the test needs it, and tears itself down afterward. No developer provisioned anything. No DevOps ticket was filed.
Escalation: Knowing When to Stop
Bug Boomerang doesn't pretend it can fix everything. It has explicit stop conditions:
| Condition | What happens |
|---|---|
| Code review exhausted (5 rounds) | Stops, tags configured users on Linear, bumps priority to High |
| Browser agent crashed | Stops, escalates with full context of what it tried |
| Environment failure | Stops, distinguishes infra problems from code problems |
| Frontend validation failed 3 times | Stops, provides all browser feedback to the human |
When it escalates, the human gets everything: what was discovered, what was planned, what was coded, what the reviewer said, what the browser saw. Full context, not just "it didn't work."
And if a human posts a comment on the Linear issue while a run is active, Bug Boomerang treats it as new input. It supersedes the current run, starts fresh, and carries the human's feedback forward as additional context. The human doesn't need to learn a new interface — they just comment on the ticket like they normally would.
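The supersede-on-comment behaviour can be sketched as a small state transition. The in-memory `runs` map is a stand-in for the persisted state; the field names are illustrative.

```javascript
// Sketch of comment-supersedes-run: a human comment on the Linear issue marks
// the active run superseded and seeds a fresh run that carries the comment
// forward as additional context.
function onIssueComment(runs, issueId, commentBody) {
  const active = runs.get(issueId);
  if (active && active.status === 'active') {
    active.status = 'superseded'; // the in-flight run stops at its next checkpoint
  }
  const next = {
    status: 'active',
    context: active ? [...active.context, commentBody] : [commentBody],
  };
  runs.set(issueId, next);
  return next;
}
```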
The Toolchain Is Pluggable
Every stage runs through a configurable toolchain. The agents, models, and providers are specified in config, not code:
```yaml
toolchain:
  discovery:
    model: "claude-sonnet-4-5-20250929"
    provider: "anthropic"
  coder:
    model: "claude-opus-4-6"
    provider: "anthropic"
  review:
    model: "o3"
    provider: "openai"
    outputMode: "json"
```

Want to A/B test GPT-5.3-Codex against Claude Opus? Change a config value. Every tool also supports environment variable overrides: `XENA_CODER_BIN`, `XENA_CODER_ARGS_JSON`, `XENA_CODER_TIMEOUT_MS`. The pipeline doesn't care which model writes the code — it cares that the review loop produces correct output.
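One plausible way the override layering could work, sketched here as an assumption: per-tool environment variables beat the YAML file. The variable names follow the `XENA_<TOOL>_*` pattern shown above; the merge order and field names are illustrative.

```javascript
// Hedged sketch of config resolution: env vars named XENA_<TOOL>_BIN,
// XENA_<TOOL>_ARGS_JSON, and XENA_<TOOL>_TIMEOUT_MS override the file config.
function resolveTool(name, fileConfig, env) {
  const prefix = `XENA_${name.toUpperCase()}_`;
  const resolved = { ...fileConfig };
  if (env[`${prefix}BIN`]) resolved.bin = env[`${prefix}BIN`];
  if (env[`${prefix}ARGS_JSON`]) resolved.args = JSON.parse(env[`${prefix}ARGS_JSON`]);
  if (env[`${prefix}TIMEOUT_MS`]) resolved.timeoutMs = Number(env[`${prefix}TIMEOUT_MS`]);
  return resolved;
}
```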
State Survives Everything
Every run is persisted to disk as a deterministic state machine. Every stage transition is recorded. Every agent output is saved as an artifact:
```
.xena/
  issues/{issueId}.json           # Run state machine
  artifacts/{issueId}/{runId}/    # Discovery, plan, review outputs
  sandbox-meta/{sandboxId}.json   # Environment metadata
  events.json                     # Dedup window (7-day)
```

If the server crashes mid-run, the state is there when it comes back. This isn't just crash recovery — it's auditability. You can trace exactly what happened at every stage of every run.
What Makes This Different
End-to-end, not assisted
From ticket to tested PR. Not a code suggestion tool. Not a co-pilot. The whole pipeline, autonomously.
Self-correcting code loop
Two independent AI agents (coder + reviewer) iterate through structured critique. Up to 5 rounds. Audit scripts between each.
Real browser, real database
Neon DB branches + Vercel Sandbox + Browser Use. Production-like testing with production-like data, isolated and ephemeral.
Honest about its limits
Explicit stop conditions. Full-context escalation. No silent failures. No half-finished branches quietly rotting.
Tech Stack
| Component | Technology |
|---|---|
| Runtime | Node.js (CommonJS) |
| HTTP | Express |
| Trigger | Linear webhooks (signature-verified) |
| Code Agents | Configurable — Claude, GPT, any provider |
| Review | Structured JSON output with per-finding priorities |
| Database Isolation | Neon copy-on-write branching |
| Sandbox | Vercel Sandbox SDK |
| Browser Testing | Browser Use cloud API |
| Git | gh CLI for PR operations |
| Notifications | Slack via GitHub webhook loop |
| State | Deterministic file-based state machine |
| Config | YAML + environment variable overrides |
How It Connects
Bug Boomerang shares DNA with the rest of the runtime family:
- Xena — the autonomous event operator. Bug Boomerang's webhook handling and state persistence patterns come from Xena's architecture.
- Noktua — the desktop agent. Same configurable toolchain philosophy — the framework is permanent, the components are variables.
- The coder/reviewer agents use the same pluggable model client pattern. When a better coding model ships, you change a string in config.
References
- Neon — Serverless Postgres with copy-on-write branching: neon.tech
- Vercel Sandbox — Ephemeral compute environments: Vercel docs
- Browser Use — Cloud browser automation: browser-use.com
- Linear — Issue tracking with webhook events: linear.app
Teddy
A pre-coding discovery agent that explores codebases like a dog digging for bones — relentlessly, systematically, and with zero quit. Produces structured context packages that give downstream coding agents first-attempt accuracy.
AI QA
Write tests in English. Run them in real browsers. On PR merge, AI analyses the blast radius, generates new tests, selects a regression suite, and smoke-tests the deployment. Failures trigger Bug Boomerang automatically.