Teddy
A pre-coding discovery agent that explores codebases like a dog digging for bones — relentlessly, systematically, and with zero quit. Produces structured context packages that give downstream coding agents first-attempt accuracy.
What is Teddy?
The dominant failure mode for AI coding agents is not reasoning — it is context. An agent that receives a task description without understanding the codebase architecture, existing patterns, type definitions, and dependency constraints will generate plausible-looking code that does not fit. It invents interfaces that already exist under different names. It ignores project-wide conventions. It duplicates utility functions. The developer spends more time correcting the agent's output than they would have spent writing the code themselves.
The conventional fix is to hope the coding agent discovers what it needs mid-implementation. This fails predictably: the agent burns reasoning tokens on retrieval work, follows incomplete search chains, and commits to implementation decisions before understanding the constraints.
Teddy separates discovery from implementation. You run Teddy first. It explores the codebase — semantically, structurally, exhaustively — and produces a structured Context Package: a 25,000–30,000 token markdown document with everything a coding agent needs to implement correctly on the first attempt. Architecture overview. Relevant files with line numbers. Type definitions. Patterns to follow. Constraints. Gotchas. The agent starts with complete situational awareness, not a blank slate and a prayer.
Named after the founder's dog
Teddy is named after Mark's dog — a relentless digger who won't stop until he's found what he's looking for. The metaphor is the product: Teddy tears through a codebase the same way, following semantic chains from component to state management to theme system to type constraints, until it has a complete picture of what matters for your specific change.
How It Works
The Two-Phase Insight
This is the core architectural decision: discovery and implementation are fundamentally different cognitive tasks and should never share a context window.
Discovery is exploratory, iterative, and tool-heavy. The model needs to follow semantic chains, backtrack when a search leads nowhere, and build a mental map of the codebase through dozens — sometimes hundreds — of tool calls. It burns tokens on retrieval work. That's its job.
Implementation is structured, constrained, and output-heavy. The model needs a clear picture of reality and should spend its tokens on reasoning about the change, not on figuring out what files exist.
Mixing them in a single agent prompt — the standard approach — produces a system that explores when it should write code and writes code when it should still be exploring. Teddy eliminates this by making context generation a deliberate, separate step.
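The split can be sketched as a tiny orchestrator. This is a hypothetical `twoPhase` helper for illustration only; the real CLI wires both phases through the Vercel AI SDK, but the key property is the same: the coding agent never sees the exploration transcript.

```typescript
// Sketch of the two-phase split (hypothetical names, not Teddy's real API).
type Agent = (prompt: string) => Promise<string>;

async function twoPhase(task: string, discover: Agent, implement: Agent): Promise<string> {
  // Phase 1: the discovery agent explores and emits the Context Package.
  const contextPackage = await discover(task);

  // Phase 2: a fresh context window. The coding agent receives only the
  // package plus the task, never the dozens of tool calls behind it.
  return implement(`${contextPackage}\n\nTask: ${task}`);
}
```

The point of the sketch: the two phases communicate through one structured artifact, so each model spends its token budget on the cognitive task it is actually good at.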
| Without Teddy | With Teddy |
|---|---|
| Agent receives: "Add dark mode" | Teddy receives: "Add dark mode" |
| Agent searches... (burns tokens) | Teddy explores (200 tool calls) |
| Agent finds some files... (incomplete) | Teddy produces Context Package |
| Agent starts coding... (wrong patterns) | Agent receives: Context Package |
| Agent hits a type error... (backtracks) | Agent starts coding (right patterns) |
| Agent finds more files... (more tokens) | Agent produces correct implementation |
| Agent rewrites... (wasted work) | ✓ First attempt |
Semantic-First Discovery
Most codebase exploration tools search by text pattern — you need to already know the function name, the variable, the string. Teddy starts with semantic search via chunkhound, a vector embedding tool that understands intent.
Start with meaning, not keywords
The model's first move is always chunkhound: "Find code related to theme configuration and dark mode." This surfaces relevant code by semantic similarity, even when the codebase uses unexpected naming conventions. You don't need to know it's called useColorScheme to find it.
Narrow with exact patterns
Once chunkhound identifies the right area, ripgrep pins down exact usages: rg "ThemeProvider" --type ts finds every reference. fd locates related files by name: fd "theme" --extension ts. Broad-to-narrow, meaning-to-pattern.
Read with precision
readFile pulls the actual code with line numbers. The model reads the 40 lines that matter, not the entire 2,000-line file. tree maps the directory structure for spatial understanding. Each tool does one thing fast.
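A line-ranged read can be sketched in a few lines. This is an illustrative stand-in (hypothetical `sliceLines` / `readFileSlice` names), assuming the real tool is registered with the Vercel AI SDK and validated with Zod as the tech stack suggests:

```typescript
import { readFileSync } from "node:fs";

// Pure core: slice a 1-based line range and prefix line numbers
// so the model can cite exact ranges in the Context Package.
function sliceLines(content: string, startLine = 1, endLine?: number): string {
  const lines = content.split("\n");
  return lines
    .slice(startLine - 1, endLine ?? lines.length)
    .map((line, i) => `${startLine + i}: ${line}`)
    .join("\n");
}

// Thin I/O wrapper the tool handler would call.
function readFileSlice(path: string, startLine?: number, endLine?: number): string {
  return sliceLines(readFileSync(path, "utf8"), startLine, endLine);
}
```

Keeping the slicing logic pure makes it trivial to test, and returning numbered lines is what lets the "Files to Read" table carry precise `Lines` columns.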
Synthesise into structure
After exploration, the model assembles everything into the Context Package: architecture diagrams, file tables with line numbers, type definitions, code snippets showing patterns to follow, constraints from config and tests, and explicit gotchas. The output IS the coding agent's entire world — it can't ask follow-up questions.
The Context Package
The output is a structured markdown document engineered for machine consumption:
# Context Package: Add Dark Mode to Settings
## Task Understanding
What the change requires, decomposed into sub-tasks.
## Architecture Overview
ASCII diagram of relevant component hierarchy and data flow.
## Files to Read
| Priority | File | Lines | Why |
|----------|------|-------|-----|
| Must | src/theme/provider.tsx | 1-45 | Theme context definition |
| Must | src/settings/page.tsx | 20-80 | Settings component structure |
| Should | src/hooks/useTheme.ts | all | Existing theme hook |
## Files to Create/Modify
Which files change and what changes in each.
## Patterns to Follow
Actual code snippets from the codebase showing conventions.
## Type Definitions
Relevant types and interfaces the implementation must satisfy.
## Dependencies & Imports
What to import and from where — matching existing conventions.
## Constraints & Requirements
What tests enforce. What config limits. What breaks if violated.
## Potential Gotchas
SSR hydration. localStorage sync. CSS variable naming.
## Implementation Hints
Concrete guidance for the coding agent.
The target is 25,000–30,000 tokens — enough to be comprehensive without exceeding downstream context budgets.
Linear Integration
Teddy plugs directly into Linear-driven workflows:
teddy --linear K20-935
This fetches the ticket's title and description as the task prompt, runs the full discovery, and posts the Context Package back as a comment on the ticket. The team sees the discovery on the issue. Coding agents consume it from the same source.
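The ticket-to-prompt and comment-body steps can be sketched as pure helpers. The function names here are hypothetical (the real flow goes through `@linear/sdk`); only the ticket identifier format comes from the source:

```typescript
// Hypothetical helpers for the --linear flow (the real CLI uses @linear/sdk).
interface Ticket {
  identifier: string; // e.g. "K20-935"
  title: string;
  description?: string;
}

// Ticket title + description become the discovery task prompt.
function ticketToPrompt(t: Ticket): string {
  return [t.title, t.description ?? ""].filter(Boolean).join("\n\n");
}

// The Context Package is posted back as a markdown comment body.
function commentBody(t: Ticket, contextPackage: string): string {
  return `## Context Package for ${t.identifier}\n\n${contextPackage}`;
}
```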
Bug Boomerang uses this exact flow for its discovery stage — a Linear bug ticket triggers teddy --linear, and the structured findings feed into Bug Boomerang's planning and coding pipeline.
Multi-Provider Support
Teddy auto-detects available LLM providers from environment variables:
| Provider | Default Model | Detection |
|---|---|---|
| Fireworks | glm-5 | FIREWORKS_API_KEY (checked first) |
| OpenAI | gpt-5.1-codex | OPENAI_API_KEY |
| Anthropic | claude-sonnet-4-5 | ANTHROPIC_API_KEY |
| Cerebras | zai-glm-4.7 | CEREBRAS_API_KEY |
Override with --provider and --model flags, or TEDDY_PROVIDER / TEDDY_MODEL env vars. All providers use the same Vercel AI SDK interface — swap without code changes.
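The detection order can be sketched as a simple first-match scan over the table above. This is a minimal sketch, assuming `TEDDY_PROVIDER` and `TEDDY_MODEL` override detection as described; the function shape is hypothetical:

```typescript
// Provider table mirrors the documentation: Fireworks is checked first.
const PROVIDERS = [
  { name: "fireworks", envKey: "FIREWORKS_API_KEY", model: "glm-5" },
  { name: "openai", envKey: "OPENAI_API_KEY", model: "gpt-5.1-codex" },
  { name: "anthropic", envKey: "ANTHROPIC_API_KEY", model: "claude-sonnet-4-5" },
  { name: "cerebras", envKey: "CEREBRAS_API_KEY", model: "zai-glm-4.7" },
] as const;

function detectProvider(env: Record<string, string | undefined> = process.env) {
  // TEDDY_PROVIDER forces a provider; otherwise take the first key present.
  const forced = env.TEDDY_PROVIDER;
  const found = forced
    ? PROVIDERS.find((p) => p.name === forced)
    : PROVIDERS.find((p) => env[p.envKey]);
  if (!found) throw new Error("No LLM provider API key found");
  // TEDDY_MODEL overrides the provider's default model.
  return { provider: found.name, model: env.TEDDY_MODEL ?? found.model };
}
```

Because all providers sit behind the same Vercel AI SDK interface, whichever entry wins here can be swapped in without touching the tool-calling loop.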
Streaming Mode
In streaming mode (-s), you watch Teddy work in real time:
├ 🐕 sniffing... chunkhound {"query": "theme configuration dark mode"}
├ 🐕 digging... ripgrep {"pattern": "ThemeProvider", "fileType": "ts"}
├ 🐕 fetching... readFile {"path": "src/theme/provider.tsx", "startLine": 1}
├ 🐕 mapping... tree {"path": "src/settings", "depth": 2}
├ 🐕 sniffing... fd {"pattern": "color", "extension": "css"}
...
✓ good boy — discovery.md written (28,412 tokens)
Every tool call is visible as it happens. You can see the exploration strategy unfold — the model deciding what to search, following chains, backtracking, narrowing. Not a black box.
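The per-tool verbs in the stream can be rendered by a tiny formatter. The mapping below is read off the example output above; the function name is hypothetical:

```typescript
// Map each tool to the playful verb shown in streaming output.
const VERBS: Record<string, string> = {
  chunkhound: "sniffing",
  ripgrep: "digging",
  readFile: "fetching",
  tree: "mapping",
  fd: "sniffing",
};

// Format one tool call as a single stream line.
function renderToolCall(tool: string, args: object): string {
  return `├ 🐕 ${VERBS[tool] ?? "working"}... ${tool} ${JSON.stringify(args)}`;
}
```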
What Makes This Different
Discovery ≠ Implementation
Two-phase decomposition. Teddy explores exhaustively so the coding agent can implement correctly. Different tasks, different agents, different token budgets.
Semantic-first, not keyword-first
Chunkhound finds code by meaning. You don't need to know the function name to find it. Broad intent → narrow pattern → precise reading.
Structured output, not conversation
The Context Package is a machine-consumable document with tables, code snippets, and explicit constraints — not a chat transcript the agent has to parse.
LLM-guided, not pipeline-driven
The model decides what to explore based on your specific task. No fixed analysis sequence. Semantic chains specific to the change you're making.
Tech Stack
| Component | Technology |
|---|---|
| Language | TypeScript 5.7 (ESM) |
| Runtime | Node.js >= 20 |
| CLI | Commander.js + @clack/prompts |
| LLM | Vercel AI SDK (tool-calling loop) |
| Providers | OpenAI, Anthropic, Fireworks, Cerebras |
| Semantic search | chunkhound (vector embeddings) |
| Pattern search | ripgrep |
| File finding | fd |
| Structure | tree |
| Validation | Zod |
| Linear | @linear/sdk |
| Build | tsup |
Usage
# Install
npm install -g teddy-cli
# Basic discovery
teddy "Add dark mode to the settings page"
# From a Linear ticket (fetches + posts result)
teddy --linear K20-935
# Stream mode (watch exploration live)
teddy -s "Refactor auth to use JWT"
# Custom provider and output
teddy -p anthropic -m claude-opus-4-6 -o context.md "Add pagination to user list"
# Check dependencies
teddy --check
How It Connects
- Bug Boomerang — uses Teddy as its discovery stage. Linear bug ticket → Teddy explores → findings feed into Bug Boomerang's planning and coding pipeline.
- Xena — delegates codebase discovery to Teddy when Mark requests analysis via Linear.
- The Context Package format is designed for any downstream coding agent — Claude Code, Pi, Codex, or custom pipelines.
Links
Warp Engine
Event-sourced context management for AI coding agents. Instead of growing conversation history, Warp Engine records every action as an event and assembles fresh, deterministic context for every model call.
Bug Boomerang
An autonomous pipeline that takes a Linear bug ticket and turns it into a tested, reviewed draft PR — with zero human intervention. Label a ticket, get a PR.