Warp Engine
Event-sourced context management for AI coding agents. Instead of growing conversation history, Warp Engine records every action as an event and assembles fresh, deterministic context for every model call.
What is Warp Engine?
Every AI coding tool has the same hidden problem: memory is a lie.
When you ask an AI to "fix the auth bug," it reads files, makes changes, runs tests. Then you say "now add logging." The model receives the entire conversation so far — your first message, its response, file contents from ten minutes ago, tool outputs from three iterations back — and tries to work with it.
By turn five, the model is carrying 5,000 tokens of accumulated history. By turn ten, 20,000. By turn twenty, 50,000 or more. Most of it stale. The model pays attention to all of it equally, because it has no way to know what's current and what's garbage.
Warp Engine takes a completely different approach. It doesn't keep conversation history. It keeps an event log. Every action — every file read, edit, command, test, decision — is recorded as a structured event in an append-only SQLite database. When the model needs to act, Warp Engine assembles fresh context from the event log. Right now. At this moment. The conversation history? Thrown away. Every turn.
Events are memory. Context is computed.
The model gets a clean, deterministic picture of reality computed from verified facts — not a growing narrative it has to parse and hope it interprets correctly. Constant context size. Perfect auditability. Every turn grounded in what's actually true right now.
How It Works
Traditional AI Coding vs. Warp Engine
| Turn | Traditional (growing history) | Warp Engine (computed context) |
|---|---|---|
| 1 | 200 tokens | ~8K tokens (assembled) |
| 2 | 600 tokens | ~8K tokens (assembled) |
| 5 | 5,000 tokens | ~8K tokens (assembled) |
| 10 | 20,000 tokens | ~8K tokens (assembled) |
| 20 | 50,000+ tokens (stale, noisy) | ~8K tokens (fresh, verified) |

Every turn, the model receives a clean projection — which files are active, what changes were made, what's broken, what decisions were taken — rehydrated from disk with SHA256 verification. Not "what we talked about." What's actually true.
Event Types
Eleven structured event types cover every action:
| Event | What it captures |
|---|---|
| SessionStarted | Working directory, repo root, active tools |
| UserIntent | Normalised user request |
| ToolCalled | Tool name, typed input (before execution) |
| ToolResult | Status, bounded output (after execution) |
| FileRead | Path, line range, SHA256 hash of content slice |
| FileChanged | Path, full unified diff, summary (+N/-M lines) |
| CommandRun | Command, exit code, output excerpt |
| TestStatus | Pass/fail, error excerpts |
| Decision | Strategic choice with rationale (supports supersession) |
| RepoMap | Git file inventory (HEAD, file count, top dirs/extensions) |
| Custom | Extension-defined events |
All events are append-only, bounded to 1MB max payload, and stored in SQLite with WAL mode for concurrent reads/writes.
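As a sketch, the append-only, bounded contract can be modelled in TypeScript. The event shapes and the `boundPayload` helper below are illustrative, not the actual Warp Engine schema:

```typescript
// Illustrative subset of the event union (field names are assumptions).
type WarpEvent =
  | { type: "FileRead"; path: string; startLine: number; endLine: number; sha256: string }
  | { type: "ToolResult"; tool: string; status: "ok" | "error"; output: string }
  | { type: "TestStatus"; passed: boolean; excerpt: string };

const MAX_PAYLOAD_BYTES = 1_000_000; // payloads are bounded to 1 MB

// Truncate an oversized payload so the log stays bounded.
function boundPayload(payload: string): string {
  const bytes = Buffer.byteLength(payload, "utf8");
  if (bytes <= MAX_PAYLOAD_BYTES) return payload;
  return (
    Buffer.from(payload, "utf8").subarray(0, MAX_PAYLOAD_BYTES).toString("utf8") +
    "…[truncated]"
  );
}
```

Because events are never mutated after insert, readers can cache aggressively and WAL mode lets the projector read while the session keeps appending.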
The Stale Context Guard
This prevents the most common class of AI coding errors: editing a file based on what the model thinks is there rather than what's actually there.
Has the file been read?
If there's no FileRead event for this file, the edit is blocked. The model must read first.
Has the file changed since it was read?
The system compares the file's current mtime and size against the stats recorded at read time. If they differ, the edit is blocked.
Is the read too old?
Default: 2 hours. If the model read the file 3 hours ago and hasn't re-read it, the edit is blocked.
Blocked with rationale
If any check fails, the tool call is rejected: "File changed since last read. Re-read before editing." The block is logged as an event and surfaced in the projection.
Every edit is grounded in a verified, hash-checked read. No hallucinated line numbers. No patches applied to code that moved.
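A minimal sketch of the three checks, assuming the stats recorded at read time look roughly like the `ReadRecord` below (field and function names are hypothetical):

```typescript
interface ReadRecord {
  mtimeMs: number;  // file mtime captured at read time
  size: number;     // file size captured at read time
  readAtMs: number; // when the read happened
}

const MAX_READ_AGE_MS = 2 * 60 * 60 * 1000; // default: 2 hours

// Run the stale-context checks in order; the first failure blocks the edit.
function checkEditAllowed(
  record: ReadRecord | undefined,
  current: { mtimeMs: number; size: number },
  nowMs: number,
): { allowed: boolean; reason?: string } {
  if (!record) {
    return { allowed: false, reason: "No prior FileRead. Read before editing." };
  }
  if (record.mtimeMs !== current.mtimeMs || record.size !== current.size) {
    return { allowed: false, reason: "File changed since last read. Re-read before editing." };
  }
  if (nowMs - record.readAtMs > MAX_READ_AGE_MS) {
    return { allowed: false, reason: "Read is stale (older than 2h). Re-read before editing." };
  }
  return { allowed: true };
}
```

The `reason` string is what gets surfaced back to the model and logged as an event when a call is blocked.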
Failure Fingerprinting
Most agents treat test failures as a blob of text. Warp Engine treats them as structured, trackable entities.
When pytest tests/test_auth.py fails, Warp Engine creates a fingerprint from the command identity and a hash of the error output:
bash:pytest tests/test_auth.py:deadbeef → AssertionError (attempt 1)
bash:pytest tests/test_auth.py:cafebabe → ImportError (attempt 1)

Different errors get different fingerprints. If the model fixes the AssertionError but introduces an ImportError, both are tracked independently. The attempt counter increments on each failure.
When the command succeeds (exit 0), every fingerprint for that command is resolved at once. The model sees them in "recently resolved" and knows it just fixed two distinct issues.
The model gets something it's never had before: a precise understanding of what's broken, how many times it's tried, and whether it's making progress. Structured state derived from events, not vibes from reading old conversation history.
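The scheme above can be sketched as follows. The fingerprint format mirrors the examples shown; the function and map names are illustrative:

```typescript
import { createHash } from "node:crypto";

// Fingerprint = command identity + short hash of the error output.
function fingerprint(tool: string, command: string, errorOutput: string): string {
  const digest = createHash("sha256").update(errorOutput).digest("hex").slice(0, 8);
  return `${tool}:${command}:${digest}`;
}

// Attempt counters, keyed by fingerprint.
const attempts = new Map<string, number>();

function recordFailure(fp: string): number {
  const n = (attempts.get(fp) ?? 0) + 1;
  attempts.set(fp, n);
  return n;
}

// On exit 0, resolve every fingerprint for that command at once.
function resolveCommand(tool: string, command: string): string[] {
  const prefix = `${tool}:${command}:`;
  const resolved = [...attempts.keys()].filter((fp) => fp.startsWith(prefix));
  for (const fp of resolved) attempts.delete(fp);
  return resolved;
}
```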
Tool Policy Enforcement
Warp Engine doesn't just observe — it enforces discipline. In enforce mode, violations block tool calls. In report mode, they're logged but allowed.
| Policy | Rule | Why |
|---|---|---|
| Discovery order | Must run exact search (grep/rg/ast-grep) before semantic search (morph_grep) | Prevents expensive fuzzy searches when keyword match suffices |
| Edit order | Must try patch-based edit before fast-apply (morph_edit) | Fast-apply is fallback, not default |
| Read before write | Every edit requires a verified prior read | Prevents edits on unseen files |
| Read size limits | Max 500 lines per read (configurable) | Prevents context flooding |
| Bounded outputs | Tool results truncated to 2K lines / 50K chars | Model gets what it needs without drowning in noise |
All policy actions are themselves logged as events, creating an explainable audit trail for every blocked action.
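In rough terms, the mode switch looks like this; `applyPolicy` and `checkDiscoveryOrder` are hypothetical names standing in for the real enforcement layer:

```typescript
type PolicyMode = "enforce" | "report";

interface Violation {
  policy: string;
  message: string;
}

// Example check: semantic search requires a prior exact search.
function checkDiscoveryOrder(tool: string, sawExactSearch: boolean): Violation[] {
  if (tool === "morph_grep" && !sawExactSearch) {
    return [{ policy: "discovery-order", message: "Run grep/rg before semantic search." }];
  }
  return [];
}

// Every violation is logged as an event either way; only "enforce" blocks.
function applyPolicy(
  mode: PolicyMode,
  violations: Violation[],
  log: (v: Violation) => void,
): { blocked: boolean } {
  for (const v of violations) log(v);
  return { blocked: mode === "enforce" && violations.length > 0 };
}
```

Logging in both modes is what makes "report" mode useful: you can audit what would have been blocked before turning enforcement on.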
The Projection: What the Model Actually Sees
Instead of 50 messages of conversation history, the model receives:
[WARP ENGINE CONTEXT]
Repository: kahunas2 (abc123, 247 files)
Intent: "Add logging to the auth module"
Working Set (3 files):
src/auth.py (read 2m ago, changed 1m ago)
Diff: +3/-1 lines (added bcrypt import + hash call)
Lines 45-67:
def hash_password(pwd: str) -> str:
return bcrypt.hashpw(pwd.encode(), bcrypt.gensalt())
tests/test_auth.py (read 4m ago)
Lines 12-30: [fresh excerpt from disk]
src/config.py (read 6m ago, STALE — changed on disk)
Open Failures (1):
pytest tests/test_auth.py — exit 1 (attempt 2)
AssertionError: Expected hashed output, got plaintext
Decisions:
- Use bcrypt for password hashing (industry standard)
Recently Resolved:
- pytest tests/test_auth.py:cafebabe — ImportError (resolved)

Clean. Current. Grounded in verified facts. The working set contains fresh code excerpts rehydrated from disk — the event log stores metadata and a SHA256 hash, the assembler reads the actual file content at assembly time and verifies it matches. If the hash doesn't match, the file is marked stale. The event log stays small. The assembled context contains real code.
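The rehydrate-and-verify step might be sketched like this, assuming the log stores a hash over the exact line slice (the function name is illustrative):

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Re-read the recorded line slice from disk and verify it against the
// SHA256 hash stored in the FileRead event. A mismatch marks the file stale.
function rehydrateSlice(
  path: string,
  startLine: number,
  endLine: number,
  recordedSha256: string,
): { fresh: boolean; content: string } {
  const lines = readFileSync(path, "utf8").split("\n").slice(startLine - 1, endLine);
  const content = lines.join("\n");
  const actual = createHash("sha256").update(content).digest("hex");
  return { fresh: actual === recordedSha256, content };
}
```

This is why the log stays small: it never stores file bodies, only enough metadata to fetch and verify them at assembly time.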
Token Budgeting
The context assembler operates within a strict token budget computed from model metadata:
Context window (e.g., 200K)
− reserved output tokens (min(8192, maxTokens × 0.8))
− reserved system/tool schemas (~12K)
= available budget (~180K)
→ 60% allocated to dynamic content

Sections are packed in priority order with dedicated budget shares:
| Section | Budget share | Priority |
|---|---|---|
| Header (repo, intent) | Always included | Highest |
| Open failures | 35% of remaining | Forced — always present |
| Decisions | 20% of remaining | High |
| Working set | 75% of remaining | High |
| Resolved / progress / latency | Remainder | Normal |
| Footer | Always included | Highest |
Critical items (failures, header, footer) always appear. The greedy packing algorithm is fast and deterministic — not optimal knapsack, but predictable and provider-agnostic.
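The arithmetic and the greedy packer can be sketched directly from the figures above. The bytes / 3.2 heuristic comes from the Tech Stack section; everything else is an illustrative reconstruction, not the real implementation:

```typescript
// Deterministic token estimate: no tokenizer, no provider dependency.
const estimateTokens = (text: string): number =>
  Math.ceil(Buffer.byteLength(text, "utf8") / 3.2);

// Budget = (window − reserved output − reserved schemas) × 60%.
function computeBudget(contextWindow: number, maxTokens: number): number {
  const reservedOutput = Math.min(8192, maxTokens * 0.8);
  const reservedSchemas = 12_000;
  return Math.floor((contextWindow - reservedOutput - reservedSchemas) * 0.6);
}

// Greedy packing: sections in priority order, each capped at a share of
// whatever budget remains when its turn comes. Predictable, not optimal.
function pack(
  sections: { name: string; tokens: number; share: number }[],
  budget: number,
): string[] {
  const included: string[] = [];
  let remaining = budget;
  for (const s of sections) {
    const cap = Math.floor(remaining * s.share);
    if (s.tokens <= cap) {
      included.push(s.name);
      remaining -= s.tokens;
    }
  }
  return included;
}
```

Greedy packing trades a little budget efficiency for determinism: the same events always produce the same context, which is what makes replay meaningful.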
Snapshots and Retention
Snapshots for fast startup
Every 500 events, the full projection state is serialised to disk. On startup, Warp Engine loads the latest snapshot and replays only new events — no need to reprocess thousands of historical events.
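Snapshot-and-replay is the classic event-sourcing fold. A minimal sketch with an illustrative projection (a per-file read counter, not the real projection state):

```typescript
const SNAPSHOT_INTERVAL = 500; // serialise the projection every 500 events

interface Snapshot {
  lastSeq: number;                // sequence number of the last folded event
  state: Record<string, number>;  // serialised projection state
}

type LogEvent = { seq: number; path: string };

// The projection is a pure fold over events.
function applyEvent(state: Record<string, number>, e: LogEvent): Record<string, number> {
  return { ...state, [e.path]: (state[e.path] ?? 0) + 1 };
}

// On startup: start from the latest snapshot, replay only newer events.
function restore(snapshot: Snapshot | null, events: LogEvent[]): Record<string, number> {
  let state = snapshot?.state ?? {};
  const from = snapshot?.lastSeq ?? 0;
  for (const e of events) {
    if (e.seq > from) state = applyEvent(state, e);
  }
  return state;
}
```

Because the fold is deterministic, restoring from a snapshot and replaying from scratch always converge on the same state.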
Automatic pruning
On session shutdown, events older than 7 days are pruned (keeping a minimum of 2,000 events). The event log stays manageable without manual intervention.
Manual controls
/warp-prune-events supports dry-run, archive to JSONL (optionally gzipped), vacuum, and configurable minimum event floors. Archive before you prune — the full history is preserved if you need it.
Commands
| Command | Purpose |
|---|---|
/warp | Show working set, open failures, decisions, DB path |
/warp-dump | Print full projection state as JSON |
/warp-context | Show assembled context for the last user message |
/warp-snapshot | Force write a snapshot |
/warp-prune | Prune snapshot cache (keep N newest) |
/warp-prune-events | Prune old events (days, dry-run, archive, vacuum, min-events) |
/warp-repo | Rebuild git-backed repo map |
What Makes This Different
Constant context size
Turn 3 and turn 30 cost the same. Context is computed fresh, not accumulated. No growing token bills.
Verified state, not remembered state
SHA256 hashes prove the model's knowledge is current. Stale edits are blocked before they happen.
Perfect auditability
Every action is in a queryable SQLite log. Deterministic replay from any point. Full decision trail.
Enforced discipline
Read before write. Search before semantic search. Bounded outputs. The model follows engineering discipline, not suggestions.
Architecture
Warp Engine is built on three properties that reinforce each other:
- Causality — what changed is auditable. Every event is immutable, timestamped, and traceable.
- Boundedness — context remains constrained per turn. Token budgets are strict. Payloads are bounded.
- Recoverability — state can be replayed from the log. Snapshots accelerate startup. Pruning controls growth.
The separation between raw interactions (the event log) and model-visible context (the assembled projection) is architecturally load-bearing. The log is the source of truth. The projection is a derived view. The context is a rendered document. Each layer has a single responsibility and can be evolved independently.
Integration with Pi
Warp Engine hooks into pi's extension lifecycle at six points:
| Hook | What Warp Engine does |
|---|---|
session_start | Initialise session, build repo map from git ls-files |
before_agent_start | Inject hidden control message with budget metadata |
tool_call | Policy enforcement (discovery order, stale guard, read limits). Capture pre-images for diff computation |
tool_result | Record events (ToolResult, FileRead, FileChanged, CommandRun). Bound tool outputs |
context | Replace conversation history with assembled warp context + current turn tail |
session_shutdown | Write snapshot, prune old events |
The host agent remains agnostic. Enabling or disabling Warp Engine changes behaviour without touching core agent code.
Tech Stack
| Component | Technology |
|---|---|
| Language | TypeScript (~1,600 lines across 4 core modules) |
| Storage | SQLite (node:sqlite) with WAL mode |
| Hashing | SHA256 for content verification |
| Diffing | Unix diff -u for change tracking |
| Token estimation | Deterministic heuristic (bytes / 3.2) — no provider dependency |
| Integration | Pi extension hooks (lifecycle events) |
| Config | .pi/warp-engine/config.json |
Configuration
{
"enabled": true,
"policyMode": "enforce",
"maxReadLines": 500,
"requireReadBeforeWrite": true,
"maxReadAgeMs": 7200000,
"enforceDiscoveryOrder": true,
"enforceEditOrder": true,
"minEventsToKeep": 2000
}

Set policyMode to "report" to log violations without blocking. Set "enabled": false to disable per-project.
The Core Insight
Chat history is an implementation detail, not the memory. Every AI coding tool treats the growing conversation buffer as the model's memory because it's the cheapest approximation of one. But it's lossy, grows without bound, goes stale, and can't be queried, replayed, or verified.
Warp Engine replaces the approximation with the real thing: an append-only event log that captures what actually happened, a deterministic projector that computes what's true right now, and a context assembler that renders exactly what the model needs to see — fresh, bounded, and grounded in hash-verified facts.
The conversation was never the memory. The events are.