Warp Engine
Event-sourced context management for AI coding agents. Instead of growing conversation history, Warp Engine records every action as an event and assembles fresh, deterministic context for every model call.
What is Warp Engine?
Every AI coding tool has the same hidden problem: memory is a lie.
When you ask an AI to "fix the auth bug," it reads files, makes changes, runs tests. Then you say "now add logging." The model receives the entire conversation so far — your first message, its response, file contents from ten minutes ago, tool outputs from three iterations back — and tries to work with it.
By turn five, the model is carrying 5,000 tokens of accumulated history. By turn ten, 20,000. By turn twenty, 50,000 or more. Most of it stale. The model pays attention to all of it equally, because it has no way to know what's current and what's garbage.
Warp Engine takes a completely different approach. It doesn't keep conversation history. It keeps an event log. Every action — every file read, edit, command, test, decision — is recorded as a structured event in an append-only SQLite database. When the model needs to act, Warp Engine assembles fresh context from the event log. Right now. At this moment. The conversation history? Thrown away. Every turn.
Events are memory. Context is computed.
The model gets a clean, deterministic picture of reality computed from verified facts — not a growing narrative it has to parse and hope it interprets correctly. Constant context size. Perfect auditability. Every turn grounded in what's actually true right now.
How It Works
Traditional AI Coding vs. Warp Engine
| Turn | Traditional (growing history) | Warp Engine (computed context) |
|---|---|---|
| 1 | 200 tokens | ~8K tokens (assembled) |
| 2 | 600 tokens | ~8K tokens (assembled) |
| 5 | 5,000 tokens | ~8K tokens (assembled) |
| 10 | 20,000 tokens | ~8K tokens (assembled) |
| 20 | 50,000+ tokens (stale, noisy) | ~8K tokens (fresh, verified) |

Every turn, the model receives a clean projection — which files are active, what changes were made, what's broken, what decisions were taken — rehydrated from disk with SHA256 verification. Not "what we talked about." What's actually true.
Event Types
Eleven structured event types cover every action:
| Event | What it captures |
|---|---|
| SessionStarted | Working directory, repo root, active tools |
| UserIntent | Normalised user request |
| ToolCalled | Tool name, typed input (before execution) |
| ToolResult | Status, bounded output (after execution) |
| FileRead | Path, line range, SHA256 hash of content slice |
| FileChanged | Path, full unified diff, summary (+N/-M lines) |
| CommandRun | Command, exit code, output excerpt |
| TestStatus | Pass/fail, error excerpts |
| Decision | Strategic choice with rationale (supports supersession) |
| RepoMap | Git file inventory (HEAD, file count, top dirs/extensions) |
| Custom | Extension-defined events |
All events are append-only, bounded to 1MB max payload, and stored in SQLite with WAL mode for concurrent reads/writes.
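As a sketch, the append-only, bounded contract can be modelled in TypeScript. The event shapes and the `boundPayload` helper below are illustrative, not the actual Warp Engine schema:

```typescript
// Illustrative subset of the event union (field names are assumptions).
type WarpEvent =
  | { type: "FileRead"; path: string; startLine: number; endLine: number; sha256: string }
  | { type: "ToolResult"; tool: string; status: "ok" | "error"; output: string }
  | { type: "TestStatus"; passed: boolean; excerpt: string };

const MAX_PAYLOAD_BYTES = 1_000_000; // payloads are bounded to 1 MB

// Truncate an oversized payload so the log stays bounded.
function boundPayload(payload: string): string {
  const bytes = Buffer.byteLength(payload, "utf8");
  if (bytes <= MAX_PAYLOAD_BYTES) return payload;
  return (
    Buffer.from(payload, "utf8").subarray(0, MAX_PAYLOAD_BYTES).toString("utf8") +
    "…[truncated]"
  );
}
```

Because events are never mutated after insert, readers can cache aggressively and WAL mode lets the projector read while the session keeps appending.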
The Stale Context Guard
This prevents the most common class of AI coding errors: editing a file based on what the model thinks is there rather than what's actually there.
Has the file been read?
If there's no FileRead event for this file, the edit is blocked. The model must read first.
Has the file changed since it was read?
The system compares the file's current mtime and size against the stats recorded at read time. If they differ, the edit is blocked.
Is the read too old?
Default: 2 hours. If the model read the file 3 hours ago and hasn't re-read it, the edit is blocked.
Blocked with rationale
If any check fails, the tool call is rejected: "File changed since last read. Re-read before editing." The block is logged as an event and surfaced in the projection.
Every edit is grounded in a verified, hash-checked read. No hallucinated line numbers. No patches applied to code that moved.
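A minimal sketch of the three checks, assuming the stats recorded at read time look roughly like the `ReadRecord` below (field and function names are hypothetical):

```typescript
interface ReadRecord {
  mtimeMs: number;  // file mtime captured at read time
  size: number;     // file size captured at read time
  readAtMs: number; // when the read happened
}

const MAX_READ_AGE_MS = 2 * 60 * 60 * 1000; // default: 2 hours

// Run the stale-context checks in order; the first failure blocks the edit.
function checkEditAllowed(
  record: ReadRecord | undefined,
  current: { mtimeMs: number; size: number },
  nowMs: number,
): { allowed: boolean; reason?: string } {
  if (!record) {
    return { allowed: false, reason: "No prior FileRead. Read before editing." };
  }
  if (record.mtimeMs !== current.mtimeMs || record.size !== current.size) {
    return { allowed: false, reason: "File changed since last read. Re-read before editing." };
  }
  if (nowMs - record.readAtMs > MAX_READ_AGE_MS) {
    return { allowed: false, reason: "Read is stale (older than 2h). Re-read before editing." };
  }
  return { allowed: true };
}
```

The `reason` string is what gets surfaced back to the model and logged as an event when a call is blocked.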
Failure Fingerprinting
Most agents treat test failures as a blob of text. Warp Engine treats them as structured, trackable entities.
When pytest tests/test_auth.py fails, Warp Engine creates a fingerprint from the command identity and a hash of the error output:
bash:pytest tests/test_auth.py:deadbeef → AssertionError (attempt 1)
bash:pytest tests/test_auth.py:cafebabe → ImportError (attempt 1)

Different errors get different fingerprints. If the model fixes the AssertionError but introduces an ImportError, both are tracked independently. The attempt counter increments on each failure.
When the command succeeds (exit 0), every fingerprint for that command is resolved at once. The model sees them in "recently resolved" and knows it just fixed two distinct issues.
The model gets something it's never had before: a precise understanding of what's broken, how many times it's tried, and whether it's making progress. Structured state derived from events, not vibes from reading old conversation history.
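The scheme above can be sketched as follows. The fingerprint format mirrors the examples shown; the function and map names are illustrative:

```typescript
import { createHash } from "node:crypto";

// Fingerprint = command identity + short hash of the error output.
function fingerprint(tool: string, command: string, errorOutput: string): string {
  const digest = createHash("sha256").update(errorOutput).digest("hex").slice(0, 8);
  return `${tool}:${command}:${digest}`;
}

// Attempt counters, keyed by fingerprint.
const attempts = new Map<string, number>();

function recordFailure(fp: string): number {
  const n = (attempts.get(fp) ?? 0) + 1;
  attempts.set(fp, n);
  return n;
}

// On exit 0, resolve every fingerprint for that command at once.
function resolveCommand(tool: string, command: string): string[] {
  const prefix = `${tool}:${command}:`;
  const resolved = [...attempts.keys()].filter((fp) => fp.startsWith(prefix));
  for (const fp of resolved) attempts.delete(fp);
  return resolved;
}
```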
Tool Policy Enforcement
Warp Engine doesn't just observe — it enforces discipline. In enforce mode, violations block tool calls. In report mode, they're logged but allowed.
| Policy | Rule | Why |
|---|---|---|
| Discovery order | Must run exact search (grep/rg/ast-grep) before semantic search (morph_grep) | Prevents expensive fuzzy searches when keyword match suffices |
| Edit order | Must try patch-based edit before fast-apply (morph_edit) | Fast-apply is fallback, not default |
| Read before write | Every edit requires a verified prior read | Prevents edits on unseen files |
| Read size limits | Max 500 lines per read (configurable) | Prevents context flooding |
| Bounded outputs | Tool results truncated to 2K lines / 50K chars | Model gets what it needs without drowning in noise |
All policy actions are themselves logged as events, creating an explainable audit trail for every blocked action.
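In rough terms, the mode switch looks like this; `applyPolicy` and `checkDiscoveryOrder` are hypothetical names standing in for the real enforcement layer:

```typescript
type PolicyMode = "enforce" | "report";

interface Violation {
  policy: string;
  message: string;
}

// Example check: semantic search requires a prior exact search.
function checkDiscoveryOrder(tool: string, sawExactSearch: boolean): Violation[] {
  if (tool === "morph_grep" && !sawExactSearch) {
    return [{ policy: "discovery-order", message: "Run grep/rg before semantic search." }];
  }
  return [];
}

// Every violation is logged as an event either way; only "enforce" blocks.
function applyPolicy(
  mode: PolicyMode,
  violations: Violation[],
  log: (v: Violation) => void,
): { blocked: boolean } {
  for (const v of violations) log(v);
  return { blocked: mode === "enforce" && violations.length > 0 };
}
```

Logging in both modes is what makes "report" mode useful: you can audit what would have been blocked before turning enforcement on.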
The Projection: What the Model Actually Sees
Instead of 50 messages of conversation history, the model receives:
[WARP ENGINE CONTEXT]
Repository: kahunas2 (abc123, 247 files)
Intent: "Add logging to the auth module"
Working Set (3 files):
src/auth.py (read 2m ago, changed 1m ago)
Diff: +3/-1 lines (added bcrypt import + hash call)
Lines 45-67:
def hash_password(pwd: str) -> str:
return bcrypt.hashpw(pwd.encode(), bcrypt.gensalt())
tests/test_auth.py (read 4m ago)
Lines 12-30: [fresh excerpt from disk]
src/config.py (read 6m ago, STALE — changed on disk)
Open Failures (1):
pytest tests/test_auth.py — exit 1 (attempt 2)
AssertionError: Expected hashed output, got plaintext
Decisions:
- Use bcrypt for password hashing (industry standard)
Recently Resolved:
- pytest tests/test_auth.py:cafebabe — ImportError (resolved)

Clean. Current. Grounded in verified facts. The working set contains fresh code excerpts rehydrated from disk — the event log stores metadata and a SHA256 hash, the assembler reads the actual file content at assembly time and verifies it matches. If the hash doesn't match, the file is marked stale. The event log stays small. The assembled context contains real code.
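The rehydrate-and-verify step might be sketched like this, assuming the log stores a hash over the exact line slice (the function name is illustrative):

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Re-read the recorded line slice from disk and verify it against the
// SHA256 hash stored in the FileRead event. A mismatch marks the file stale.
function rehydrateSlice(
  path: string,
  startLine: number,
  endLine: number,
  recordedSha256: string,
): { fresh: boolean; content: string } {
  const lines = readFileSync(path, "utf8").split("\n").slice(startLine - 1, endLine);
  const content = lines.join("\n");
  const actual = createHash("sha256").update(content).digest("hex");
  return { fresh: actual === recordedSha256, content };
}
```

This is why the log stays small: it never stores file bodies, only enough metadata to fetch and verify them at assembly time.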
Token Budgeting
The context assembler operates within a strict token budget computed from model metadata:
Context window (e.g., 200K)
− reserved output tokens (min(8192, maxTokens × 0.8))
− reserved system/tool schemas (~12K)
= available budget (~180K)
→ 60% allocated to dynamic content

Sections are packed in priority order with dedicated budget shares:
| Section | Budget share | Priority |
|---|---|---|
| Header (repo, intent) | Always included | Highest |
| Open failures | 35% of remaining | Forced — always present |
| Decisions | 20% of remaining | High |
| Working set | 75% of remaining | High |
| Resolved / progress / latency | Remainder | Normal |
| Footer | Always included | Highest |
Critical items (failures, header, footer) always appear. The greedy packing algorithm is fast and deterministic — not optimal knapsack, but predictable and provider-agnostic.
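The arithmetic and the greedy packer can be sketched directly from the figures above. The bytes / 3.2 heuristic comes from the Tech Stack section; everything else is an illustrative reconstruction, not the real implementation:

```typescript
// Deterministic token estimate: no tokenizer, no provider dependency.
const estimateTokens = (text: string): number =>
  Math.ceil(Buffer.byteLength(text, "utf8") / 3.2);

// Budget = (window − reserved output − reserved schemas) × 60%.
function computeBudget(contextWindow: number, maxTokens: number): number {
  const reservedOutput = Math.min(8192, maxTokens * 0.8);
  const reservedSchemas = 12_000;
  return Math.floor((contextWindow - reservedOutput - reservedSchemas) * 0.6);
}

// Greedy packing: sections in priority order, each capped at a share of
// whatever budget remains when its turn comes. Predictable, not optimal.
function pack(
  sections: { name: string; tokens: number; share: number }[],
  budget: number,
): string[] {
  const included: string[] = [];
  let remaining = budget;
  for (const s of sections) {
    const cap = Math.floor(remaining * s.share);
    if (s.tokens <= cap) {
      included.push(s.name);
      remaining -= s.tokens;
    }
  }
  return included;
}
```

Greedy packing trades a little budget efficiency for determinism: the same events always produce the same context, which is what makes replay meaningful.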
Snapshots and Retention
Snapshots for fast startup
Every 500 events, the full projection state is serialised to disk. On startup, Warp Engine loads the latest snapshot and replays only new events — no need to reprocess thousands of historical events.
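Snapshot-and-replay is the classic event-sourcing fold. A minimal sketch with an illustrative projection (a per-file read counter, not the real projection state):

```typescript
const SNAPSHOT_INTERVAL = 500; // serialise the projection every 500 events

interface Snapshot {
  lastSeq: number;                // sequence number of the last folded event
  state: Record<string, number>;  // serialised projection state
}

type LogEvent = { seq: number; path: string };

// The projection is a pure fold over events.
function applyEvent(state: Record<string, number>, e: LogEvent): Record<string, number> {
  return { ...state, [e.path]: (state[e.path] ?? 0) + 1 };
}

// On startup: start from the latest snapshot, replay only newer events.
function restore(snapshot: Snapshot | null, events: LogEvent[]): Record<string, number> {
  let state = snapshot?.state ?? {};
  const from = snapshot?.lastSeq ?? 0;
  for (const e of events) {
    if (e.seq > from) state = applyEvent(state, e);
  }
  return state;
}
```

Because the fold is deterministic, restoring from a snapshot and replaying from scratch always converge on the same state.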
Automatic pruning
On session shutdown, events older than 7 days are pruned (keeping a minimum of 2,000 events). The event log stays manageable without manual intervention.
Manual controls
/warp-prune-events supports dry-run, archive to JSONL (optionally gzipped), vacuum, and configurable minimum event floors. Archive before you prune — the full history is preserved if you need it.
Commands
| Command | Purpose |
|---|---|
/warp | Show working set, open failures, decisions, DB path |
/warp-dump | Print full projection state as JSON |
/warp-context | Show assembled context for the last user message |
/warp-snapshot | Force write a snapshot |
/warp-prune | Prune snapshot cache (keep N newest) |
/warp-prune-events | Prune old events (days, dry-run, archive, vacuum, min-events) |
/warp-repo | Rebuild git-backed repo map |
What Makes This Different
Constant context size
Turn 3 and turn 30 cost the same. Context is computed fresh, not accumulated. No growing token bills.
Verified state, not remembered state
SHA256 hashes prove the model's knowledge is current. Stale edits are blocked before they happen.
Perfect auditability
Every action is in a queryable SQLite log. Deterministic replay from any point. Full decision trail.
Enforced discipline
Read before write. Search before semantic search. Bounded outputs. The model follows engineering discipline, not suggestions.
Architecture
Warp Engine is built on three properties that reinforce each other:
- Causality — what changed is auditable. Every event is immutable, timestamped, and traceable.
- Boundedness — context remains constrained per turn. Token budgets are strict. Payloads are bounded.
- Recoverability — state can be replayed from the log. Snapshots accelerate startup. Pruning controls growth.
The separation between raw interactions (the event log) and model-visible context (the assembled projection) is architecturally load-bearing. The log is the source of truth. The projection is a derived view. The context is a rendered document. Each layer has a single responsibility and can be evolved independently.
Integration with Pi
Warp Engine hooks into pi's extension lifecycle at six points:
| Hook | What Warp Engine does |
|---|---|
session_start | Initialise session, build repo map from git ls-files |
before_agent_start | Inject hidden control message with budget metadata |
tool_call | Policy enforcement (discovery order, stale guard, read limits). Capture pre-images for diff computation |
tool_result | Record events (ToolResult, FileRead, FileChanged, CommandRun). Bound tool outputs |
context | Replace conversation history with assembled warp context + current turn tail |
session_shutdown | Write snapshot, prune old events |
The host agent remains agnostic. Enabling or disabling Warp Engine changes behaviour without touching core agent code.
Tech Stack
| Component | Technology |
|---|---|
| Language | TypeScript (~1,600 lines across 4 core modules) |
| Storage | SQLite (node:sqlite) with WAL mode |
| Hashing | SHA256 for content verification |
| Diffing | Unix diff -u for change tracking |
| Token estimation | Deterministic heuristic (bytes / 3.2) — no provider dependency |
| Integration | Pi extension hooks (lifecycle events) |
| Config | .pi/warp-engine/config.json |
Configuration
{
"enabled": true,
"policyMode": "enforce",
"maxReadLines": 500,
"requireReadBeforeWrite": true,
"maxReadAgeMs": 7200000,
"enforceDiscoveryOrder": true,
"enforceEditOrder": true,
"minEventsToKeep": 2000
}

Set policyMode to "report" to log violations without blocking. Set "enabled": false to disable per-project.
The Core Insight
Chat history is an implementation detail, not the memory. Every AI coding tool treats the growing conversation buffer as the model's memory because it's the cheapest approximation of one. But it's lossy, grows without bound, goes stale, and can't be queried, replayed, or verified.
Warp Engine replaces the approximation with the real thing: an append-only event log that captures what actually happened, a deterministic projector that computes what's true right now, and a context assembler that renders exactly what the model needs to see — fresh, bounded, and grounded in hash-verified facts.
The conversation was never the memory. The events are.