Building Agents That Actually Ship

A transparent validation gate that checks every agent file write against project-specific rules before it hits disk — AST patterns, regex, and a local LLM judge on Apple Silicon.

What is Code Supervisor?

AI coding agents write code fast. They also write code that violates your project's architectural rules — importing from the wrong layer, calling fetch() in a component, leaking secrets into source files. The review catches it, the agent rewrites, you review again. Expensive in time and tokens.

Code Supervisor is a transparent validation gate that sits between the agent and the filesystem. Every write_file and edit_file call gets checked against your project's rules before it hits disk. If the code violates a rule, the agent gets a normal tool error and self-corrects. It never knows the supervisor exists.

Zero agent awareness

The agent sees a standard tool error — "File write rejected: Components must not call fetch() directly." It treats this like any other tool failure and fixes the code. No special prompting, no system prompt changes, no custom tools. The supervisor is invisible.

Why This Exists

The review-fix-review loop is the most expensive part of agent-assisted development. Not because the agent can't fix things — it can. But every round-trip burns context window, burns tokens, and breaks flow. The rules being violated are usually structural — things that could have been caught before the write ever happened.

Code Supervisor moves enforcement upstream. Instead of catching violations after the fact, it catches them at the point of writing. The agent self-corrects in real-time, and the code that lands on disk is already compliant.

How It Works

The middleware intercepts write_file and edit_file tool calls via awrap_tool_call — the same middleware protocol used by the background task system and Morph integration. It composes cleanly with the existing stack.

For write_file, it validates the content directly. For edit_file, it reads the current file, applies the old_string → new_string replacement, and validates the result.

The Three Tiers

Code Supervisor runs rules through three validation tiers, each progressively more expensive. Tier 3 only fires when the first two pass — no point running an LLM if a regex already caught the problem.

Tier 1: ast-grep (50–200ms)

AST-level pattern matching via tree-sitter. This is the primary tier. It matches actual code structure — not text — so it won't false-positive on comments, strings, or variable names that happen to contain fetch.

# .supervisor/rules/no-fetch-in-components.yml
id: no-fetch-in-components
language: tsx
severity: error
message: "Components must not call fetch() directly. Use a handler."
rule:
  pattern: fetch($$$)
files:
  - "packages/ui/**/*.tsx"

The files field determines which paths the rule applies to. The supervisor handles path matching itself — ast-grep only sees the content via --inline-rules against a temp file.

Tier 2: Pattern + Import Rules (< 10ms)

Fast regex matching for things that don't need AST precision.

Pattern rules run re.search() against each line — good for secrets, hardcoded URLs, banned strings:

pattern_rules:
  - id: no-secrets-in-code
    paths: ["**/*.ts", "**/*.py"]
    deny_patterns:
      - "AKIA[0-9A-Z]{16}"
      - "sk-[a-zA-Z0-9]{48}"
    severity: error
    message: "Possible secret detected. Use environment variables."

Import rules extract import/require statements and check against a deny list. Denying @myapp/handlers also blocks @myapp/handlers/user:

import_rules:
  - id: ui-no-handler-imports
    paths: ["packages/ui/**/*.tsx"]
    deny_imports:
      - "@myapp/handlers"
      - "../../handlers"
    severity: error
    message: "UI components must not import from handlers."

Tier 3: LLM Judge (1–3s)

For semantic rules that can't be expressed structurally. Runs a local model via MLX on Apple Silicon — no API calls, no network, no data leaving your machine.

semantic_rules:
  - id: schema-render-only
    paths: ["packages/schemas/**/*.ts"]
    description: "Schema files must not contain React code, hooks, or JSX. They define data structures only."
    severity: error

The model (Qwen3-4B-4bit by default) is loaded lazily on first use and cached for the process lifetime. Inference runs at ~2s on M3 Ultra. The supervisor feeds a tight prompt with the rules and file content, gets back JSON verdicts, and converts violations.

Short-circuit design

Tier 3 only runs when semantic rules match the file path and Tiers 1–2 found no error-severity violations. If a regex already caught a secret, there's no reason to burn 2 seconds on LLM inference.

Setup

Create a .supervisor/ directory at your project root:

your-project/
  .supervisor/
    config.yml              # Main config + pattern/import/semantic rules
    rules/                  # ast-grep rule files (one per rule)
      no-fetch-in-components.yml
      schemas-purity.yml

config.yml

version: 1
mode: enforce                              # enforce | warn | off
llm_model: mlx-community/Qwen3-4B-4bit    # Model for semantic rules

pattern_rules:
  - id: no-secrets-in-code
    paths: ["**/*.ts", "**/*.py"]
    deny_patterns:
      - "AKIA[0-9A-Z]{16}"
    severity: error
    message: "Possible secret detected."

import_rules:
  - id: ui-no-handler-imports
    paths: ["packages/ui/**/*.tsx"]
    deny_imports:
      - "@myapp/handlers"
    severity: error
    message: "UI must not import from handlers."

semantic_rules:
  - id: schema-render-only
    paths: ["packages/schemas/**/*.ts"]
    description: "Schemas define rendering structure only. No React, no hooks, no JSX."
    severity: error

When .supervisor/ exists in the project root, the FileSupervisorMiddleware is automatically added to the Deep agent's middleware stack. No configuration needed beyond dropping the directory in.

What the Agent Sees

On rejection, the agent receives a standard tool error:

Error: File write rejected by project rules.

Violations in packages/ui/components/UserProfile.tsx:

  ✗ [no-fetch-in-components] Components must not call fetch() directly.
    Line 42
    → Use a handler from lib/handlers/ instead.

  ✗ [ui-no-handler-imports] UI must not import from handlers.
    Line 3

Fix these violations and try again.

The agent treats this identically to a file permission error or a syntax failure. It reads the violation messages, rewrites the code, and tries again. The corrected version passes, and the file lands on disk — compliant from the start.

The Standalone Package

Code Supervisor is a standalone Python package (code-supervisor) with its own CLI and Python API. It doesn't depend on Deep or deepagents — you can use it in any pipeline.

# Check a file against project rules
code-supervisor check src/ui/Button.tsx --config .supervisor

# Check proposed content from stdin
echo "content" | code-supervisor check src/ui/Button.tsx --stdin --config .supervisor

# JSON output for programmatic use
code-supervisor check src/ui/Button.tsx --json --config .supervisor

# Validate all rules are well-formed
code-supervisor lint-rules --config .supervisor

from code_supervisor import validate, avalidate, load_config

# Load rules once (cached, reloads on file change)
config = load_config("/path/to/project")

# Sync validation
result = validate("src/ui/Button.tsx", content, config, "/path/to/project")
if not result.passed:
    print(result.format_rejection())

# Async (runs validation in thread pool)
result = await avalidate("src/ui/Button.tsx", content, config, "/path/to/project")

Performance

The whole point is speed. If the supervisor adds perceptible latency, it's broken.

Scenario	Latency
No `.supervisor/` directory	0ms — not loaded
No rules match file path	< 0.1ms
Pattern + import rules only	< 10ms
ast-grep structural rules	50–200ms
Semantic rules (LLM judge)	1–3s
Worst case, all tiers	< 3.5s

Files that don't match any rule paths pass through with zero overhead. The middleware checks for the .supervisor/ directory at startup — if it doesn't exist, the middleware isn't even loaded.

Deep Integration

The integration into Deep is a thin adapter — one middleware class and one conditional in agent.py:

# agent.py — middleware assembly
if sandbox is None and settings.project_root:
    supervisor_dir = settings.project_root / ".supervisor"
    if supervisor_dir.is_dir():
        from deepagents_cli.supervisor_middleware import FileSupervisorMiddleware
        agent_middleware.append(
            FileSupervisorMiddleware(project_root=settings.project_root)
        )

The middleware intercepts tool calls via the same awrap_tool_call protocol used by the background task system and Morph filesystem middleware. It composes with the existing stack without inheritance conflicts.

Middleware as enforcement

This is the same pattern as the background task system — intercept via awrap_tool_call, make a decision, either pass through or short-circuit with a ToolMessage. The deepagents middleware system is flexible enough to support transparent enforcement without any upstream changes.

Config Reference

Field	Default	Description
`version`	`1`	Config format version
`mode`	`enforce`	`enforce` blocks writes, `warn` logs only, `off` disables
`timeout_ms`	`3000`	Max time per validation
`ast_grep_bin`	auto-detected	Path to ast-grep binary
`llm_model`	`mlx-community/Qwen3-4B-4bit`	MLX model for semantic rules
`pattern_rules`	`[]`	Regex deny-pattern rules
`import_rules`	`[]`	Import restriction rules
`semantic_rules`	`[]`	LLM-evaluated semantic rules

What Makes This Different

Transparent to the agent

No system prompt changes, no custom tools. The agent sees a normal tool error and self-corrects. It doesn't know the supervisor exists.

Three-tier validation

AST patterns for structure, regex for secrets and imports, local LLM for semantics. Each tier is fast and only fires when relevant.

Local LLM, no network

Semantic rules run on-device via MLX on Apple Silicon. No API calls, no data leaving your machine, ~2s inference.

Standalone package

Works outside of Deep as a CLI tool and Python library. Drop a .supervisor/ directory into any project and run code-supervisor check.

Code Supervisor

Transparent to the agent

Three-tier validation

Local LLM, no network

Standalone package

On this page