Noktua
A local-first Mac desktop agent that orchestrates a team of subagents — one conversation, everything handled. Like Anthropic's Cowork, but you own it.
What is Noktua?
Noktua is your AI chief of staff as a Mac app. One conversation. One interface. You tell it what you want, it figures out what to do — delegates to specialised agents, controls your desktop, browses the web, manages your files — and reports back.
It's built on top of openwork (the Electron desktop agent framework) and deepagents (LangChain's deep agent runtime) — then heavily extended with Xena-style orchestration patterns: YAML registries, subagent spawning, browser automation, middleware resilience, and the same tool routing architecture.
The pitch
People don't want 15 AI tools. They want one that does everything. Not a chatbot that answers questions — a teammate that takes action. Open Noktua, say what you need, it gets done.
Why does this exist?
Every AI desktop app is basically a chat wrapper around one model. You talk, it replies, you're still doing all the work. The model can't actually do things on your computer, can't run tasks in the background, can't follow up on its own, can't coordinate multiple jobs at once.
Noktua is different because:
- You talk to one orchestrator — Noktua itself. It decides which agents to delegate to, what tools to use, and when to come back to you.
- Subagents run in the background — up to 4 concurrent agents, each streaming in their own tab. You can watch or ignore them.
- It controls your Mac — screenshots, clipboard, file system, app control. No OAuth dance required.
- It browses the web for you — real browser automation via Browser Use, streamed live so you can see what it's doing.
- It's local-first — your data lives in ~/.openwork/. Your API keys. Your machine. No cloud backend.
- Any model, any provider — Anthropic, OpenAI, Google, Cerebras, Ollama. Swap models mid-conversation.
How It Works
The Conversation Flow
You say what you need
"Reply to Sarah's email, create a Linear issue for the proposal, and research competitor pricing." One message. Noktua figures out the rest.
Noktua orchestrates
The main LangGraph runtime decomposes your request. It decides what needs subagents (long-running, independent work) versus what it can handle inline (quick file operations, simple answers). It spawns background agents via tool_execute with registry paths.
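A minimal sketch of that decomposition step, assuming a hypothetical Task shape and the 4-subagent concurrency cap mentioned below — the real runtime's heuristics and payload fields may differ:

```typescript
interface Task {
  name: string;
  longRunning: boolean; // long-running, independent work goes to a subagent
}

interface ToolCall {
  path: string;
  payload: Record<string, unknown>;
}

const MAX_CONCURRENT_SUBAGENTS = 4; // concurrency cap from the docs

// Map long-running tasks to agent.task.spawn calls, capped at four
// concurrent subagents; everything else is flagged for inline handling.
function planToolCalls(tasks: Task[]): { spawned: ToolCall[]; inline: Task[] } {
  const spawned: ToolCall[] = [];
  const inline: Task[] = [];
  for (const task of tasks) {
    if (task.longRunning && spawned.length < MAX_CONCURRENT_SUBAGENTS) {
      spawned.push({
        path: "agent.task.spawn",
        payload: { name: task.name, description: task.name, agent_id: "researcher" },
      });
    } else {
      inline.push(task);
    }
  }
  return { spawned, inline };
}
```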
Subagents stream in tabs
Each background agent gets its own tab at the top of your chat. Click to watch it work in real-time, or ignore it. Browser automation streams live — you literally see the browser navigating.
HITL keeps you safe
Before any shell command runs, Noktua asks permission. You see exactly what it wants to execute and can approve, reject, or edit. In yolo mode? Everything auto-approves. Your call.
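The approve / reject / edit flow can be sketched as a small gate function — illustrative names only; the actual implementation is interrupt middleware inside the LangGraph runtime:

```typescript
type Decision =
  | { kind: "approve" }
  | { kind: "reject" }
  | { kind: "edit"; command: string };

// Resolve which command (if any) actually runs: yolo mode auto-approves,
// otherwise the user's decision applies — run as-is, run an edited
// command, or refuse entirely (null).
function gateShellCommand(
  command: string,
  yolo: boolean,
  decide: (cmd: string) => Decision
): string | null {
  if (yolo) return command; // yolo mode: everything auto-approves
  const decision = decide(command);
  switch (decision.kind) {
    case "approve":
      return command;
    case "edit":
      return decision.command;
    case "reject":
      return null;
  }
}
```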
Results flow back
When subagents complete, their output flows back to the orchestrator. Noktua synthesises everything into a single response. Subagent cards persist in the sidebar — click to revisit any past agent run.
The Middleware Stack
Noktua's runtime isn't just "call an LLM and hope." It's a carefully ordered middleware stack that handles everything between your message and the model's response:
createAgentRuntime(threadId, modelId, workspacePath)
│
├─ 1. Model fallback → same-provider first, then cross-provider
├─ 2. Resilience → additive backoff (20s→30s→40s), 6 retries / 5 min
├─ 3. Todo list → persistent task tracking across turns
├─ 4. Filesystem backend → ls, read, write, edit, glob, grep (LocalSandbox)
├─ 5. Summarization → model-aware token/message triggers
├─ 6. Prompt caching → Anthropic cache headers
├─ 7. Tool arg normalizer → clean up model output quirks
├─ 8. Tool allowlist → only registered tools pass through
└─ 9. Human-in-the-loop → interrupt gate on execute

This order matters. Fallback wraps resilience. Resilience wraps filesystem. Summarization compresses before you hit context limits. HITL is the final gate before anything touches your system.
When context overflows, the runtime doesn't crash — it compacts messages and retries once before surfacing an error. When a provider rate-limits you, it backs off with additive delays and a bounded retry budget. When one model fails entirely, it falls through to alternatives automatically.
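The additive backoff schedule (20s → 30s → 40s …, six retries, five-minute budget) can be sketched as a pure scheduling function — constant names are illustrative:

```typescript
const BASE_DELAY_MS = 20_000;   // first retry waits 20s
const STEP_MS = 10_000;         // each subsequent retry adds 10s
const MAX_RETRIES = 6;          // bounded retry count
const BUDGET_MS = 5 * 60_000;   // 5-minute total retry budget

// Delay before retry `attempt` (1-based), or null when the retry budget
// (count or cumulative wait) is exhausted and the error should surface.
function backoffDelayMs(attempt: number, elapsedMs: number): number | null {
  if (attempt > MAX_RETRIES) return null;
  const delay = BASE_DELAY_MS + (attempt - 1) * STEP_MS;
  if (elapsedMs + delay > BUDGET_MS) return null;
  return delay;
}
```

Additive (rather than exponential) backoff keeps worst-case latency predictable: six retries sum to 270 seconds, just inside the five-minute budget.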
The Registry System
Everything in Noktua is defined in YAML files, not hardcoded. Three registries compose to create the full capability surface:
~/.openwork/tools.yaml — What Noktua can do
version: 2
tools:
  - path: agent.task.spawn
    description: Spawn a background subagent
    required: [name, description, agent_id]
  - path: browser.task.run
    description: Run a browser automation task
    required: [task]

Tools are the capability surface. The model sees one interface: tool_execute(path, payload). The registry resolves the rest. New tools are auto-merged on startup without overwriting user customisations.
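The startup auto-merge can be sketched as a keyed merge by tool path — ToolEntry mirrors the YAML shape above, and the merge semantics (defaults never overwrite user entries) are assumed:

```typescript
interface ToolEntry {
  path: string;
  description: string;
  required: string[];
}

// Merge default tools into the user's registry: new paths are added,
// but any entry the user already has is left untouched.
function mergeRegistries(user: ToolEntry[], defaults: ToolEntry[]): ToolEntry[] {
  const byPath = new Map(user.map((t) => [t.path, t]));
  for (const tool of defaults) {
    if (!byPath.has(tool.path)) byPath.set(tool.path, tool); // add only new paths
  }
  return [...byPath.values()];
}
```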
~/.openwork/skills.yaml — How Noktua should behave
Skills are system prompts for specialised agents. They encode judgment, not just API instructions. "When handling email, lead with the answer and match the sender's tone." "When researching, cite sources and flag uncertainty." Skills get injected into agent system prompts at spawn time.
~/.openwork/agents.yaml — Who does the work
Agents compose a model + tools + skill into a deployable unit:
agents:
  - id: email-handler
    name: Email Agent
    model: anthropic:claude-sonnet-4-5-20250929
    skill: communication
    tools: [communication.email.reply, communication.email.send]
  - id: researcher
    name: Research Agent
    model: openai:gpt-4o
    skill: research
    tools: [browser.task.run]

Creating a new agent is a conversation: "I need an agent that handles my email every morning." Noktua writes the YAML.
This is the macro.micro.atomic composition pattern: agents (macro) are composed of skills (micro) and tools (atomic). Everything is a file. Everything is swappable.
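A sketch of that composition at spawn time, assuming illustrative shapes for the three registries — the real runtime resolves these through the deepagents runtime, and the tool filter here mirrors the allowlist middleware described earlier:

```typescript
interface AgentDef {
  id: string;
  model: string;  // "provider:model"
  skill: string;  // key into skills.yaml
  tools: string[]; // paths into tools.yaml
}

// Compose macro (agent) from micro (skill prompt) and atomic (tools):
// inject the skill's system prompt and keep only registered tool paths.
function composeAgent(
  agent: AgentDef,
  skills: Record<string, string>, // skill id -> system prompt text
  registeredTools: Set<string>    // tool paths known to the registry
): { prompt: string; tools: string[] } {
  const prompt = skills[agent.skill] ?? "";
  const tools = agent.tools.filter((t) => registeredTools.has(t));
  return { prompt, tools };
}
```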
Progressive Disclosure
Noktua is designed in layers. You only go as deep as you want:
Layer 0 — Chat
Just talk. Ask questions, give instructions, get results. This is all most people ever need.
Layer 1 — Sidebar
See your threads, active agents, files, and tasks. Click into any subagent to watch it work.
Layer 2 — Tabs & Artifacts
Subagent streams, file previews, code artifacts, browser automation sessions. Full visibility into what's happening.
Layer 3 — YAML Registries
Edit tools.yaml, skills.yaml, agents.yaml directly. Create custom agents, define new tools, tune behaviour. Power user territory.
Multi-Provider Intelligence
Noktua isn't locked to one AI provider. Models are fetched dynamically from provider APIs on launch:
| Provider | Models | Notes |
|---|---|---|
| Anthropic | Claude Opus 4.5, Sonnet 4.5, Haiku 4.5 | Prompt caching middleware enabled |
| OpenAI | GPT-5.x, o3, o4 Mini, GPT-4.1 | Function calling + streaming |
| Google | Gemini 3 Pro/Flash, 2.5 Pro/Flash | Via LangChain adapter |
| Cerebras | Llama models | OpenAI-compatible endpoint |
| Ollama | Any local model | No API key required |
Fallback chains are built automatically: same-provider alternatives first, then cross-provider with available keys. Switch models mid-conversation. The thread's checkpoint persists regardless of which model you're talking to.
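The chain-building rule (same-provider first, then cross-provider with keys present) can be sketched as follows — model ids follow the "provider:model" convention from agents.yaml, and all names are illustrative:

```typescript
// Build a fallback chain: the primary model first, then same-provider
// alternatives, then models from other providers that have an API key.
function buildFallbackChain(
  primary: string,
  available: string[], // all models fetched from provider APIs on launch
  keys: Set<string>    // providers with API keys configured
): string[] {
  const provider = primary.split(":")[0];
  const usable = available.filter(
    (m) => m !== primary && keys.has(m.split(":")[0])
  );
  const same = usable.filter((m) => m.startsWith(provider + ":"));
  const cross = usable.filter((m) => !m.startsWith(provider + ":"));
  return [primary, ...same, ...cross];
}
```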
Local-First by Design
Everything lives on your machine:
| What | Where |
|---|---|
| Thread metadata | ~/.openwork/openwork.sqlite |
| Conversation state | ~/.openwork/threads/{threadId}.sqlite |
| Tool registry | ~/.openwork/tools.yaml |
| Skill definitions | ~/.openwork/skills.yaml |
| Agent definitions | ~/.openwork/agents.yaml |
| API keys | ~/.openwork/.env |
| App settings | Electron store in ~/.openwork/ |
No cloud backend. No telemetry. No account required. Bring your own API keys and you're running.
What Makes This Different
Cowork for people who actually build things
Anthropic's Cowork is a polished demo of what desktop agents could be. Noktua is the open-source, multi-provider, local-first, extensible version — built by someone who actually needs it every day.
The key differences from every other desktop AI app:
- Real subagent orchestration — not just "call the API." Actual background agents with their own threads, streaming, and lifecycle management. Up to 4 running concurrently.
- Browser automation built in — via Browser Use, with live streaming so you can watch the browser work. Not a screenshot. A live session.
- HITL that actually works — interrupt-driven approval for dangerous operations, with the ability to edit commands before they run.
- YAML-first extensibility — no plugin API to learn. Edit a file, restart, done. Tools, skills, and agents are just YAML.
- Provider agnostic — use Claude for reasoning, GPT for research, a local Ollama model for quick tasks. Mix and match per agent.
- Resilience as infrastructure — rate limiting, context overflow, model failures — all handled by middleware, not hope.
Tech Stack
| Component | Technology |
|---|---|
| Desktop | Electron 39 + electron-vite |
| UI | React 19 + TypeScript + Tailwind CSS v4 |
| State | Zustand (global) + React Context (per-thread) |
| Agent Runtime | LangChain createAgent + LangGraph |
| Filesystem | deepagents LocalSandbox |
| Database | sql.js (in-memory SQLite, persisted to disk) |
| Browser Automation | Browser Use cloud API |
| UI Primitives | Radix UI + class-variance-authority |
| Distribution | npm install -g openwork / npx openwork |
Install
npm install -g openwork
openwork

Or clone and run in development:
git clone https://github.com/nof0xgiven/noktua.git
cd noktua
pnpm install
pnpm dev

References
- LangChain JS — Agent creation and middleware: js.langchain.com
- LangGraph — Stateful agent orchestration: LangGraph docs
- openwork — Electron desktop agent framework (upstream fork): GitHub
- deepagents — LangChain's deep agent runtime: GitHub
- Browser Use — Cloud browser automation: browser-use.com
- Xena — The webhook-first sibling architecture: Xena docs
Deep
A production terminal AI coding agent built on LangChain's deepagents — adding the TUI, auth, model switching, extensions, background subagents, and everything else you need to actually use it.
Warp Engine
Event-sourced context management for AI coding agents. Instead of growing conversation history, Warp Engine records every action as an event and assembles fresh, deterministic context for every model call.