Noktua
A local-first Mac desktop agent that orchestrates a team of subagents — one conversation, everything handled. Like Anthropic's Cowork, but you own it.
What is Noktua?
Noktua is your AI chief of staff as a Mac app. One conversation. One interface. You tell it what you want, it figures out what to do — delegates to specialised agents, controls your desktop, browses the web, manages your files — and reports back.
It's built on top of openwork (the Electron desktop agent framework) and deepagents (LangChain's deep agent runtime) — then heavily extended with Xena-style orchestration patterns: YAML registries, subagent spawning, browser automation, middleware resilience, and the same tool routing architecture.
The pitch
People don't want 15 AI tools. They want one that does everything. Not a chatbot that answers questions — a teammate that takes action. Open Noktua, say what you need, it gets done.
Why does this exist?
Every AI desktop app is basically a chat wrapper around one model. You talk, it replies, you're still doing all the work. The model can't actually do things on your computer, can't run tasks in the background, can't follow up on its own, can't coordinate multiple jobs at once.
Noktua is different because:
- You talk to one orchestrator — Noktua itself. It decides which agents to delegate to, what tools to use, and when to come back to you.
- Subagents run in the background — up to 4 concurrent agents, each streaming in their own tab. You can watch or ignore them.
- It controls your Mac — screenshots, clipboard, file system, app control. No OAuth dance required.
- It browses the web for you — real browser automation via Browser Use, streamed live so you can see what it's doing.
- It's local-first — your data lives in ~/.openwork/. Your API keys. Your machine. No cloud backend.
- Any model, any provider — Anthropic, OpenAI, Google, Cerebras, Ollama. Swap models mid-conversation.
How It Works
The Conversation Flow
You say what you need
"Reply to Sarah's email, create a Linear issue for the proposal, and research competitor pricing." One message. Noktua figures out the rest.
Noktua orchestrates
The main LangGraph runtime decomposes your request. It decides what needs subagents (long-running, independent work) versus what it can handle inline (quick file operations, simple answers). It spawns background agents via tool_execute with registry paths.
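A minimal sketch of that decomposition step, assuming a hypothetical Task shape and the 4-subagent concurrency cap mentioned below — the real runtime's heuristics and payload fields may differ:

```typescript
interface Task {
  name: string;
  longRunning: boolean; // long-running, independent work goes to a subagent
}

interface ToolCall {
  path: string;
  payload: Record<string, unknown>;
}

const MAX_CONCURRENT_SUBAGENTS = 4; // concurrency cap from the docs

// Map long-running tasks to agent.task.spawn calls, capped at four
// concurrent subagents; everything else is flagged for inline handling.
function planToolCalls(tasks: Task[]): { spawned: ToolCall[]; inline: Task[] } {
  const spawned: ToolCall[] = [];
  const inline: Task[] = [];
  for (const task of tasks) {
    if (task.longRunning && spawned.length < MAX_CONCURRENT_SUBAGENTS) {
      spawned.push({
        path: "agent.task.spawn",
        payload: { name: task.name, description: task.name, agent_id: "researcher" },
      });
    } else {
      inline.push(task);
    }
  }
  return { spawned, inline };
}
```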
Subagents stream in tabs
Each background agent gets its own tab at the top of your chat. Click to watch it work in real-time, or ignore it. Browser automation streams live — you literally see the browser navigating.
HITL keeps you safe
Before any shell command runs, Noktua asks permission. You see exactly what it wants to execute and can approve, reject, or edit. In yolo mode? Everything auto-approves. Your call.
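The approve / reject / edit flow can be sketched as a small gate function — illustrative names only; the actual implementation is interrupt middleware inside the LangGraph runtime:

```typescript
type Decision =
  | { kind: "approve" }
  | { kind: "reject" }
  | { kind: "edit"; command: string };

// Resolve which command (if any) actually runs: yolo mode auto-approves,
// otherwise the user's decision applies — run as-is, run an edited
// command, or refuse entirely (null).
function gateShellCommand(
  command: string,
  yolo: boolean,
  decide: (cmd: string) => Decision
): string | null {
  if (yolo) return command; // yolo mode: everything auto-approves
  const decision = decide(command);
  switch (decision.kind) {
    case "approve":
      return command;
    case "edit":
      return decision.command;
    case "reject":
      return null;
  }
}
```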
Results flow back
When subagents complete, their output flows back to the orchestrator. Noktua synthesises everything into a single response. Subagent cards persist in the sidebar — click to revisit any past agent run.
The Middleware Stack
Noktua's runtime isn't just "call an LLM and hope." It's a carefully ordered middleware stack that handles everything between your message and the model's response:
createAgentRuntime(threadId, modelId, workspacePath)
│
├─ 1. Model fallback → same-provider first, then cross-provider
├─ 2. Resilience → additive backoff (20s→30s→40s), 6 retries / 5 min
├─ 3. Todo list → persistent task tracking across turns
├─ 4. Filesystem backend → ls, read, write, edit, glob, grep (LocalSandbox)
├─ 5. Summarization → model-aware token/message triggers
├─ 6. Prompt caching → Anthropic cache headers
├─ 7. Tool arg normalizer → clean up model output quirks
├─ 8. Tool allowlist → only registered tools pass through
└─ 9. Human-in-the-loop → interrupt gate on execute

This order matters. Fallback wraps resilience. Resilience wraps filesystem. Summarization compresses before you hit context limits. HITL is the final gate before anything touches your system.
When context overflows, the runtime doesn't crash — it compacts messages and retries once before surfacing an error. When a provider rate-limits you, it backs off with additive delays and a bounded retry budget. When one model fails entirely, it falls through to alternatives automatically.
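The additive backoff schedule (20s → 30s → 40s …, six retries, five-minute budget) can be sketched as a pure scheduling function — constant names are illustrative:

```typescript
const BASE_DELAY_MS = 20_000;   // first retry waits 20s
const STEP_MS = 10_000;         // each subsequent retry adds 10s
const MAX_RETRIES = 6;          // bounded retry count
const BUDGET_MS = 5 * 60_000;   // 5-minute total retry budget

// Delay before retry `attempt` (1-based), or null when the retry budget
// (count or cumulative wait) is exhausted and the error should surface.
function backoffDelayMs(attempt: number, elapsedMs: number): number | null {
  if (attempt > MAX_RETRIES) return null;
  const delay = BASE_DELAY_MS + (attempt - 1) * STEP_MS;
  if (elapsedMs + delay > BUDGET_MS) return null;
  return delay;
}
```

Additive (rather than exponential) backoff keeps worst-case latency predictable: six retries sum to 270 seconds, just inside the five-minute budget.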
The Registry System
Everything in Noktua is defined in YAML files, not hardcoded. Three registries compose to create the full capability surface:
~/.openwork/tools.yaml — What Noktua can do
version: 2
tools:
  - path: agent.task.spawn
    description: Spawn a background subagent
    required: [name, description, agent_id]
  - path: browser.task.run
    description: Run a browser automation task
    required: [task]

Tools are the capability surface. The model sees one interface: tool_execute(path, payload). The registry resolves the rest. New tools are auto-merged on startup without overwriting user customisations.
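The startup auto-merge can be sketched as a keyed merge by tool path — ToolEntry mirrors the YAML shape above, and the merge semantics (defaults never overwrite user entries) are assumed:

```typescript
interface ToolEntry {
  path: string;
  description: string;
  required: string[];
}

// Merge default tools into the user's registry: new paths are added,
// but any entry the user already has is left untouched.
function mergeRegistries(user: ToolEntry[], defaults: ToolEntry[]): ToolEntry[] {
  const byPath = new Map(user.map((t) => [t.path, t]));
  for (const tool of defaults) {
    if (!byPath.has(tool.path)) byPath.set(tool.path, tool); // add only new paths
  }
  return [...byPath.values()];
}
```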
~/.openwork/skills.yaml — How Noktua should behave
Skills are system prompts for specialised agents. They encode judgment, not just API instructions. "When handling email, lead with the answer and match the sender's tone." "When researching, cite sources and flag uncertainty." Skills get injected into agent system prompts at spawn time.
~/.openwork/agents.yaml — Who does the work
Agents compose a model + tools + skill into a deployable unit:
agents:
  - id: email-handler
    name: Email Agent
    model: anthropic:claude-sonnet-4-5-20250929
    skill: communication
    tools: [communication.email.reply, communication.email.send]
  - id: researcher
    name: Research Agent
    model: openai:gpt-4o
    skill: research
    tools: [browser.task.run]

Creating a new agent is a conversation: "I need an agent that handles my email every morning." Noktua writes the YAML.
This is the macro.micro.atomic composition pattern: agents (macro) are composed of skills (micro) and tools (atomic). Everything is a file. Everything is swappable.
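A sketch of that composition at spawn time, assuming illustrative shapes for the three registries — the real runtime resolves these through the deepagents runtime, and the tool filter here mirrors the allowlist middleware described earlier:

```typescript
interface AgentDef {
  id: string;
  model: string;  // "provider:model"
  skill: string;  // key into skills.yaml
  tools: string[]; // paths into tools.yaml
}

// Compose macro (agent) from micro (skill prompt) and atomic (tools):
// inject the skill's system prompt and keep only registered tool paths.
function composeAgent(
  agent: AgentDef,
  skills: Record<string, string>, // skill id -> system prompt text
  registeredTools: Set<string>    // tool paths known to the registry
): { prompt: string; tools: string[] } {
  const prompt = skills[agent.skill] ?? "";
  const tools = agent.tools.filter((t) => registeredTools.has(t));
  return { prompt, tools };
}
```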
Progressive Disclosure
Noktua is designed in layers. You only go as deep as you want:
Layer 0 — Chat
Just talk. Ask questions, give instructions, get results. This is all most people ever need.
Layer 1 — Sidebar
See your threads, active agents, files, and tasks. Click into any subagent to watch it work.
Layer 2 — Tabs & Artifacts
Subagent streams, file previews, code artifacts, browser automation sessions. Full visibility into what's happening.
Layer 3 — YAML Registries
Edit tools.yaml, skills.yaml, agents.yaml directly. Create custom agents, define new tools, tune behaviour. Power user territory.
Multi-Provider Intelligence
Noktua isn't locked to one AI provider. Models are fetched dynamically from provider APIs on launch:
| Provider | Models | Notes |
|---|---|---|
| Anthropic | Claude Opus 4.5, Sonnet 4.5, Haiku 4.5 | Prompt caching middleware enabled |
| OpenAI | GPT-5.x, o3, o4 Mini, GPT-4.1 | Function calling + streaming |
| Google | Gemini 3 Pro/Flash, 2.5 Pro/Flash | Via LangChain adapter |
| Cerebras | Llama models | OpenAI-compatible endpoint |
| Ollama | Any local model | No API key required |
Fallback chains are built automatically: same-provider alternatives first, then cross-provider with available keys. Switch models mid-conversation. The thread's checkpoint persists regardless of which model you're talking to.
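The chain-building rule (same-provider first, then cross-provider with keys present) can be sketched as follows — model ids follow the "provider:model" convention from agents.yaml, and all names are illustrative:

```typescript
// Build a fallback chain: the primary model first, then same-provider
// alternatives, then models from other providers that have an API key.
function buildFallbackChain(
  primary: string,
  available: string[], // all models fetched from provider APIs on launch
  keys: Set<string>    // providers with API keys configured
): string[] {
  const provider = primary.split(":")[0];
  const usable = available.filter(
    (m) => m !== primary && keys.has(m.split(":")[0])
  );
  const same = usable.filter((m) => m.startsWith(provider + ":"));
  const cross = usable.filter((m) => !m.startsWith(provider + ":"));
  return [primary, ...same, ...cross];
}
```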
Local-First by Design
Everything lives on your machine:
| What | Where |
|---|---|
| Thread metadata | ~/.openwork/openwork.sqlite |
| Conversation state | ~/.openwork/threads/{threadId}.sqlite |
| Tool registry | ~/.openwork/tools.yaml |
| Skill definitions | ~/.openwork/skills.yaml |
| Agent definitions | ~/.openwork/agents.yaml |
| API keys | ~/.openwork/.env |
| App settings | Electron store in ~/.openwork/ |
No cloud backend. No telemetry. No account required. Bring your own API keys and you're running.
What Makes This Different
Cowork for people who actually build things
Anthropic's Cowork is a polished demo of what desktop agents could be. Noktua is the open-source, multi-provider, local-first, extensible version — built by someone who actually needs it every day.
The key differences from every other desktop AI app:
- Real subagent orchestration — not just "call the API." Actual background agents with their own threads, streaming, and lifecycle management. Up to 4 running concurrently.
- Browser automation built in — via Browser Use, with live streaming so you can watch the browser work. Not a screenshot. A live session.
- HITL that actually works — interrupt-driven approval for dangerous operations, with the ability to edit commands before they run.
- YAML-first extensibility — no plugin API to learn. Edit a file, restart, done. Tools, skills, and agents are just YAML.
- Provider agnostic — use Claude for reasoning, GPT for research, a local Ollama model for quick tasks. Mix and match per agent.
- Resilience as infrastructure — rate limiting, context overflow, model failures — all handled by middleware, not hope.
Tech Stack
| Component | Technology |
|---|---|
| Desktop | Electron 39 + electron-vite |
| UI | React 19 + TypeScript + Tailwind CSS v4 |
| State | Zustand (global) + React Context (per-thread) |
| Agent Runtime | LangChain createAgent + LangGraph |
| Filesystem | deepagents LocalSandbox |
| Database | sql.js (in-memory SQLite, persisted to disk) |
| Browser Automation | Browser Use cloud API |
| UI Primitives | Radix UI + class-variance-authority |
| Distribution | npm install -g openwork / npx openwork |
Install
npm install -g openwork
openwork

Or clone and run in development:
git clone https://github.com/nof0xgiven/noktua.git
cd noktua
pnpm install
pnpm dev

References
- LangChain JS — Agent creation and middleware: js.langchain.com
- LangGraph — Stateful agent orchestration: LangGraph docs
- openwork — Electron desktop agent framework (upstream fork): GitHub
- deepagents — LangChain's deep agent runtime: GitHub
- Browser Use — Cloud browser automation: browser-use.com
- Xena — The webhook-first sibling architecture: Xena docs
Deep
A production terminal AI coding agent built on LangChain's deepagents — adding the TUI, auth, model switching, extensions, background subagents, and everything else you need to actually use it.
Warp Engine
Event-sourced context management for AI coding agents. Instead of growing conversation history, Warp Engine records every action as an event and assembles fresh, deterministic context for every model call.