nanna coder — orchestration & delegation agent

Orchestration & Delegation

Nanna is a coding agent for coding agents. Agents delegate tasks to subagents to parallelize work and to keep the main agent (the orchestrator) lean.

Nanna lets you host subagents locally, so you can move some of the orchestrator's work onto different machines. Nanna's subagents can run dev containers on your own machine, or in the same dev environment as the orchestrator. You can provide the orchestrator with small language models (SLMs) that you host on your own machine, or with other models hosted through an AI Gateway. Nanna is built for cloud-based agents, but it also works for local coding agents like Claude Code CLI.

Delegation out of the orchestrator's environment allows you to use your available hardware and choose the right models to expose to your orchestrators, saving electricity and money.

Delegating to Nanna

Nanna provides an interface (CLI, MCP) so that agents can delegate work to self-hosted models.

Nanna can run inside background agents' environments, or locally. The orchestrator gets six MCP tools: assign_task, poll_task, get_result, list_tasks, cancel_task, onboard_repo. The small API is designed to be token-efficient and unambiguous for orchestrators.
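As a rough illustration of the assign, poll, fetch cycle, the sketch below drives an in-memory stand-in for the tools. The tool names come from the list above; every signature, status string, and the FakeNanna class itself are assumptions made for this example, not Nanna's actual interface. onboard_repo is left out because its inputs are not described here.

```python
import itertools


class FakeNanna:
    """In-memory stand-in for the task tools. Hypothetical: not Nanna's real API."""

    def __init__(self, polls_until_done: int = 2):
        self.polls_until_done = polls_until_done
        self._tasks: dict[str, dict] = {}
        self._ids = itertools.count(1)

    def assign_task(self, prompt: str) -> str:
        # Register the task and hand back an opaque id (assumed shape).
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = {"prompt": prompt, "polls": 0, "status": "running"}
        return task_id

    def poll_task(self, task_id: str) -> str:
        # Pretend the subagent finishes after a fixed number of polls.
        task = self._tasks[task_id]
        if task["status"] == "running":
            task["polls"] += 1
            if task["polls"] >= self.polls_until_done:
                task["status"] = "done"
        return task["status"]

    def get_result(self, task_id: str) -> str:
        task = self._tasks[task_id]
        if task["status"] != "done":
            raise RuntimeError(f"{task_id} is {task['status']}, no result yet")
        return f"result for: {task['prompt']}"

    def list_tasks(self) -> list[str]:
        return list(self._tasks)

    def cancel_task(self, task_id: str) -> None:
        self._tasks[task_id]["status"] = "cancelled"


def delegate(client, prompt: str, max_polls: int = 10) -> str:
    """Assign a task, poll until it is done, then fetch the result."""
    task_id = client.assign_task(prompt)
    for _ in range(max_polls):
        if client.poll_task(task_id) == "done":
            return client.get_result(task_id)
    # Give up and free the subagent rather than polling forever.
    client.cancel_task(task_id)
    raise TimeoutError(f"{task_id} did not finish within {max_polls} polls")
```

A real orchestrator would make these calls through MCP or the CLI and would likely wait between polls; the loop here only shows the order of the calls.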

The orchestrator is responsible for global task completion. If the device or platform Nanna is running on fails, the orchestrator's tool calls to Nanna will fail. The orchestrator, however, still has subagents it can fall back to. Nanna can thus exploit unreliable resources without compromising the reliability of the orchestrator, and can scale with resource availability, such as electrical grid surplus or unused personal devices.
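The fallback behavior described above might look something like the following on the orchestrator side. This is a minimal sketch under stated assumptions: NannaUnavailable, run_with_fallback, and both callables are hypothetical names invented for the example, and how a real orchestrator surfaces a failed tool call will differ.

```python
from typing import Callable


class NannaUnavailable(Exception):
    """Hypothetical error standing in for a failed tool call to Nanna."""


def run_with_fallback(
    prompt: str,
    delegate: Callable[[str], str],
    fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Try the self-hosted path first; on failure, use the orchestrator's own subagent.

    Returns (route, result) so the caller can see which path handled the task.
    """
    try:
        return "nanna", delegate(prompt)
    except (NannaUnavailable, ConnectionError, TimeoutError):
        # The unreliable resource is gone; the task itself is still completed.
        return "fallback", fallback(prompt)


def flaky_delegate(prompt: str) -> str:
    # Simulates the host machine going offline mid-session.
    raise NannaUnavailable("host went offline")


def healthy_delegate(prompt: str) -> str:
    return f"handled by self-hosted subagent: {prompt}"


def builtin_subagent(prompt: str) -> str:
    return f"handled by orchestrator subagent: {prompt}"
```

Because the orchestrator keeps ownership of the task, losing the self-hosted path changes only which route runs, not whether the work completes.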

Sequence diagram showing the orchestrator calling assign_task, poll_task, get_result

Implications for Resource Usage

SLMs use 10x to 100x less electricity than LLMs and can operate on-device. Self-hosting them brings the marginal cost to the price of electricity (1-2 orders of magnitude cheaper than token metering).
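To make the electricity side of this concrete, here is a small worked calculation using the two per-query energy figures quoted on this page (287 J for Qwen3-8B, roughly 6300 J for Qwen3-235B in thinking mode). The electricity price is an assumption chosen only for illustration, and the result covers energy alone; it says nothing about hardware cost or about what a provider charges per token.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_cost(joules: float, price_per_kwh: float) -> float:
    """Electricity cost of one query, in the same currency as price_per_kwh."""
    return joules / JOULES_PER_KWH * price_per_kwh


# Per-query figures quoted on this page (ML.Energy dataset).
SMALL_MODEL_J = 287.0   # Qwen3-8B
LARGE_MODEL_J = 6300.0  # Qwen3-235B thinking mode, approximate

# ASSUMPTION: an illustrative retail price, not a figure from the source.
ASSUMED_PRICE_PER_KWH = 0.25

ratio = LARGE_MODEL_J / SMALL_MODEL_J
small_cost = energy_cost(SMALL_MODEL_J, ASSUMED_PRICE_PER_KWH)
large_cost = energy_cost(LARGE_MODEL_J, ASSUMED_PRICE_PER_KWH)

if __name__ == "__main__":
    print(f"energy ratio: {ratio:.1f}x")
    print(f"small model: {small_cost:.2e} per query")
    print(f"large model: {large_cost:.2e} per query")
```

With these inputs the larger model uses about 22 times the energy per query, which sits inside the 10x to 100x range stated above.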

The size of the smallest models able to do meaningful work shrank rapidly from 2022 through 2024, before stabilizing through 2025. Since then, the focus has shifted to specializing models for specific tasks, where they can perform as well as or better than general models with 100x their resource footprint.

Providers do route tasks to smaller models, but they are generally limited to their own models and expose few levers. Nanna lets you bring any model to a task, alongside a dev environment that's preconfigured for your specific work.

Aside from using less electricity, on-device work is distributed over the existing power grid. Globally, datacenters are difficult to integrate into existing grids, and rely partially on natural gas.

Line chart of the smallest model scoring above 60% on MMLU over time, showing parameter counts dropping from 540B (PaLM, 2022) to 1.7B (Qwen3-1.7B, 2025). Models, in order: PaLM (540B), LLaMA-65B, Llama 2 34B, Mistral 7B, Phi-3-mini (3.8B), Qwen3-1.7B (FP16). Sources: Stanford AI Index 2025, Qwen3 (Apr 2025).

Bar chart comparing energy per chat query in joules: Qwen3-8B at 287 J vs Qwen3-235B reasoning (A22B thinking mode) at ~6300 J. Data from the ML.Energy benchmark-v3 dataset (ml.energy/leaderboard, arXiv:2505.06371).

Bar chart of 2024 Water Usage Effectiveness (L/kWh) across 22 sites: Local at 0, three Scaleway Paris DCs (DC4/DC3/DC2) under 0.01, AWS regions ranging from EU-Frankfurt at 0.01 up through US-Northern California at 0.51, AWS Tokyo at 0.91, AWS Singapore at 1.68, and AWS Jakarta at 2.75. Water usage for cooling; see AWS and Scaleway. Related paper: Li et al. 2023 (CACM) estimates that a 500 mL bottle of water covers roughly 10 to 50 medium-length GPT-3 responses. The Mistral / ADEME / Carbone 4 LCA (July 2025) describes full-lifecycle LLM resource usage. Note that water usage for electricity generation also varies widely.