ADR-0003: Swappable agent driver layer (AgentDriver interface)

  • Status: Accepted
  • Date: 2026-05-03
  • Affected: packages/agents/, apps/web/app/api/chat/

Context

Roles need to talk to an LLM. The shape of “talk to an LLM” has taken three forms across V0.1–V0.3:

  1. V0.1 — pre-fetch K memories, stuff them into the user message, call Claude, stream the answer. (Stateless RAG.)
  2. V0.2 — let the model decide when to retrieve via tool use, stream + handle tool calls. (Claude Agent SDK style.)
  3. V0.2/V0.3 — same logic but over a real-time voice pipeline (Pipecat) instead of HTTP streaming.

These are three quite different code paths but the contract — given a role, history, message, and a way to search memories, produce a stream of events — is identical. We don’t want app code to know which one is running.
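As an illustration of the V0.1 path in item 1, the following is a minimal sketch of "pre-fetch K memories, stuff them into the user message, stream the answer". All names here (`Memory`, `SearchMemories`, `StreamLLM`, `stuffMemories`, `statelessRag`), the prompt layout, and the default `k` are assumptions for the example, and the LLM call is injected as a stub rather than a real SDK call.

```typescript
// Hypothetical types: none of these names come from the codebase.
type Memory = { id: string; text: string };
type SearchMemories = (query: string, k: number) => Promise<Memory[]>;
type StreamLLM = (system: string, user: string) => AsyncIterable<string>;

// The "stuffing" step: prepend retrieved memories to the user message.
// The prompt layout is an assumption chosen for readability.
function stuffMemories(message: string, memories: Memory[]): string {
  if (memories.length === 0) return message;
  const context = memories.map((m) => `- ${m.text}`).join("\n");
  return `Relevant memories:\n${context}\n\nUser message:\n${message}`;
}

// V0.1-style flow: retrieve once up front, then stream the model output.
// The model never decides whether to retrieve; that is what V0.2 changes.
async function* statelessRag(
  systemPrompt: string,
  message: string,
  search: SearchMemories,
  streamLLM: StreamLLM,
  k = 5, // assumed default for K
): AsyncGenerator<string> {
  const memories = await search(message, k);
  yield* streamLLM(systemPrompt, stuffMemories(message, memories));
}
```

Because retrieval and the model call are both passed in, the same function can be exercised with fakes in a test, which is the property the driver contract generalizes.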

Options considered

Option A — Inline driver code in /api/chat

  • Pros: Simplest; no abstraction tax.
  • Cons: Swapping drivers means rewriting the route. The voice version duplicates logic. No way to A/B drivers in production.

Option B — Anthropic SDK directly, parameterized

  • Pros: One library; less indirection.
  • Cons: The realtime path uses different libraries (Pipecat). Subclassing the AnthropicLLM service in Pipecat requires the same shape of abstraction we’d be avoiding. Voice and chat would duplicate role prompts and tool definitions.

Option C — AgentDriver interface + N implementations (chosen)

  • Pros: One contract, three implementations now (stateless-rag, claude-agent-sdk, voice via Pipecat composed differently). App code imports the interface; swap is a config change. Each driver optimizes for its modality without affecting the others.
  • Cons: Interface design needs to anticipate event types we don’t emit yet (tool_call, memory_used). We got this slightly wrong on the first pass: tool_call was added retroactively when the SDK driver shipped.

Decision

Standardize on the AgentDriver interface in packages/agents/src/agent-interface.ts. Drivers implement chat(input): AsyncIterable<AgentEvent>. Events are: delta, memory_used, tool_call, final, error. App code only depends on the interface.
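A minimal sketch of what this contract might look like follows. Only `AgentDriver`, `chat(input)`, `AsyncIterable<AgentEvent>`, and the five event names come from this ADR; the payload fields, the `ChatInput` shape, `EchoDriver`, and `collect` are assumptions made for illustration and may differ from the real agent-interface.ts.

```typescript
// Assumed payload fields; only the `type` tags are taken from the ADR.
type AgentEvent =
  | { type: "delta"; text: string }
  | { type: "memory_used"; memoryId: string }
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "final"; text: string }
  | { type: "error"; message: string };

type Memory = { id: string; text: string };

// Hypothetical input: a role, history, message, and a way to search memories.
interface ChatInput {
  role: { id: string; systemPrompt: string };
  history: { from: "user" | "assistant"; text: string }[];
  message: string;
  searchMemories: (query: string, k: number) => Promise<Memory[]>;
}

interface AgentDriver {
  chat(input: ChatInput): AsyncIterable<AgentEvent>;
}

// Toy driver that echoes the message, to show the async-generator shape.
class EchoDriver implements AgentDriver {
  async *chat(input: ChatInput): AsyncIterable<AgentEvent> {
    const memories = await input.searchMemories(input.message, 1);
    for (const m of memories) yield { type: "memory_used", memoryId: m.id };
    const reply = `echo: ${input.message}`;
    yield { type: "delta", text: reply };
    yield { type: "final", text: reply };
  }
}

// App-side code depends only on the interface, never on a concrete driver.
async function collect(driver: AgentDriver, input: ChatInput): Promise<AgentEvent[]> {
  const events: AgentEvent[] = [];
  for await (const event of driver.chat(input)) events.push(event);
  return events;
}
```

An async generator is a convenient way to implement `chat` because each `yield` maps to one streamed event, and the caller's `for await` loop applies backpressure naturally.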

Three drivers ship in V0.2:

  • stateless-rag — V0.1 default, kept for fallback.
  • claude-agent-sdk — V0.2 default, tool use.
  • realtime-voice — V0.2 voice, lives in the Pipecat service rather than implementing the interface in TypeScript (the AsyncIterable shape doesn’t fit a long-lived bidirectional audio session).
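One way to make the swap a config change is a small registry keyed by driver name. The sketch below is an assumption about how that could be wired, not a description of the actual code: `registry`, `resolveDriver`, `stubDriver`, and the simplified `AgentEvent` and `AgentDriver` types are hypothetical, and only the driver names are taken from the list above. realtime-voice is left out because it lives in the Pipecat service.

```typescript
// Simplified stand-ins for the real interface (assumed shapes).
type AgentEvent = { type: "delta"; text: string } | { type: "final"; text: string };
interface AgentDriver {
  chat(input: { message: string }): AsyncIterable<AgentEvent>;
}

// Driver names from the ADR that implement the TypeScript interface.
type DriverName = "stateless-rag" | "claude-agent-sdk";

// Stand-in factory: a real driver would call the model here.
function stubDriver(label: string): AgentDriver {
  return {
    async *chat(input: { message: string }): AsyncIterable<AgentEvent> {
      const text = `[${label}] ${input.message}`;
      yield { type: "delta", text };
      yield { type: "final", text };
    },
  };
}

const registry: Record<DriverName, () => AgentDriver> = {
  "stateless-rag": () => stubDriver("stateless-rag"),
  "claude-agent-sdk": () => stubDriver("claude-agent-sdk"),
};

const DEFAULT_DRIVER: DriverName = "claude-agent-sdk"; // V0.2 default per the ADR

function isDriverName(value: string): value is DriverName {
  return Object.prototype.hasOwnProperty.call(registry, value);
}

// Resolve from a config value (for example an environment variable).
// Unknown or missing values resolve to the default driver.
function resolveDriver(configured: string | undefined): { name: DriverName; driver: AgentDriver } {
  const name = configured !== undefined && isDriverName(configured) ? configured : DEFAULT_DRIVER;
  return { name, driver: registry[name]() };
}
```

With this shape, the route handler would call `resolveDriver(...)` once and pass the result along, so A/B testing drivers reduces to varying the configured name per request.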

Consequences

  • Adding a new driver (e.g. OpenAI Realtime as a fallback) is a single file under packages/agents/src/drivers/.
  • App code never imports the Anthropic SDK directly. Drivers are self-contained; swapping is a single import line.
  • The voice driver is a slight outlier: the WebSocket lifetime model doesn’t fit AsyncIterable cleanly. The voice service realizes the interface in spirit, not in code.
  • Adding new event types (e.g. interject for V0.3 group mode) is an additive change to the union; no existing driver breaks.
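The additive-change claim in the last bullet can be illustrated with a consumer that folds events into view state. In this sketch the payload fields, `ChatView`, `applyEvent`, and the shape of the `interject` event are all assumptions. Drivers are unaffected by a wider union because each one only emits a subset of it; consumers stay unaffected as long as they keep a default branch for event types they do not handle.

```typescript
// Assumed payloads; `interject` is a hypothetical future addition to the union.
type AgentEvent =
  | { type: "delta"; text: string }
  | { type: "memory_used"; memoryId: string }
  | { type: "tool_call"; name: string }
  | { type: "final"; text: string }
  | { type: "error"; message: string }
  | { type: "interject"; speaker: string };

interface ChatView {
  text: string;
  memories: string[];
  done: boolean;
  error?: string;
}

// Pure reducer: each event produces a new view; unhandled types are ignored.
function applyEvent(view: ChatView, event: AgentEvent): ChatView {
  switch (event.type) {
    case "delta":
      return { ...view, text: view.text + event.text };
    case "memory_used":
      return { ...view, memories: [...view.memories, event.memoryId] };
    case "final":
      return { ...view, text: event.text, done: true };
    case "error":
      return { ...view, error: event.message, done: true };
    default:
      // tool_call, interject, and any later additions fall through unchanged.
      return view;
  }
}
```

The trade-off of a default branch is that the compiler no longer flags unhandled event types; a consumer that wants that check can replace the default with an exhaustive `never` assertion and accept that new event types become a compile-time change.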

Notes

This is the “Agent layer is swappable from day one” principle from docs/00_MASTER_PLAN.md §2 #8.