- Status: Accepted
- Date: 2026-05-03
- Affected: `packages/agents/`, `apps/web/app/api/chat/`
## Context
Roles need to talk to an LLM. The shape of “talk to an LLM” went through three iterations across V0.1–V0.3:
- V0.1 — pre-fetch K memories, stuff them into the user message, call Claude, stream the answer. (Stateless RAG.)
- V0.2 — let the model decide when to retrieve via tool use, stream + handle tool calls. (Claude Agent SDK style.)
- V0.2/V0.3 — same logic but over a real-time voice pipeline (Pipecat) instead of HTTP streaming.
These are three quite different code paths but the contract — given a role, history, message, and a way to search memories, produce a stream of events — is identical. We don’t want app code to know which one is running.
## Options considered
### Option A — Inline driver code in `/api/chat`

- Pros: Simplest; no abstraction tax.
- Cons: Swapping the driver means rewriting the route. The voice version duplicates logic. No way to A/B drivers in production.
### Option B — Anthropic SDK directly, parameterized

- Pros: One library; less indirection.
- Cons: The realtime path uses different libraries (Pipecat). Inheriting from the `AnthropicLLM` service in Pipecat requires the same shape of abstraction we’d be avoiding. Voice + chat would duplicate role prompts and tool definitions.
### Option C — `AgentDriver` interface + N implementations (chosen)

- Pros: One contract, three implementations now (`stateless-rag`, `claude-agent-sdk`, and voice via Pipecat composed differently). App code imports the interface; a swap is a config change. Each driver optimizes for its modality without affecting the others.
- Cons: Interface design needs to anticipate event types we don’t emit yet (`tool_call`, `memory_used`). Got this slightly wrong on the first pass — added `tool_call` retroactively when the SDK driver shipped.
## Decision
Standardize on the `AgentDriver` interface in `packages/agents/src/agent-interface.ts`. Drivers implement `chat(input): AsyncIterable<AgentEvent>`. Events are `delta`, `memory_used`, `tool_call`, `final`, and `error`. App code depends only on the interface.
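A minimal sketch of what that contract could look like (the event type names match the decision above; the payload fields, the `ChatInput` shape, and the `echoDriver` stub are illustrative assumptions, not the actual file):

```typescript
// Sketch of the AgentDriver contract. Event type names come from the ADR;
// every payload field here is an assumed shape for illustration.
export type AgentEvent =
  | { type: "delta"; text: string }             // incremental model output
  | { type: "memory_used"; memoryId: string }   // a memory was retrieved
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "final"; text: string }             // the complete assistant message
  | { type: "error"; message: string };

export interface ChatInput {
  role: string;                                 // which persona is speaking
  history: { role: "user" | "assistant"; content: string }[];
  message: string;                              // the new user message
  searchMemories: (query: string, k: number) => Promise<string[]>;
}

export interface AgentDriver {
  chat(input: ChatInput): AsyncIterable<AgentEvent>;
}

// Trivial stand-in driver showing the streaming shape end to end.
export const echoDriver: AgentDriver = {
  async *chat(input: ChatInput): AsyncGenerator<AgentEvent> {
    yield { type: "delta", text: input.message };
    yield { type: "final", text: input.message };
  },
};
```

App code consumes every driver the same way — `for await (const event of driver.chat(input))` and a switch on `event.type` — regardless of which implementation is behind it.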
Three drivers ship in V0.2:

- `stateless-rag` — the V0.1 default, kept as a fallback.
- `claude-agent-sdk` — the V0.2 default, with tool use.
- `realtime-voice` — V0.2 voice; lives in the Pipecat service rather than implementing the interface in TypeScript (the `AsyncIterable` shape doesn’t fit a long-lived bidirectional audio session).
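Under this split, “a swap is a config change” can be as small as a name-keyed factory. A self-contained sketch with stand-in driver classes (the `getDriver` helper and the `AGENT_DRIVER` variable are assumptions; the real implementations live under `packages/agents/src/drivers/`):

```typescript
// Stand-in driver shapes so the sketch is self-contained; the real drivers
// are separate files per the ADR.
interface AgentDriver {
  chat(input: { message: string }): AsyncIterable<{ type: string; text?: string }>;
}

class StatelessRagDriver implements AgentDriver {
  async *chat(input: { message: string }) {
    yield { type: "final", text: `rag:${input.message}` };
  }
}

class ClaudeAgentSdkDriver implements AgentDriver {
  async *chat(input: { message: string }) {
    yield { type: "final", text: `sdk:${input.message}` };
  }
}

// Registry: app code asks for a driver by name and never imports a
// concrete implementation directly.
const drivers: Record<string, () => AgentDriver> = {
  "stateless-rag": () => new StatelessRagDriver(),
  "claude-agent-sdk": () => new ClaudeAgentSdkDriver(),
};

export function getDriver(
  name: string = process.env.AGENT_DRIVER ?? "claude-agent-sdk",
): AgentDriver {
  const make = drivers[name];
  if (!make) throw new Error(`unknown agent driver: ${name}`);
  return make();
}
```

Failing fast on an unknown name keeps a typo in the config from silently falling back to the wrong driver.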
## Consequences
- Adding a new driver (e.g. OpenAI Realtime as a fallback) is a single file under `packages/agents/src/drivers/`.
- App code never imports the Anthropic SDK directly. Drivers are self-contained; swapping one is a single import line.
- The voice driver is a slight outlier — the WebSocket lifetime model doesn’t fit `AsyncIterable` cleanly. The voice service is the realization of the interface in spirit, not in code.
- Adding new event types (e.g. `interject` for V0.3 group mode) is an additive change to the union; no existing driver breaks.
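The additive-union point can be checked mechanically: consumers that switch on `event.type` with a default branch keep compiling and running when a new variant appears. A trimmed sketch (the `interject` payload and the `render` helper are assumptions; V0.3 may shape them differently):

```typescript
// Trimmed event union: adding a variant is additive as long as consumers
// treat unrecognized event types as a no-op rather than an error.
type AgentEvent =
  | { type: "delta"; text: string }
  | { type: "final"; text: string }
  | { type: "interject"; speaker: string; text: string }; // hypothetical V0.3 addition

function render(event: AgentEvent): string {
  switch (event.type) {
    case "delta":
      return event.text;
    case "final":
      return `[final: ${event.text}]`;
    default:
      return ""; // pre-V0.3 consumers simply ignore interject events
  }
}
```

If a consumer instead used an exhaustive switch with no default, adding `interject` would be a compile error there — which is a deliberate design choice, not an accident of the union.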
## Notes
This is the “Agent layer is swappable from day one” principle from `docs/00_MASTER_PLAN.md` §2 #8.