- Status: Accepted
- Date: 2026-05-03
- Affected: `packages/agents/`, `apps/web/app/api/chat/`
## Context
Roles need to talk to an LLM. The shape of “talk to an LLM” went through three iterations across V0.1–V0.3:
- V0.1 — pre-fetch K memories, stuff them into the user message, call Claude, stream the answer. (Stateless RAG.)
- V0.2 — let the model decide when to retrieve via tool use, stream + handle tool calls. (Claude Agent SDK style.)
- V0.2/V0.3 — same logic but over a real-time voice pipeline (Pipecat) instead of HTTP streaming.
These are three quite different code paths but the contract — given a role, history, message, and a way to search memories, produce a stream of events — is identical. We don’t want app code to know which one is running.
## Options considered
### Option A — Inline driver code in `/api/chat`

- Pros: Simplest; no abstraction tax.
- Cons: Swapping the driver means rewriting the route. The voice version duplicates logic. No way to A/B drivers in production.
### Option B — Anthropic SDK directly, parameterized

- Pros: One library; less indirection.
- Cons: The realtime path uses different libraries (Pipecat). Inheriting from the `AnthropicLLM` service in Pipecat requires the same shape of abstraction we’d be avoiding. Voice + chat would duplicate role prompts and tool definitions.
### Option C — `AgentDriver` interface + N implementations (chosen)

- Pros: One contract, three implementations now (`stateless-rag`, `claude-agent-sdk`, and voice via Pipecat composed differently). App code imports the interface; a swap is a config change. Each driver optimizes for its modality without affecting the others.
- Cons: Interface design needs to anticipate event types we don’t emit yet (`tool_call`, `memory_used`). Got this slightly wrong on the first pass — added `tool_call` retroactively when the SDK driver shipped.
## Decision
Standardize on the `AgentDriver` interface in `packages/agents/src/agent-interface.ts`. Drivers implement `chat(input): AsyncIterable<AgentEvent>`. Events are `delta`, `memory_used`, `tool_call`, `final`, and `error`. App code depends only on the interface.
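A minimal sketch of what that contract could look like (the event type names match the decision above; the payload fields, the `ChatInput` shape, and the `echoDriver` stub are illustrative assumptions, not the actual file):

```typescript
// Sketch of the AgentDriver contract. Event type names come from the ADR;
// every payload field here is an assumed shape for illustration.
export type AgentEvent =
  | { type: "delta"; text: string }             // incremental model output
  | { type: "memory_used"; memoryId: string }   // a memory was retrieved
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "final"; text: string }             // the complete assistant message
  | { type: "error"; message: string };

export interface ChatInput {
  role: string;                                 // which persona is speaking
  history: { role: "user" | "assistant"; content: string }[];
  message: string;                              // the new user message
  searchMemories: (query: string, k: number) => Promise<string[]>;
}

export interface AgentDriver {
  chat(input: ChatInput): AsyncIterable<AgentEvent>;
}

// Trivial stand-in driver showing the streaming shape end to end.
export const echoDriver: AgentDriver = {
  async *chat(input: ChatInput): AsyncGenerator<AgentEvent> {
    yield { type: "delta", text: input.message };
    yield { type: "final", text: input.message };
  },
};
```

App code consumes every driver the same way — `for await (const event of driver.chat(input))` and a switch on `event.type` — regardless of which implementation is behind it.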
Three drivers ship in V0.2:

- `stateless-rag` — the V0.1 default, kept as a fallback.
- `claude-agent-sdk` — the V0.2 default, with tool use.
- `realtime-voice` — V0.2 voice; lives in the Pipecat service rather than implementing the interface in TypeScript (the `AsyncIterable` shape doesn’t fit a long-lived bidirectional audio session).
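Under this split, “a swap is a config change” can be as small as a name-keyed factory. A self-contained sketch with stand-in driver classes (the `getDriver` helper and the `AGENT_DRIVER` variable are assumptions; the real implementations live under `packages/agents/src/drivers/`):

```typescript
// Stand-in driver shapes so the sketch is self-contained; the real drivers
// are separate files per the ADR.
interface AgentDriver {
  chat(input: { message: string }): AsyncIterable<{ type: string; text?: string }>;
}

class StatelessRagDriver implements AgentDriver {
  async *chat(input: { message: string }) {
    yield { type: "final", text: `rag:${input.message}` };
  }
}

class ClaudeAgentSdkDriver implements AgentDriver {
  async *chat(input: { message: string }) {
    yield { type: "final", text: `sdk:${input.message}` };
  }
}

// Registry: app code asks for a driver by name and never imports a
// concrete implementation directly.
const drivers: Record<string, () => AgentDriver> = {
  "stateless-rag": () => new StatelessRagDriver(),
  "claude-agent-sdk": () => new ClaudeAgentSdkDriver(),
};

export function getDriver(
  name: string = process.env.AGENT_DRIVER ?? "claude-agent-sdk",
): AgentDriver {
  const make = drivers[name];
  if (!make) throw new Error(`unknown agent driver: ${name}`);
  return make();
}
```

Failing fast on an unknown name keeps a typo in the config from silently falling back to the wrong driver.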
## Consequences
- Adding a new driver (e.g. OpenAI Realtime as a fallback) is a single file under `packages/agents/src/drivers/`.
- App code never imports the Anthropic SDK directly. Drivers are self-contained; swapping one is a single import line.
- The voice driver is a slight outlier — the WebSocket lifetime model doesn’t fit `AsyncIterable` cleanly. The voice service is the realization of the interface in spirit, not in code.
- Adding new event types (e.g. `interject` for V0.3 group mode) is an additive change to the union; no existing driver breaks.
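The additive-union point can be checked mechanically: consumers that switch on `event.type` with a default branch keep compiling and running when a new variant appears. A trimmed sketch (the `interject` payload and the `render` helper are assumptions; V0.3 may shape them differently):

```typescript
// Trimmed event union: adding a variant is additive as long as consumers
// treat unrecognized event types as a no-op rather than an error.
type AgentEvent =
  | { type: "delta"; text: string }
  | { type: "final"; text: string }
  | { type: "interject"; speaker: string; text: string }; // hypothetical V0.3 addition

function render(event: AgentEvent): string {
  switch (event.type) {
    case "delta":
      return event.text;
    case "final":
      return `[final: ${event.text}]`;
    default:
      return ""; // pre-V0.3 consumers simply ignore interject events
  }
}
```

If a consumer instead used an exhaustive switch with no default, adding `interject` would be a compile error there — which is a deliberate design choice, not an accident of the union.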
## Notes
This is the “Agent layer is swappable from day one” principle from `docs/00_MASTER_PLAN.md` §2 #8.