Companion to 00_MASTER_PLAN.md and 02_HARDWARE_PLAN.md. This document is the source of truth for everything that runs on a server, in a browser, or on a phone.
Software and hardware are co-equal, parallel tracks. Software ships standalone (phone-mic) at every phase, but is built to integrate with the device from Phase 0. The hardware↔software contract in 00_MASTER_PLAN §6 is frozen at the start of each phase; both teams build to it.
1. Stack — Final Decisions
| Concern | Choice | Notes |
|---|---|---|
| Web app framework | Next.js 15 (App Router) on Vercel | RSC, edge runtime, ISR, free hobby tier |
| Mobile app | Expo SDK 52+ with Expo Router | iOS + Android from one codebase, EAS for builds & OTA |
| Backend | Supabase | Postgres + Auth + Storage + Realtime + Edge Functions |
| Vector store | pgvector with HNSW | Same DB; no Pinecone/Weaviate needed |
| Transcription (V0, batch) | Groq Whisper-large-v3-turbo | ~$0.04/hr, faster than realtime but batch-only (POST complete file → get complete transcript). Fits V0 chunked-upload model perfectly. |
| Transcription (V0.1+, streaming) | Deepgram Nova-3 | True WebSocket streaming + live diarization, ~$0.26/hr. Used wherever live transcript or sub-second latency is required (voice talk-back, group mode, live captions). |
| Speaker re-ID | Pyannote.audio on Modal | Cross-session identity, owned voice embeddings |
| Embeddings | Voyage-3-lite | Cheap, top retrieval benchmarks |
| Summary / topic extraction | Gemini 2.5 Flash or Claude Haiku 4.5 | Cents per recording |
| Agent framework | Claude Agent SDK | Tool use, memory tool, role system prompts |
| Voice talk-back (V0.2+) | Pipecat + Cartesia Sonic TTS + Deepgram streaming STT | Sub-second turn latency |
| Group conversation media | LiveKit Cloud | WebRTC, used by OpenAI Realtime, free tier |
| Pipeline orchestration | pgmq + pg_cron (V0.1) → Inngest (V0.2+) if complexity demands | Queue-based, durable, retryable |
| Auth | Supabase Auth + magic links | No passwords |
| Payments | Stripe Billing + RevenueCat for mobile | One SDK across iOS/Android/web |
| Analytics | PostHog | Product analytics + session replay + feature flags |
| Errors | Sentry | Web + mobile + Edge Functions |
| Search (text) | Postgres FTS (tsvector) | No Algolia needed |
| Search (semantic) | pgvector HNSW | Same query layer |
| File storage | Supabase Storage | Audio + exports |
| Code structure | pnpm workspace monorepo | apps/web, apps/mobile, packages/db, packages/shared, packages/agents |
| Type safety | TypeScript everywhere + Zod for runtime | Generated types from Supabase schema |
| ORM | Drizzle (when raw SQL gets painful) | Type-safe, lightweight |
| Testing | Vitest + Playwright for web, Maestro for mobile | Skip what’s not high-value |
| CI/CD | GitHub Actions + Vercel previews + EAS Update | OTA mobile fixes without store review |
1.4. AI Architecture — Layered (canonical: ADR-0011)
ARCIVE composes specialized AI capabilities around a memory store. The product moat is the memory store + retrieval + auto-correlation, not any AI capability. The §1 “Stack” table above lists individual choices per concern; this section is how those choices fit together as a system, and which classes drive procurement.
Three layer classes:
- ARCIVE-owned — built internally; never outsourced. The moat.
- Strategic AI — best-in-class on a strategic axis (e.g. voice fidelity per ADR-0006, agent quality per ADR-0003). Vendor swaps require their own ADR.
- Commodity AI — cheapest acceptable vendor with a coded fallback. Vendor swaps within this class do not require a new ADR. They happen as the market shifts.
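The commodity-class contract — cheapest vendor first, coded fallback behind it — can be sketched as a plain provider chain. A minimal sketch; the provider functions below are illustrative stand-ins, not the real vendor clients:

```typescript
// One commodity capability (here: summarization) expressed as a plain
// async function, so vendors are interchangeable without an ADR.
type Summarizer = (transcript: string) => Promise<string>;

// Try each vendor in order; a thrown error falls through to the next.
async function summarizeWithFallback(
  transcript: string,
  chain: Summarizer[],
): Promise<string> {
  let lastError: unknown;
  for (const provider of chain) {
    try {
      return await provider(transcript);
    } catch (err) {
      lastError = err; // record and move to the next vendor in the chain
    }
  }
  throw new Error(`all providers failed: ${String(lastError)}`);
}
```

Swapping the market leader is then a one-line reorder of the chain, which is exactly why commodity swaps need no ADR.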
The 11 layers:
| # | Layer | Class | Today’s vendor |
|---|---|---|---|
| 1 | Capture (mic, transcode, ingest) | ARCIVE-owned | recorders + audio-transcode worker |
| 2 | Transcribe (audio → text + timestamps) | commodity | Groq Whisper |
| 3 | Understand (text → summary, topics, entities) | commodity | Gemini Flash → Anthropic Haiku → Groq Llama (fallback chain) |
| 4 | Embed (text/image → vector) | commodity | Voyage-3-lite |
| 5 | Retrieve (query → ranked memories) | ARCIVE-owned (MOAT) | MCP server + pgvector HNSW + tsv + edges + consent gate |
| 6 | Reason (chat with memories as context) | semi-strategic | Claude Agent SDK |
| 7 | Hear (real-time STT for voice loop) | commodity | Deepgram Nova-3 (paused per ADR-0010) |
| 8 | Speak (TTS) | strategic (ADR-0006) | Cartesia → ElevenLabs v3 → Sesame CSM (paused per ADR-0010) |
| 9 | Voice orchestration (real-time turn management) | strategic (ADR-0002, paused per ADR-0010) | Pipecat (paused) — composed on resumption, never speech-to-speech |
| 10 | Auto-correlate (cross-source memory linkage) | ARCIVE-owned (MOAT) | edges job; future per discussions/2026-05-04_multimodal_expansion.md |
| 11 | Surfaces (PWA, mobile, MCP-as-output, future email-in / share-target) | ARCIVE-owned | web + mobile + MCP server |
ARCIVE never consolidates to a single vendor across the stack. This matches how every credible adjacent player (Notion, Granola, Otter, Limitless, Apple Intelligence, ElevenLabs Conversational AI) actually operates. Full reasoning, cost projections at ARCIVE volumes, and the rejected alternatives are in ADR-0011 and the working session notes discussions/2026-05-05_ai_strategy_architecture.md.
Stage 6 (unified “talk + chat together”) is redefined: ships as two surfaces (Talk + Chat) sharing the memory store, not as a single-model unified architecture (which would force ARCIVE to abandon ADR-0006 voice fidelity). Same data, different input modalities.
1.5. Capture-Surface Capabilities (Honest Truth Table)
Different input surfaces have different capabilities. We do NOT promise capabilities a surface can’t deliver. This table is canonical — UX copy must reflect it.
| Capability | ARCIVE Device | Mobile App (native, Expo) | Web App / PWA |
|---|---|---|---|
| Always-on capture (records when not in foreground) | ✅ | ✅ (with foreground service / background audio mode) | ❌ — browsers suspend tabs after lock/switch |
| Press-to-record (intentional dictation) | ✅ | ✅ | ✅ |
| Recording while phone screen is off / locked | ✅ | ✅ (Android foreground service; iOS background audio capability) | ❌ |
| Recording survives WiFi outage | ✅ (30-min local buffer) | ✅ (SQLite/file queue) | ⚠️ (IndexedDB queue, lost if user closes tab) |
| Multi-speaker far-field capture | ✅ (4-mic array, 5m) | ❌ (single phone mic, ~1m) | ❌ |
| DoA / speaker positioning metadata | ✅ | ❌ | ❌ |
| Onboard VAD (silence not uploaded) | ✅ (XVF3800 hardware) | ✅ Silero VAD via onnxruntime-react-native or @picovoice/cobra-react-native (decision at V0.2 implementation time) | ✅ (@ricky0123/vad-web, AudioWorklet, Silero ONNX) |
| Opus encoding | ✅ (firmware) | ✅ (native codec) | ⚠️ Chrome ✅, Safari often falls back to MP4/AAC |
| Group mode (continuous WebRTC stream) | ✅ (Phase 3+) | ✅ (Phase 3+) | ✅ for joining group sessions, not as primary capturer |
| BLE pairing with ARCIVE device | n/a | ✅ (react-native-ble-plx) | ⚠️ Web Bluetooth: Chrome Android ✅, Safari iOS ❌ |
| Voice talk-back (real-time) | ✅ via paired phone | ✅ | ✅ (best when tab is foregrounded) |
| Push notifications | n/a | ✅ | ⚠️ iOS 16.4+ only when installed to home screen |
| Install to home screen | n/a | ✅ App Store / Play Store | ✅ PWA (see §1.6) |
| Offline browsing of past memories | n/a | ✅ (SQLite cache) | ✅ (Service Worker + IndexedDB) |
Implications
- The web/PWA is for intentional dictation and review. UX copy says “Tap to record.” It does NOT say “ARCIVE listens to your day.” That promise belongs to the device and the native mobile app.
- The mobile app is the always-on phone-only capture surface (when no device is paired). Foreground service on Android, background audio mode on iOS — both legitimate platform features, both require explicit permission disclosure.
- The ARCIVE device is the unrestricted always-on capture surface. No OS in the way.
- We never promise background recording on the web.
iOS background-mode footnote
Declaring `UIBackgroundModes: ["audio", "bluetooth-central"]` in Info.plist is necessary but not sufficient. Always-on capture + always-paired BLE on iOS requires:
- An active audio session (category `playAndRecord`, option `mixWithOthers`) at all times capture is expected — going idle risks suspension.
- CBCentralManager state restoration implemented (`CBCentralManagerOptionRestoreIdentifierKey` + `centralManager(_:willRestoreState:)`) so iOS can re-wake the app and reconnect to the device after suspension.
- The `bluetooth-central` background mode declared in addition to `audio` — they are independent capabilities.

Without state restoration, iOS will silently drop the BLE connection during suspension and not re-pair until the user reopens the app.
1.6. How the PWA Works
The “PWA” in “Next.js PWA” is not a separate framework — it’s a set of browser standards layered onto a normal website. Three pieces, simple individually, powerful together.
Piece 1 — manifest.webmanifest (the install metadata)
A JSON file at apps/web/public/manifest.webmanifest. Tells the browser “this site is installable, here’s its name, icon, color, start URL, and orientation.”
```json
{
  "name": "ARCIVE",
  "short_name": "ARCIVE",
  "description": "Record, retrieve, interact, and create memories.",
  "start_url": "/today",
  "display": "standalone",
  "background_color": "#F8F6F1",
  "theme_color": "#88B4A0",
  "icons": [
    { "src": "/icons/192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/icons/512.png", "sizes": "512x512", "type": "image/png", "purpose": "maskable" }
  ]
}
```

Linked from `<head>`:
```html
<link rel="manifest" href="/manifest.webmanifest" />
<meta name="theme-color" content="#88B4A0" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="default" />
```

When the user visits the site in Chrome on Android or Safari on iOS, the browser detects this manifest and offers an “Add to Home Screen” prompt (Android), or the user adds it manually from the Share menu (iOS). Once added, ARCIVE launches in standalone mode — no browser chrome, no URL bar — and looks identical to a native app.
Piece 2 — Service Worker (the offline + caching brain)
A JavaScript file (apps/web/public/sw.js) the browser registers in the background. It intercepts network requests and can serve cached responses, queue failed requests for retry, handle push notifications, and run independently of any open tab.
For ARCIVE we use it for three things:
- Offline shell — cache the app’s HTML / JS / CSS so the UI loads instantly even with no network. Memories themselves come from API; if API fails, show cached list.
- Failed-upload retry queue — if the user records a memory and the upload fails (no WiFi), the service worker keeps the chunk in IndexedDB and retries when the network comes back, even if the tab is closed.
- Web push notifications (iOS 16.4+, all of Android) — server can push “Your memory is processed” without the app being open.
We don’t write the service worker by hand — we use @serwist/next (the maintained successor to next-pwa) which generates one from config. Setup is a single Next.js plugin.
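A minimal config sketch, assuming the shape documented by `@serwist/next` (the `swSrc`/`swDest` paths are our assumed layout, not verified against the repo):

```typescript
// next.config.ts — wraps the normal Next.js config with Serwist,
// which generates and registers the service worker at build time.
import withSerwistInit from "@serwist/next";

const withSerwist = withSerwistInit({
  swSrc: "app/sw.ts",       // our service-worker source (assumed path)
  swDest: "public/sw.js",   // generated worker served from /sw.js
});

export default withSerwist({
  // ...the rest of the usual Next.js config goes here
});
```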
Piece 3 — Web APIs the PWA uses
Once installed, the PWA can use the same browser APIs as a normal website — but UX is fullscreen, app-like:
- `getUserMedia` for mic
- `MediaRecorder` for capture
- `AudioWorklet` for VAD
- Web Bluetooth for device pairing (Android only)
- `IndexedDB` for local cache
- Web Push for notifications
What the user experiences
- Visits `arcive.app` in Chrome Android or Safari iOS.
- Browser shows “Add ARCIVE to Home Screen” (or the user does it manually on iOS).
- Icon appears on home screen, indistinguishable from a native app.
- Tap → app launches fullscreen (no browser chrome).
- Works offline — past memories visible, new recordings queued for upload.
- Push notifications arrive (iOS 16.4+, all Android).
- Updates ship instantly — next time the app opens, it grabs the latest version. No App Store review.
What the PWA still cannot do (so we don’t pretend)
| Limitation | Workaround |
|---|---|
| No background audio recording on iOS | Use the native mobile app for that |
| No Web Bluetooth on Safari iOS | Pairing flow goes through native mobile app on iOS |
| Push notifications iOS-limited | Acceptable for V0; native app at V0.2 fixes. iOS Web Push prerequisites (all required): PWA installed to home screen via Safari, manifest has display: standalone or fullscreen, VAPID keys configured server-side, permission prompt fired from a user-gesture event handler (not on page load), not in private browsing. |
| Full filesystem access | Not needed for our use case |
| Speech recognition (`SpeechRecognition`) spotty in Safari | We do STT server-side anyway |
Why PWA-first beats native-first for V0
- Ship a URL in 3 weeks, not 6. No App Store review, no provisioning profiles, no $99/yr Apple Developer fee for V0.
- One codebase across desktop, mobile web, installed-PWA. Native (Expo) comes at V0.2 once the pattern is proven.
- Iteration speed. Push a fix, refresh the page, done. No EAS build queue, no TestFlight invites.
- Marketing site is the same Next.js app. No second codebase to maintain.
The PWA is genuinely the right V0 surface. The native mobile app at V0.2 then unlocks always-on capture and full BLE — the things the PWA can’t do.
Implementation cost
Adding PWA support to a Next.js 15 app is roughly:
- 1 hour: install `@serwist/next`, configure
- 1 hour: write `manifest.webmanifest`, generate icons (192, 512, maskable)
- 1 hour: add iOS-specific `<meta>` tags, test “Add to Home Screen” flow on real devices
- 2 hours: write IndexedDB upload-queue logic for failed uploads
- 1 hour: test offline shell loads when network is off
Half a day’s work for a real PWA. Worth it from V0.
1.7. Hosting Strategy
Three things to host. Each has a clear best home.
What needs hosting
| Thing | Recommended home | Alternatives |
|---|---|---|
| Next.js web app | Vercel for V0–V0.2; Cloudflare Pages for V0.2+ if cost matters | Netlify, self-hosted on VPS, Fly.io |
| Backend (functions, DB, auth, storage, realtime) | Supabase (locked) | — |
| GPU / heavy workers (Pyannote, future on-device-grade STT) | Modal | Replicate, Fly Machines (GPU), self-host |
| MCP server (Phase 4+) | Cloudflare Workers (cheap, edge, always-on) | Vercel Edge Functions, Fly |
| Firmware OTA binaries | Supabase Storage | Any object storage |
Hosted-Supabase-only development
We do not run a local Supabase stack. No Docker, no local Postgres
container — every developer points at the same hosted project (or their
own personal one) over HTTPS. Trade-off: every migration / Edge
Function change is a supabase db push / functions deploy, not an
in-process reload. Win: zero Docker dependency, zero LAN-IP gymnastics
when testing on a real phone or device, identical environment between
laptop and CI.
```bash
# One-time install
npm install -g supabase   # CLI for `link`, `db push`, `functions deploy` — no Docker needed
brew install pnpm         # or: npm install -g pnpm

# Bootstrap
git clone <repo>
pnpm install

# Link to a hosted Supabase project (yours or the team's)
supabase login
supabase link --project-ref <PROJECT_REF>
supabase db push
pnpm functions:deploy

# Fill apps/web/.env.local with the hosted project's URL + anon key
cp .env.example apps/web/.env.local

# Run
pnpm --filter web dev       # Next.js on localhost:3030 (project standard — not Next default 3000)

# When testing the mobile app
pnpm --filter mobile start  # phone hits the same hosted project; no tunnel, no LAN IP

# When testing the device (Phase 0+)
# Firmware uploads directly to https://<project-ref>.supabase.co/functions/v1/ingest-audio
```

Hosting decision matrix
| Provider | Best for | V0 cost | V0.2 cost | Setup time | Verdict |
|---|---|---|---|---|---|
| Vercel | Next.js | $0 (hobby) | $20–50 (Pro) | 5 min | Default. Use this for V0. |
| Cloudflare Pages + Workers | Next.js at scale | $0 | $5–20 | 30 min | Migrate here at V0.2 if Vercel bill matters. |
| Netlify | Next.js | $0 | $19+ | 10 min | Fine, but no advantage over Vercel or Cloudflare |
| Hostinger / VPS Node.js | Cheapest fixed cost | ~$4 | ~$4–10 | 1–2 days | Skip unless you have a strong reason |
| Self-host on Hetzner / DigitalOcean | Full control | ~$5 | ~$10 | 1–2 days | Skip for V0; consider for local-only variant (V2.x) |
| Fly.io | Container apps | $0–5 | $5–20 | 1 hr | Better for backend than for Next.js |
| Railway / Render | Simple PaaS | $5–10 | $10–30 | 30 min | OK middle ground; no real edge |
Recommendation
- Local-first for the first 1–2 weeks of V0. No hosting yet.
- Deploy V0 to Vercel. Free hobby tier, 5-minute setup, PR previews, optimized for Next.js. Don’t reconsider until the bill is meaningful (>$50/mo).
- At V0.2, evaluate Cloudflare Pages. Same Next.js code, much cheaper at scale, generous Workers free tier (100k req/day). Migration is 1–2 days.
- Backend stays on Supabase regardless of where the web app lives. Don’t conflate.
- Long-running Pyannote workers go to Modal (pay-per-second GPU, no always-on cost).
- Skip cheap VPS hosts (Hostinger, DigitalOcean droplets, Hetzner) for V0. Saving $4/mo is not worth losing 2 days of engineering. Reconsider only when:
- You’re at 100k+ users and Vercel bill > $500/mo
- You have data-residency requirements (EU/UK/India local hosting)
- You’re building the local-only ARCIVE variant (V2.x) — that genuinely needs self-hosted or device-as-server architecture
Why not Hostinger / cheap VPS for V0
- No CDN — slow first paint for users far from the server
- No edge runtime — every request hits one region
- No image optimization — bigger pages, slower loads
- No PR previews — slows iteration
- No zero-config Next.js — you maintain Node, PM2, nginx, SSL, deploy scripts
- ~1–2 days of setup vs. 5 minutes
- Saves ~$15/mo, costs days of engineering you don’t have at V0
Engineering principle
At V0, optimize for engineering speed, not infrastructure cost. The bill is small at the scale where speed matters most. Cost optimization is a Phase 2+ exercise.
1.8. Latency & Streaming Architecture
This section is the source of truth for what streams where, what’s batch, and what the latency budget is at every realtime touchpoint. It corrects three things that were vague or wrong in earlier sections.
Streaming vs. batch — the two pipelines
ARCIVE has two parallel data pipelines, not one. They use different transcription providers because they have different latency requirements.
PIPELINE A — INGESTION (batch, V0+)

```
capture surface ──► chunk to 30s + Opus mono @ 24kbps ──► HTTPS POST /ingest-audio
                                                                    │
                                                                    ▼
                                         Groq Whisper-large-v3-turbo (batch)
                                                                    │
                                                                    ▼
                                 queue: diarize → embed → summarize → edges
                                                                    │
                                                                    ▼
                                 Supabase Realtime → app shows new memory
```

Use: dictation, hardware capture, mobile background capture, group session post-processing
Total user-visible latency: ~2–5s after chunk seal
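The 30-second chunk-seal rule in Pipeline A amounts to the following. An illustrative sketch only — the real capture path seals chunks from live PCM frames rather than from a known total duration:

```typescript
// Seal a chunk every CHUNK_SECONDS; the final chunk may be shorter.
const CHUNK_SECONDS = 30;

// Returns [start, end) boundaries in seconds for a recording.
function chunkBoundaries(totalSeconds: number): Array<[number, number]> {
  const out: Array<[number, number]> = [];
  for (let start = 0; start < totalSeconds; start += CHUNK_SECONDS) {
    out.push([start, Math.min(start + CHUNK_SECONDS, totalSeconds)]);
  }
  return out;
}
```

Each sealed boundary corresponds to one Opus-encoded POST to `/ingest-audio`, which is why user-visible latency is measured from chunk seal, not from recording start.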
PIPELINE B — LIVE (streaming, V0.2+)

```
capture surface ──► WebSocket / WebRTC ──► Deepgram Nova-3 streaming
                                                │  partial transcripts (~200ms)
                                                ▼
                                       Pipecat orchestrator
                                                │
                                                ▼
                              Claude Agent SDK (Haiku 4.5, streaming)
                                                │
                                                ▼
                            Cartesia Sonic TTS (sub-100ms first chunk)
                                                │
                                                ▼
                                       audio back to user
```

Use: voice talk-back, role-play conversations, group mode interjections, live captions
Total user-visible latency budget: <1.5s end-of-utterance → AI starts speaking

Latency budget (Pipeline B, voice talk-back)
| Stage | Component | Target | Realistic |
|---|---|---|---|
| End-of-utterance detection | Pipecat + Deepgram VAD | 200ms | 200–400ms |
| STT final result | Deepgram Nova-3 streaming | +100ms after EOU | 100–300ms |
| LLM first token | Claude Haiku 4.5 streaming | +300ms | 300–500ms |
| TTS first audio chunk | Cartesia Sonic | +100ms | 90–150ms |
| Network RTT | variable | 50ms | 50–200ms |
| Total | — | ~750ms ideal | ~750ms–1.55s realistic |
Phase 2 ships with explicit latency benchmarking against a 1500ms hard ceiling. If we exceed it, options: co-locate Deepgram + Cartesia in the same region as the user, switch to Cartesia’s edge endpoints, or use a smaller LLM (Haiku → 7B model on Groq).
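The benchmark gate falls directly out of the table above. A sketch; stage names mirror the table rows, and collecting the per-turn timings is assumed to happen elsewhere in the voice loop:

```typescript
// Per-turn timings for the five stages in the latency-budget table.
interface TurnTiming {
  eouMs: number;           // end-of-utterance detection
  sttMs: number;           // STT final result after EOU
  llmFirstTokenMs: number; // LLM first token
  ttsFirstChunkMs: number; // TTS first audio chunk
  networkMs: number;       // round-trip network
}

const HARD_CEILING_MS = 1500; // Phase 2 hard ceiling

function turnLatency(t: TurnTiming): number {
  return t.eouMs + t.sttMs + t.llmFirstTokenMs + t.ttsFirstChunkMs + t.networkMs;
}

function withinBudget(t: TurnTiming): boolean {
  return turnLatency(t) <= HARD_CEILING_MS;
}
```

Note the realistic worst case in the table (400 + 300 + 500 + 150 + 200 = 1550ms) already exceeds the ceiling, which is why the mitigation options exist.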
Why Pipecat over OpenAI Realtime / Gemini Live
Both OpenAI Realtime API and Gemini Live offer bundled STT+LLM+TTS in a single WebSocket. Tempting, but rejected because:
- Vendor lock: you cannot swap STT, LLM, or TTS independently. If Cartesia ships a better voice or Deepgram beats them on latency, you can’t take advantage.
- Cost: ~$0.06/min for OpenAI Realtime vs. ~$0.02/min for Pipecat-orchestrated Deepgram + Haiku + Cartesia.
- Tool use parity: Pipecat with Claude Agent SDK has more mature tool-calling for our memory retrieval tools.
- Group mode: LiveKit Agents (which Pipecat composes with cleanly) is the proven pattern for multi-party WebRTC. OpenAI Realtime has weaker multi-party support.
We keep OpenAI Realtime as a fallback driver in the swappable agent layer — packages/agents/drivers/openai-realtime.ts exists as an option but is not the default.
Realtime / streaming coverage matrix
| Capability | Web (V0) | Mobile (V0.2) | Device (V0) | Device (V0.3 group) |
|---|---|---|---|---|
| Continuous mic capture (foreground) | ✅ getUserMedia | ✅ expo-audio | ✅ XVF3800 → I2S | ✅ |
| Continuous mic capture (background) | ❌ tab suspends | ✅ background-audio mode | ✅ always | ✅ always |
| On-device VAD | ✅ Silero in AudioWorklet | ✅ Silero ONNX or Cobra | ✅ XVF3800 hardware | ✅ XVF3800 hardware |
| Chunked HTTPS upload | ✅ fetch + ReadableStream | ✅ fetch | ✅ esp_https_client | ✅ |
| Resumable upload on network drop | ✅ IndexedDB queue | ✅ SQLite queue | ✅ 30-min circular buffer | ✅ |
| Streaming STT (live transcript) | ✅ Deepgram WS from V0.1 | ✅ from V0.2 | n/a (server-side after upload) | ✅ via backend bridge |
| Voice talk-back loop (Pipecat) | ✅ from V0.2 | ✅ from V0.2 | ✅ via paired phone playback | ✅ |
| Group mode WebRTC | ✅ LiveKit JS SDK from V0.3 | ✅ LiveKit RN SDK from V0.3 | ⚠️ HTTPS chunked → backend bridge | ⚠️ same |
Validated stack capabilities
| Component | Streams? | Verified |
|---|---|---|
| Supabase Edge Functions (Deno) | ✅ Request.body is ReadableStream | Used in production by audio products |
| Supabase Storage | ✅ resumable uploads via tus | Native support |
| Supabase Realtime | ✅ WebSocket, sub-100ms push | Native |
| Groq Whisper-large-v3-turbo | ❌ batch only | Confirmed by Groq API docs |
| Deepgram Nova-3 | ✅ WebSocket streaming + live diarization | Confirmed |
| Claude Agent SDK | ✅ streaming responses + tool use | Native |
| Cartesia Sonic TTS | ✅ <100ms first-byte | Confirmed by published benchmarks |
| Pipecat | ✅ provider-agnostic STT→LLM→TTS pipeline | Active OSS, used by many |
| LiveKit Cloud + Agents | ✅ multi-party WebRTC + server agent participants | Used by OpenAI Realtime, Character.ai |
| pgvector HNSW | ✅ sub-50ms similarity at our scale | Standard |
| @ricky0123/vad-web | ✅ Silero VAD in AudioWorklet | Standard |
| MCP server | ✅ JSON-RPC over stdio/HTTP, supports streaming responses | Confirmed |
2. Repository Layout (software-only)
```
arcive/
├── apps/
│   ├── web/                          # Next.js 15 PWA
│   │   ├── app/
│   │   │   ├── (auth)/login
│   │   │   ├── (app)/today
│   │   │   ├── (app)/universe
│   │   │   ├── (app)/memory/[id]
│   │   │   ├── (app)/people
│   │   │   ├── (app)/roles
│   │   │   └── (app)/settings
│   │   ├── components/
│   │   ├── lib/
│   │   └── public/manifest.webmanifest
│   │
│   └── mobile/                       # Expo (V0.2+)
│       ├── app/(tabs)/today.tsx
│       ├── app/(tabs)/universe.tsx
│       ├── app/memory/[id].tsx
│       ├── app/pair-device.tsx       # V0.3+
│       └── lib/ble.ts                # V0.3+
│
├── packages/
│   ├── db/
│   │   ├── migrations/               # Supabase migrations
│   │   ├── schema.ts                 # Drizzle schema
│   │   └── types.ts                  # Generated
│   │
│   ├── shared/
│   │   ├── zod/                      # Validation schemas (Memory, Person, Role)
│   │   ├── ble-uuids.ts              # ← shared with firmware (HW Plan §6)
│   │   ├── agent-interface.ts        # AgentSession contract
│   │   └── api-contracts.ts          # Edge Function I/O types
│   │
│   └── agents/
│       ├── roles/                    # Built-in role definitions
│       │   ├── reviewer.ts
│       │   ├── tutor.ts
│       │   ├── caregiver.ts
│       │   └── brainstorm.ts
│       ├── tools/
│       │   ├── memory-search.ts
│       │   ├── person-lookup.ts
│       │   └── timeline-window.ts
│       └── drivers/
│           ├── stateless-rag.ts      # V0.1
│           ├── claude-agent-sdk.ts   # V0.2
│           └── realtime-voice.ts     # V0.2 (voice) / V0.3 (group)
│
├── backend/
│   ├── functions/                    # Supabase Edge Functions
│   │   ├── ingest-audio/             # Hardware + app upload entrypoint
│   │   ├── transcribe-step/          # Queue worker
│   │   ├── embed-step/
│   │   ├── identify-speakers-step/
│   │   ├── summarize-step/
│   │   ├── compute-edges-step/
│   │   ├── pair-device/              # V0.3+
│   │   └── revoke-device/
│   │
│   ├── workers/                      # Long-running workers (Modal/Fly)
│   │   └── pyannote-reid/            # Speaker re-ID
│   │
│   └── mcp/                          # MCP server (V0.3+ internal, V1 public)
│       └── arcive-memory-mcp/
│
└── shared/
    └── ble-uuids.ts                  # Mirrored to firmware repo
```

3. Database Schema (V0 — full)
```sql
-- Vector extension
create extension if not exists vector;

-- Subscription enum
create type subscription_tier as enum ('free', 'pro', 'family', 'enterprise');

-- Users handled by Supabase Auth; supplemental table for app-level fields
create table user_profiles (
  id uuid primary key references auth.users,
  display_name text,
  subscription_tier subscription_tier default 'free',
  stripe_customer_id text unique,
  monthly_seconds_used int default 0,
  monthly_seconds_reset_at timestamptz,
  consent_granted_at timestamptz,
  created_at timestamptz default now()
);

-- People in the user's life (including "self")
create table people (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users not null,
  display_name text not null,            -- "Self", "Mom", "Dr. Singh"
  voice_embedding vector(192),           -- Pyannote embedding (V0.1+)
  relationship text,                     -- "self" | "family" | "friend" | "professional"
  notes text,                            -- user-editable context for the agent
  consent_status text default 'pending', -- "granted" | "pending" | "revoked"
  created_at timestamptz default now()
);

-- Devices (hardware + phones; designed for full variant lineup per Master Plan §2.5)
create type device_kind as enum (
  'phone_ios', 'phone_android', 'web',
  'watch_apple', 'watch_wearos',
  'arcive_clip', 'arcive_pendant', 'arcive_tabletop', 'arcive_card',
  'arcive_screen', 'arcive_cellular', 'arcive_local'
);

create table devices (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users not null,
  kind device_kind not null,
  name text,
  mac_address text unique,          -- null for phones/web
  firmware_version text,
  capabilities jsonb,               -- {has_mic_array, has_screen, has_cellular, has_local_compute, mic_count, ...}
  connectivity text[],              -- ["wifi", "ble"] | ["cellular", "ble"] | ["local_only"]
  cellular_iccid text,              -- variant-specific, null otherwise
  local_mode boolean default false, -- if true, device does on-device STT/embed and uploads only summaries
  paired_at timestamptz,
  revoked_at timestamptz,
  created_at timestamptz default now()
);

-- Raw recordings
create table recordings (
  id uuid primary key default gen_random_uuid(),
  device_id uuid references devices,
  user_id uuid references auth.users not null,
  storage_path text not null,
  duration_seconds int,
  recorded_at timestamptz not null,
  doa_metadata jsonb,            -- HW only
  status text default 'pending', -- pending | processing | done | error
  error_message text,
  created_at timestamptz default now()
);

-- Processed memories
create table memories (
  id uuid primary key default gen_random_uuid(),
  recording_id uuid references recordings unique,
  user_id uuid references auth.users not null,
  transcript text,
  transcript_tsv tsvector generated always as
    (to_tsvector('english', coalesce(transcript, ''))) stored,
  summary text,
  topics text[],
  embedding vector(512),         -- Voyage-3-lite is 512-dim
  recorded_at timestamptz,
  created_at timestamptz default now()
);

create index memories_tsv_idx on memories using gin(transcript_tsv);
create index memories_embedding_idx on memories using hnsw (embedding vector_cosine_ops);
```
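Retrieval queries both legs — the HNSW index for semantic similarity and the GIN index for full-text search — and fuses the two rankings. A minimal fusion sketch; reciprocal rank fusion is one reasonable choice here (an assumption, not something the schema mandates), and the function is illustrative:

```typescript
// Merge two best-first id rankings with reciprocal rank fusion (RRF).
// k dampens the influence of rank position; 60 is the conventional default.
function rrfMerge(vectorIds: string[], ftsIds: string[], k = 60): string[] {
  const score = new Map<string, number>();
  for (const [rank, id] of vectorIds.entries())
    score.set(id, (score.get(id) ?? 0) + 1 / (k + rank + 1));
  for (const [rank, id] of ftsIds.entries())
    score.set(id, (score.get(id) ?? 0) + 1 / (k + rank + 1));
  // Ids appearing in both legs accumulate score and float to the top.
  return [...score.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```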
```sql
-- Speaker segments per memory
create table memory_participants (
  id uuid primary key default gen_random_uuid(),
  memory_id uuid references memories on delete cascade,
  person_id uuid references people,  -- nullable until re-ID resolves
  speaker_label text,                -- "Speaker A", "Speaker B" from diarization
  speaking_time_seconds int,
  segments jsonb                     -- [{start_s, end_s, text}]
);

-- Semantic edges between memories (V0.1+)
create table memory_edges (
  id uuid primary key default gen_random_uuid(),
  memory_a uuid references memories on delete cascade,
  memory_b uuid references memories on delete cascade,
  similarity float not null,
  created_at timestamptz default now(),
  unique(memory_a, memory_b)
);

-- AI Roles (built-in + user-created + marketplace)
create table roles (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users,  -- null for built-in
  name text not null,
  description text,
  system_prompt text not null,
  voice_id text,                       -- Cartesia/ElevenLabs voice
  retrieval_config jsonb,              -- {window_days, person_filter, top_k, ...}
  guardrails jsonb,                    -- {avoid_topics, escalation_triggers}
  is_premium boolean default false,
  is_published boolean default false,  -- marketplace
  price_cents int,                     -- marketplace
  created_at timestamptz default now()
);

-- Conversation sessions with a role
create table role_sessions (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users not null,
  role_id uuid references roles not null,
  started_at timestamptz default now(),
  ended_at timestamptz,
  transcript text,
  memory_id uuid references memories  -- the convo itself becomes a memory
);

-- Family / shared spaces (V0.2+)
create table spaces (
  id uuid primary key default gen_random_uuid(),
  owner_id uuid references auth.users not null,
  name text not null,
  created_at timestamptz default now()
);

create table space_members (
  space_id uuid references spaces on delete cascade,
  user_id uuid references auth.users,
  role text default 'member',  -- "owner" | "member" | "caregiver"
  primary key (space_id, user_id)
);

create table memory_spaces (
  memory_id uuid references memories on delete cascade,
  space_id uuid references spaces on delete cascade,
  primary key (memory_id, space_id)
);

-- Pipeline queue (pgmq table; managed by extension)
-- See pgmq docs

-- RLS policies (selected)
alter table memories enable row level security;
create policy "user owns memories" on memories
  for all using (auth.uid() = user_id);

alter table people enable row level security;
create policy "user owns people" on people
  for all using (auth.uid() = user_id);
```

4. Pipeline (Queue Architecture)
```
                        ┌──────────────────────┐
ingest-audio ─────────► │ recordings (pending) │
(POST endpoint)         └──────────┬───────────┘
                                   │ enqueue: pipeline.transcribe
                                   ▼
                        ┌──────────────────────┐
                        │   transcribe-step    │  Groq / Deepgram
                        └──────────┬───────────┘
                                   │ enqueue: pipeline.identify-speakers (if multi-speaker)
                                   ▼
                        ┌──────────────────────┐
                        │  identify-speakers   │  Pyannote on Modal
                        └──────────┬───────────┘
                                   │ enqueue: pipeline.embed
                                   ▼
                        ┌──────────────────────┐
                        │      embed-step      │  Voyage-3-lite
                        └──────────┬───────────┘
                                   │ enqueue: pipeline.summarize
                                   ▼
                        ┌──────────────────────┐
                        │    summarize-step    │  Gemini Flash / Haiku
                        └──────────┬───────────┘
                                   │ enqueue: pipeline.compute-edges
                                   ▼
                        ┌──────────────────────┐
                        │  compute-edges-step  │  pgvector top-K
                        └──────────┬───────────┘
                                   │ update: recordings.status = 'done'
                                   ▼
                        Realtime push → app
```

- Each step is a separate Edge Function with a single responsibility.
- Queue: `pgmq` (Postgres-native message queue) for V0.1; consider Inngest at V0.2 if observability hurts.
- Failures retry with exponential backoff up to 5 times, then move to a dead-letter table.
- Idempotency: each step writes its result keyed on `recording_id`; safe to replay.
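The idempotent-replay property can be sketched as a wrapper around any step. Illustrative only — the real steps key their writes on `recording_id` in Postgres; the in-memory map below is a stand-in for that table:

```typescript
// A pipeline step: given a recording id, produce its result.
type StepFn = (recordingId: string) => Promise<string>;

// Wrap a step so that redelivered queue messages are no-ops:
// if a result already exists for this recording_id, return it unchanged.
function makeIdempotentStep(run: StepFn, results: Map<string, string>) {
  return async (recordingId: string): Promise<string> => {
    const existing = results.get(recordingId);
    if (existing !== undefined) return existing; // replayed message: skip work
    const out = await run(recordingId);
    results.set(recordingId, out);
    return out;
  };
}
```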
5. Agent Interface (Swappable Layer)
```typescript
export interface AgentSession {
  start(input: { userId: string; roleId: string; modality: 'text' | 'voice' }): Promise<void>;
  send(message: string | AudioChunk): AsyncIterable<AgentEvent>;
  interject(): Promise<void>;  // V0.3 group mode
  assumeRole(roleId: string): Promise<void>;
  end(): Promise<SessionSummary>;
}

export type AgentEvent =
  | { type: 'partial'; text: string }
  | { type: 'final'; text: string; audio?: ArrayBuffer }
  | { type: 'tool_call'; name: string; args: unknown }
  | { type: 'error'; message: string };

// Drivers
// packages/agents/drivers/stateless-rag.ts      (V0.1)
// packages/agents/drivers/claude-agent-sdk.ts   (V0.2)
// packages/agents/drivers/realtime-voice.ts     (V0.2 voice, V0.3 group)
```

The app code only ever imports `AgentSession`. Swapping drivers is a config change.
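Hypothetical usage with a stub driver, showing how app code stays driver-agnostic by consuming the event stream. The stub and the pared-down types below are illustrative, not one of the real drivers:

```typescript
// Pared-down versions of the contract, enough to show the consumption pattern.
type AgentEvent =
  | { type: "partial"; text: string }
  | { type: "final"; text: string };

interface MinimalSession {
  send(message: string): AsyncIterable<AgentEvent>;
}

// A stub driver that streams a canned reply.
const stubDriver: MinimalSession = {
  async *send(message: string) {
    yield { type: "partial" as const, text: "Thinking…" };
    yield { type: "final" as const, text: `echo: ${message}` };
  },
};

// App code: render partials as they arrive, return the final text.
async function ask(session: MinimalSession, message: string): Promise<string> {
  let finalText = "";
  for await (const ev of session.send(message)) {
    if (ev.type === "final") finalText = ev.text;
    // partials would drive a streaming UI here
  }
  return finalText;
}
```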
6. Hardware-Facing API Contracts
(Mirrors §6 of 00_MASTER_PLAN.md. Software is the server side of these contracts.)
6.1 POST /functions/v1/ingest-audio
```ts
// backend/functions/ingest-audio/index.ts (sketch)
serve(async (req) => {
  const url = new URL(req.url); // needed for the query params below
  const auth = req.headers.get('Authorization');
  const deviceId = req.headers.get('X-Device-Id');
  const recordedAt = url.searchParams.get('recorded_at');
  const doaJson = url.searchParams.get('doa_json');

  // 1. Verify device JWT, look up device + user
  // 2. Stream body to Supabase Storage at audio/<user>/<recording_id>.wav
  // 3. Insert recordings row (status=pending)
  // 4. Enqueue pipeline.transcribe
  // 5. Return 202 { recording_id }
});
```

6.2 POST /functions/v1/pair-device
App-initiated. Generates a one-shot `device_jwt` valid for `ingest-audio` and returns it bundled into a QR payload.
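One plausible shape for that QR payload, with the caveat that every field name here is illustrative and not part of the frozen contract:

```typescript
// Assumed payload shape; the real contract lives in 00_MASTER_PLAN §6.
interface PairQrPayload {
  v: 1;              // payload version, for forward compatibility
  ingestUrl: string; // POST target for ingest-audio
  deviceJwt: string; // one-shot token minted by pair-device
  expiresAt: string; // ISO-8601; device must complete pairing before this
}

function buildQrPayload(
  ingestUrl: string,
  deviceJwt: string,
  ttlMs: number,
  now: number = Date.now(),
): string {
  const payload: PairQrPayload = {
    v: 1,
    ingestUrl,
    deviceJwt,
    expiresAt: new Date(now + ttlMs).toISOString(),
  };
  return JSON.stringify(payload); // the app renders this string as a QR code
}
```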
6.3 POST /functions/v1/revoke-device
User-initiated from app. Marks device revoked, invalidates JWT, notifies device over BLE if connected.
6.4 Realtime channels
- `recordings:user_id=eq.<uid>` — new rows + status updates
- `memories:user_id=eq.<uid>` — when pipeline completes
- `devices:user_id=eq.<uid>` — pairing + revocation events
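Clients build these filters from the signed-in user's id. A small helper keeps the filter string in one place; the supabase-js v2 subscription call is shown in comments since it needs a live client:

```typescript
// Builds the per-user Postgres-changes filter used by all three channels.
function userFilter(uid: string): string {
  return `user_id=eq.${uid}`;
}

// Usage with supabase-js v2 (assuming a configured `supabase` client):
//
// supabase
//   .channel('recordings-feed')
//   .on('postgres_changes',
//       { event: '*', schema: 'public', table: 'recordings', filter: userFilter(uid) },
//       (payload) => handleRecordingChange(payload))
//   .subscribe();
```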
7. Phase Deliverables — Software Track
Phase 0 — V0 (Wk 1–3)
- Next.js 15 PWA deployed to Vercel
- Supabase project provisioned, full schema migrated
- Magic-link auth + consent screen
- Phone-mic recording (`getUserMedia` with VAD via `@ricky0123/vad-web`)
- Audio upload to Supabase Storage + recordings row
- Synchronous Edge Function chain (no queue yet): transcribe → store memory
- Today (list) + Memory detail views
- Postgres FTS text search
- PostHog + Sentry instrumented
- Stripe customer pre-created on signup
- Internal `getMemories(query, filter, k)` retrieval API
- 20 invited users
- `/ingest-audio` endpoint accepts hardware uploads from day one — same endpoint, same recordings row, just a different `device_id`. The hardware track uses this to land its first end-to-end demo in the same phase.
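The "VAD before upload" decision can be illustrated with a naive energy gate. Note this is not the `@ricky0123/vad-web` API (that library runs a Silero model); it just shows the drop-silence-before-upload decision on raw samples:

```typescript
// Root-mean-square energy of a PCM frame in [-1, 1] float samples.
function rms(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Gate: frames below the threshold never reach a paid transcription API.
// The 0.01 threshold is a placeholder; a real VAD model replaces this.
function shouldUpload(samples: Float32Array, threshold = 0.01): boolean {
  return rms(samples) >= threshold;
}
```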
Phase 1 — V0.1 (Wk 4–6)
- Move pipeline to `pgmq` queue + step workers
- Diarization (Deepgram Nova-3)
- Pyannote.audio worker on Modal for speaker re-ID; populates `people.voice_embedding`
- Voyage-3-lite embeddings + HNSW index
- Universe view (`react-force-graph` in web)
- First role: Reviewer (text-only, RAG driver)
- Stripe Pro tier ($12/mo) goes live; PostHog feature flags gate features
- Export to Markdown
- App-side BLE pairing UX — required for the Phase 1 integration demo (“Pair dev-kit via QR scan → device joins WiFi → captures meeting”). On Chrome Android: Web Bluetooth in the PWA. On iOS: a developer-only CLI tool (10 dev-kits, internal use only); consumer iOS pairing UX ships with the Expo mobile app at V0.2.
- App calls `POST /functions/v1/pair-device` and `POST /functions/v1/revoke-device` from settings UI
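The compute-edges step and the Universe view both rest on nearest-neighbour similarity. In production that is a pgvector HNSW query; the underlying cosine top-K, sketched in plain TypeScript for clarity:

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-K; HNSW gives the same answer approximately, in log time.
function topK(query: number[], rows: { id: string; emb: number[] }[], k: number) {
  return rows
    .map((r) => ({ id: r.id, score: cosine(query, r.emb) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```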
Phase 2 — V0.2 (Wk 7–10)
- Expo mobile app — feature parity with web
- pnpm workspace, shared `packages/db`, `packages/shared`, `packages/agents`
- RevenueCat for mobile billing; Stripe for web
- Voice talk-back: Pipecat + Deepgram streaming STT + Cartesia Sonic TTS
- Claude Agent SDK driver replaces stateless RAG
- Two more roles: Tutor, Brainstorm Partner
- Family tier launches: spaces, multi-member, caregiver role
- Offline recording on mobile (queue locally, upload when online)
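The offline-recording bullet reduces to a persistent FIFO that flushes when connectivity returns. A sketch with storage and network stubbed out (class and method names are illustrative):

```typescript
type Pending = { id: string; bytes: Uint8Array };

class OfflineQueue {
  private pending: Pending[] = []; // persisted to device storage in the real app

  // upload returns true on success; injected so it can be stubbed in tests.
  constructor(private upload: (p: Pending) => Promise<boolean>) {}

  enqueue(p: Pending): void {
    this.pending.push(p);
  }

  // Called on the 'online' event: preserves order, stops at the first failure
  // so nothing is dropped, and returns how many chunks were sent.
  async flush(): Promise<number> {
    let sent = 0;
    while (this.pending.length > 0) {
      const ok = await this.upload(this.pending[0]);
      if (!ok) break;
      this.pending.shift();
      sent++;
    }
    return sent;
  }
}
```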
Phase 3 — V0.3 (Wk 11–14)
- LiveKit-based group mode (continuous streaming session, server-side)
- Backend bridge: device → HTTPS chunked upload (1s Opus chunks) → bridge publishes as a LiveKit participant track. Device does not speak WebRTC directly — see 02_HARDWARE_PLAN.md §6.2. Web/mobile clients join the room directly via LiveKit SDKs.
- Agent interjection capability via `interject()` — agent can speak into the LiveKit room as a participant
- Polished pairing/revocation UX in the Expo mobile app (replaces the Phase 1 CLI dev tool for iOS; QR scan + BLE handshake)
- Hardware status UI (battery, recording state, mute control)
- MCP server built (internal use only) — agents and roles use it as the retrieval layer
- Caregiving role with explicit per-person consent flow
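Because the device uploads 1 s Opus chunks over HTTPS rather than speaking WebRTC, the bridge must re-serialize them before publishing to LiveKit. A sketch of the reorder buffer that implies; per-chunk sequence numbers are an assumption here, and the frozen contract is in 02_HARDWARE_PLAN.md §6.2:

```typescript
// Holds out-of-order chunks and releases them in sequence order.
class ReorderBuffer {
  private next = 0; // next sequence number we can publish
  private held = new Map<number, Uint8Array>();

  // Returns the chunks now safe to publish to the LiveKit track, in order.
  push(seq: number, chunk: Uint8Array): Uint8Array[] {
    this.held.set(seq, chunk);
    const ready: Uint8Array[] = [];
    while (this.held.has(this.next)) {
      ready.push(this.held.get(this.next)!);
      this.held.delete(this.next);
      this.next++;
    }
    return ready;
  }
}
```

A real bridge would also need a gap timeout so one lost chunk cannot stall the stream forever.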
Phase 4 — V1.0 (Wk 15–22)
- DoA fusion: combine XVF3800 azimuth with Pyannote re-ID for higher-confidence speaker labels
- Public MCP server — Pro users get an endpoint they plug into Claude Desktop / ChatGPT / Cursor
- Role marketplace UI (browse, install, publish)
- Stripe Connect for marketplace payouts (70/30 split)
- B2B admin dashboard (multi-staff, audit log, org-level billing)
- SOC 2 Type 1 prep started
Phase 5 — V1.1+ (Wk 23+)
- Vertical packages: Caregiving, Education, Therapy
- Memory-as-a-Service API tier (usage-billed)
- Smartwatch app
- Multi-language support (start with ES, FR, DE)
- On-device summary cache for fast offline browsing
8. Privacy, Security, Consent
- All audio encrypted at rest (Supabase Storage default + customer-managed keys for B2B)
- Per-user “delete everything” flow (cascades to recordings, memories, embeddings, voice prints)
- Default retention: indefinite for owner, but UI nudge to set retention windows
- Voice embeddings of non-self people require their consent (`people.consent_status`)
- Two-party consent default in legal mode (user can disable for personal use, with a confirmation)
- Audit log table for B2B: who accessed what memory when
- Device JWT short-lived (rotated every 30 days, refreshed via BLE when phone is paired)
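The 30-day rotation rule is a single comparison; sketched here so both teams agree on the boundary condition (rotation fires at exactly 30 days, not after):

```typescript
const ROTATION_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// True when the device JWT is due for a BLE-delivered refresh.
function needsRotation(issuedAtMs: number, nowMs: number): boolean {
  return nowMs - issuedAtMs >= ROTATION_MS;
}
```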
9. Cost Discipline Rules
- VAD before upload — never send silence to a paid API
- Cheapest model that works — measure quality, not vendor prestige
- Cache ruthlessly — embeddings, summaries, role responses (semantic cache for common queries)
- Per-user usage tracking in `user_profiles.monthly_seconds_used`, enforced at upload time
- Hard cutoffs at tier limits — record locally, queue for upload, but reject if cap exceeded
- Free tier Pyannote calls batched — daily, not real-time; Pro gets real-time
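The upload-time cap check against `user_profiles.monthly_seconds_used` is then a pure function. The per-tier caps below are placeholders, not decided pricing:

```typescript
// Placeholder caps; actual tier limits are a pricing decision, not fixed here.
const TIER_CAP_SECONDS: Record<string, number> = {
  free: 10 * 60 * 60,  // e.g. 10 h/month
  pro: 100 * 60 * 60,  // e.g. 100 h/month
};

// Reject at the cap; the device keeps recording locally and retries next cycle.
function acceptUpload(tier: string, usedSeconds: number, chunkSeconds: number): boolean {
  const cap = TIER_CAP_SECONDS[tier] ?? 0; // unknown tier gets nothing
  return usedSeconds + chunkSeconds <= cap;
}
```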
10. Open Questions
- Should free tier get the graph view or only Pro? (Probably free, since it’s the wow moment.)
- Self-host Pyannote vs. use Speechmatics managed? (Default: self-host on Modal; switch if ops pain exceeds savings.)
- Web Bluetooth for hardware pairing on Android? (Nice-to-have at V1.0; iOS requires native app anyway.)
- Should role marketplace allow free roles? (Yes — drives adoption; only premium ones are gated.)
- Encrypted-at-client option for Pro+ users? (Probably V1.1; complicates retrieval but is a compelling sell for therapy / caregiving B2B.)