
ARCIVE — Software Plan

Companion to 00_MASTER_PLAN.md and 02_HARDWARE_PLAN.md. This document is the source of truth for everything that runs on a server, in a browser, or on a phone.

Software and hardware are co-equal, parallel tracks. Software ships standalone (phone-mic) at every phase, but is built to integrate with the device from Phase 0. The hardware↔software contract in 00_MASTER_PLAN §6 is frozen at the start of each phase; both teams build to it.


1. Stack — Final Decisions

| Concern | Choice | Notes |
|---|---|---|
| Web app framework | Next.js 15 (App Router) on Vercel | RSC, edge runtime, ISR, free hobby tier |
| Mobile app | Expo SDK 52+ with Expo Router | iOS + Android from one codebase, EAS for builds & OTA |
| Backend | Supabase | Postgres + Auth + Storage + Realtime + Edge Functions |
| Vector store | pgvector with HNSW | Same DB; no Pinecone/Weaviate needed |
| Transcription (V0, batch) | Groq Whisper-large-v3-turbo | ~$0.04/hr, faster than realtime but batch-only (POST complete file → get complete transcript). Fits V0 chunked-upload model perfectly. |
| Transcription (V0.1+, streaming) | Deepgram Nova-3 | True WebSocket streaming + live diarization, ~$0.26/hr. Used wherever live transcript or sub-second latency is required (voice talk-back, group mode, live captions). |
| Speaker re-ID | Pyannote.audio on Modal | Cross-session identity, owned voice embeddings |
| Embeddings | Voyage-3-lite | Cheap, top retrieval benchmarks |
| Summary / topic extraction | Gemini 2.5 Flash or Claude Haiku 4.5 | Cents per recording |
| Agent framework | Claude Agent SDK | Tool use, memory tool, role system prompts |
| Voice talk-back (V0.2+) | Pipecat + Cartesia Sonic TTS + Deepgram streaming STT | Sub-second turn latency |
| Group conversation media | LiveKit Cloud | WebRTC, used by OpenAI Realtime, free tier |
| Pipeline orchestration | pgmq + pg_cron (V0.1) → Inngest (V0.2+) if complexity demands | Queue-based, durable, retryable |
| Auth | Supabase Auth + magic links | No passwords |
| Payments | Stripe Billing + RevenueCat for mobile | One SDK across iOS/Android/web |
| Analytics | PostHog | Product analytics + session replay + feature flags |
| Errors | Sentry | Web + mobile + Edge Functions |
| Search (text) | Postgres FTS (tsvector) | No Algolia needed |
| Search (semantic) | pgvector HNSW | Same query layer |
| File storage | Supabase Storage | Audio + exports |
| Code structure | pnpm workspace monorepo | apps/web, apps/mobile, packages/db, packages/shared, packages/agents |
| Type safety | TypeScript everywhere + Zod for runtime | Generated types from Supabase schema |
| ORM | Drizzle (when raw SQL gets painful) | Type-safe, lightweight |
| Testing | Vitest + Playwright for web, Maestro for mobile | Skip what’s not high-value |
| CI/CD | GitHub Actions + Vercel previews + EAS Update | OTA mobile fixes without store review |

1.4. AI Architecture — Layered (canonical: ADR-0011)

ARCIVE composes specialized AI capabilities around a memory store. The product moat is the memory store + retrieval + auto-correlation, not any AI capability. The §1 “Stack” table above lists individual choices per concern; this section is how those choices fit together as a system, and which classes drive procurement.

Three layer classes:

  • ARCIVE-owned — built internally; never outsourced. The moat.
  • Strategic AI — best-in-class on a strategic axis (e.g. voice fidelity per ADR-0006, agent quality per ADR-0003). Vendor swaps require their own ADR.
  • Commodity AI — cheapest acceptable vendor with a coded fallback. Vendor swaps within this class do not require a new ADR. They happen as the market shifts.
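As a concrete illustration of the commodity-class contract, a coded fallback chain can be as small as this. The `Summarizer` shape and vendor objects are illustrative, not the real adapters:

```typescript
// Sketch: try each commodity vendor in order; the first success wins.
// Vendor adapters here are hypothetical stand-ins for the real SDK calls.
type Summarizer = (transcript: string) => Promise<string>;

async function withFallback(
  chain: Array<{ name: string; run: Summarizer }>,
  transcript: string,
): Promise<{ vendor: string; summary: string }> {
  let lastError: unknown;
  for (const vendor of chain) {
    try {
      return { vendor: vendor.name, summary: await vendor.run(transcript) };
    } catch (err) {
      lastError = err; // record failure and fall through to the next vendor
    }
  }
  throw new Error(`all vendors failed: ${String(lastError)}`);
}
```

Swapping a vendor within the commodity class is then a one-line change to the chain, consistent with "no new ADR required."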

The 11 layers:

| # | Layer | Class | Today’s vendor |
|---|---|---|---|
| 1 | Capture (mic, transcode, ingest) | ARCIVE-owned | recorders + audio-transcode worker |
| 2 | Transcribe (audio → text + timestamps) | commodity | Groq Whisper |
| 3 | Understand (text → summary, topics, entities) | commodity | Gemini Flash → Anthropic Haiku → Groq Llama (fallback chain) |
| 4 | Embed (text/image → vector) | commodity | Voyage-3-lite |
| 5 | Retrieve (query → ranked memories) | ARCIVE-owned (MOAT) | MCP server + pgvector HNSW + tsv + edges + consent gate |
| 6 | Reason (chat with memories as context) | semi-strategic | Claude Agent SDK |
| 7 | Hear (real-time STT for voice loop) | commodity | Deepgram Nova-3 (paused per ADR-0010) |
| 8 | Speak (TTS) | strategic (ADR-0006) | Cartesia → ElevenLabs v3 → Sesame CSM (paused per ADR-0010) |
| 9 | Voice orchestration (real-time turn management) | strategic (ADR-0002, paused per ADR-0010) | Pipecat (paused) — composed on resumption, never speech-to-speech |
| 10 | Auto-correlate (cross-source memory linkage) | ARCIVE-owned (MOAT) | edges job; future per discussions/2026-05-04_multimodal_expansion.md |
| 11 | Surfaces (PWA, mobile, MCP-as-output, future email-in / share-target) | ARCIVE-owned | web + mobile + MCP server |

ARCIVE never consolidates to a single vendor across the stack. This matches how every credible adjacent player (Notion, Granola, Otter, Limitless, Apple Intelligence, ElevenLabs Conversational AI) actually operates. Full reasoning, cost projections at ARCIVE volumes, and the rejected alternatives are in ADR-0011 and the working session notes discussions/2026-05-05_ai_strategy_architecture.md.

Stage 6 (unified “talk + chat together”) is redefined: ships as two surfaces (Talk + Chat) sharing the memory store, not as a single-model unified architecture (which would force ARCIVE to abandon ADR-0006 voice fidelity). Same data, different input modalities.


1.5. Capture-Surface Capabilities (Honest Truth Table)

Different input surfaces have different capabilities. We do NOT promise capabilities a surface can’t deliver. This table is canonical — UX copy must reflect it.

| Capability | ARCIVE Device | Mobile App (native, Expo) | Web App / PWA |
|---|---|---|---|
| Always-on capture (records when not in foreground) | ✅ | ✅ (with foreground service / background audio mode) | ❌ — browsers suspend tabs after lock/switch |
| Press-to-record (intentional dictation) | ✅ | ✅ | ✅ |
| Recording while phone screen is off / locked | ✅ | ✅ (Android foreground service; iOS background audio capability) | ❌ |
| Recording survives WiFi outage | ✅ (30-min local buffer) | ✅ (SQLite/file queue) | ⚠️ (IndexedDB queue, lost if user closes tab) |
| Multi-speaker far-field capture | ✅ (4-mic array, 5m) | ❌ (single phone mic, ~1m) | ❌ |
| DoA / speaker positioning metadata | ✅ | ❌ | ❌ |
| Onboard VAD (silence not uploaded) | ✅ (XVF3800 hardware) | ✅ Silero VAD via onnxruntime-react-native or @picovoice/cobra-react-native (decision at V0.2 implementation time) | ✅ (@ricky0123/vad-web, AudioWorklet, Silero ONNX) |
| Opus encoding | ✅ (firmware) | ✅ (native codec) | ⚠️ Chrome ✅, Safari often falls back to MP4/AAC |
| Group mode (continuous WebRTC stream) | ✅ (Phase 3+) | ✅ (Phase 3+) | ✅ for joining group sessions, not as primary capturer |
| BLE pairing with ARCIVE device | n/a | ✅ (react-native-ble-plx) | ⚠️ Web Bluetooth: Chrome Android ✅, Safari iOS ❌ |
| Voice talk-back (real-time) | ✅ via paired phone | ✅ | ✅ (best when tab is foregrounded) |
| Push notifications | n/a | ✅ | ⚠️ iOS 16.4+ only when installed to home screen |
| Install to home screen | n/a | ✅ App Store / Play Store | ✅ PWA (see §1.6) |
| Offline browsing of past memories | n/a | ✅ (SQLite cache) | ✅ (Service Worker + IndexedDB) |

Implications

  • The web/PWA is for intentional dictation and review. UX copy says “Tap to record.” It does NOT say “ARCIVE listens to your day.” That promise belongs to the device and the native mobile app.
  • The mobile app is the always-on phone-only capture surface (when no device is paired). Foreground service on Android, background audio mode on iOS — both legitimate platform features, both require explicit permission disclosure.
  • The ARCIVE device is the unrestricted always-on capture surface. No OS in the way.
  • We never promise background recording on the web.

iOS background-mode footnote

Declaring UIBackgroundModes: ["audio", "bluetooth-central"] in Info.plist is necessary but not sufficient. Always-on capture + always-paired BLE on iOS requires:

  • An active audio session (category playAndRecord, mixWithOthers) at all times capture is expected — going idle risks suspension.
  • CBCentralManager state restoration implemented (CBCentralManagerOptionRestoreIdentifierKey + centralManager(_:willRestoreState:)) so iOS can re-wake the app and reconnect to the device after suspension.
  • bluetooth-central background mode declared in addition to audio mode — they are independent capabilities. Without state restoration, iOS will silently drop the BLE connection during suspension and not re-pair until the user reopens the app.

1.6. How the PWA Works

The “PWA” in “Next.js PWA” is not a separate framework — it’s a set of browser standards layered onto a normal website. Three pieces, simple individually, powerful together.

Piece 1 — manifest.webmanifest (the install metadata)

A JSON file at apps/web/public/manifest.webmanifest. Tells the browser “this site is installable, here’s its name, icon, color, start URL, and orientation.”

```json
{
  "name": "ARCIVE",
  "short_name": "ARCIVE",
  "description": "Record, retrieve, interact, and create memories.",
  "start_url": "/today",
  "display": "standalone",
  "background_color": "#F8F6F1",
  "theme_color": "#88B4A0",
  "icons": [
    { "src": "/icons/192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/icons/512.png", "sizes": "512x512", "type": "image/png", "purpose": "maskable" }
  ]
}
```

Linked from `<head>`:

```html
<link rel="manifest" href="/manifest.webmanifest" />
<meta name="theme-color" content="#88B4A0" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="default" />
```

When the user visits the site on Chrome Android or Safari iOS, the browser detects this manifest and offers an “Add to Home Screen” prompt (Android) or the user manually adds it from the Share menu (iOS). Once added, ARCIVE launches in standalone mode — no browser chrome, no URL bar, looks identical to a native app.

Piece 2 — Service Worker (the offline + caching brain)

A JavaScript file (apps/web/public/sw.js) the browser registers in the background. It intercepts network requests and can serve cached responses, queue failed requests for retry, handle push notifications, and run independently of any open tab.

For ARCIVE we use it for three things:

  1. Offline shell — cache the app’s HTML / JS / CSS so the UI loads instantly even with no network. Memories themselves come from the API; if the API call fails, show the cached list.
  2. Failed-upload retry queue — if the user records a memory and the upload fails (no WiFi), the service worker keeps the chunk in IndexedDB and retries when the network comes back, even if the tab is closed.
  3. Web push notifications (iOS 16.4+, all of Android) — server can push “Your memory is processed” without the app being open.
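The failed-upload retry queue (item 2 above) can be sketched storage-agnostically. In the PWA the `Map` would be an IndexedDB object store and `upload` a `fetch` to `/ingest-audio`; both are injected here so the logic is testable anywhere, and all names are illustrative:

```typescript
// Sketch of the failed-upload retry queue. The Map stands in for IndexedDB;
// `upload` stands in for the real fetch() to /ingest-audio. Names are illustrative.
interface QueuedChunk { id: string; bytes: Uint8Array; attempts: number; }

class UploadQueue {
  constructor(
    private store: Map<string, QueuedChunk>,              // stand-in for IndexedDB
    private upload: (chunk: QueuedChunk) => Promise<void>,
    private maxAttempts = 5,
  ) {}

  enqueue(id: string, bytes: Uint8Array) {
    this.store.set(id, { id, bytes, attempts: 0 });
  }

  /** Retry everything once; callers re-run this on 'online' / sync events. */
  async flush(): Promise<{ sent: string[]; pending: string[] }> {
    const sent: string[] = [];
    for (const chunk of [...this.store.values()]) {
      try {
        await this.upload(chunk);
        this.store.delete(chunk.id);
        sent.push(chunk.id);
      } catch {
        chunk.attempts += 1;
        // a real implementation would move exhausted chunks to a dead-letter store
        if (chunk.attempts >= this.maxAttempts) this.store.delete(chunk.id);
      }
    }
    return { sent, pending: [...this.store.keys()] };
  }
}
```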

We don’t write the service worker by hand — we use @serwist/next (the maintained successor to next-pwa) which generates one from config. Setup is a single Next.js plugin.
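Assuming Serwist’s documented plugin API, the setup looks roughly like this; the `swSrc` / `swDest` paths are this repo’s convention, not mandated by the library:

```typescript
// next.config.mjs — sketch of the @serwist/next setup, per Serwist's docs.
import withSerwistInit from "@serwist/next";

const withSerwist = withSerwistInit({
  swSrc: "app/sw.ts",      // our service-worker source; Serwist generates the precache logic
  swDest: "public/sw.js",  // emitted worker the browser registers
});

export default withSerwist({
  // ...the usual Next.js config
});
```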

Piece 3 — Web APIs the PWA uses

Once installed, the PWA can use the same browser APIs as a normal website — but UX is fullscreen, app-like:

  • getUserMedia for mic
  • MediaRecorder for capture
  • AudioWorklet for VAD
  • Web Bluetooth for device pairing (Android only)
  • IndexedDB for local cache
  • Web Push for notifications

What the user experiences

  1. Visits arcive.app in Chrome Android or Safari iOS.
  2. Browser shows “Add ARCIVE to Home Screen” (or user does it manually on iOS).
  3. Icon appears on home screen, indistinguishable from a native app.
  4. Tap → app launches fullscreen (no browser chrome).
  5. Works offline — past memories visible, new recordings queued for upload.
  6. Push notifications arrive (iOS 16.4+, all Android).
  7. Updates ship instantly — next time the app opens, it grabs the latest version. No App Store review.

What the PWA still cannot do (so we don’t pretend)

| Limitation | Workaround |
|---|---|
| No background audio recording on iOS | Use the native mobile app for that |
| No Web Bluetooth on Safari iOS | Pairing flow goes through the native mobile app on iOS |
| Push notifications iOS-limited | Acceptable for V0; native app at V0.2 fixes. iOS Web Push prerequisites (all required): PWA installed to home screen via Safari, manifest has display: standalone or fullscreen, VAPID keys configured server-side, permission prompt fired from a user-gesture event handler (not on page load), not in private browsing. |
| Full filesystem access | Not needed for our use case |
| Speech recognition (SpeechRecognition) Safari-spotty | We do STT server-side anyway |

Why PWA-first beats native-first for V0

  • Ship a URL in 3 weeks, not 6. No App Store review, no provisioning profiles, no $99/yr Apple Developer fee for V0.
  • One codebase across desktop, mobile web, installed-PWA. Native (Expo) comes at V0.2 once the pattern is proven.
  • Iteration speed. Push a fix, refresh the page, done. No EAS build queue, no TestFlight invites.
  • Marketing site is the same Next.js app. No second codebase to maintain.

The PWA is genuinely the right V0 surface. The native mobile app at V0.2 then unlocks always-on capture and full BLE — the one thing the PWA can’t do.

Implementation cost

Adding PWA support to a Next.js 15 app is roughly:

  • 1 hour: install @serwist/next, configure
  • 1 hour: write manifest.webmanifest, generate icons (192, 512, maskable)
  • 1 hour: add iOS-specific <meta> tags, test “Add to Home Screen” flow on real devices
  • 2 hours: write IndexedDB upload-queue logic for failed uploads
  • 1 hour: test offline shell loads when network is off

Half a day’s work for a real PWA. Worth it from V0.


1.7. Hosting Strategy

Three things to host. Each has a clear best home.

What needs hosting

| Thing | Recommended home | Alternatives |
|---|---|---|
| Next.js web app | Vercel for V0–V0.2; Cloudflare Pages for V0.2+ if cost matters | Netlify, self-hosted on VPS, Fly.io |
| Backend (functions, DB, auth, storage, realtime) | Supabase (locked) | none |
| GPU / heavy workers (Pyannote, future on-device-grade STT) | Modal | Replicate, Fly Machines (GPU), self-host |
| MCP server (Phase 4+) | Cloudflare Workers (cheap, edge, always-on) | Vercel Edge Functions, Fly |
| Firmware OTA binaries | Supabase Storage | Any object storage |

Hosted-Supabase-only development

We do not run a local Supabase stack. No Docker, no local Postgres container — every developer points at the same hosted project (or their own personal one) over HTTPS. Trade-off: every migration / Edge Function change is a supabase db push / functions deploy, not an in-process reload. Win: zero Docker dependency, zero LAN-IP gymnastics when testing on a real phone or device, identical environment between laptop and CI.

```shell
# One-time install
npm install -g supabase   # CLI for `link`, `db push`, `functions deploy` — no Docker needed
brew install pnpm         # or: npm install -g pnpm

# Bootstrap
git clone <repo>
pnpm install

# Link to a hosted Supabase project (yours or the team's)
supabase login
supabase link --project-ref <PROJECT_REF>
supabase db push
pnpm functions:deploy

# Fill apps/web/.env.local with the hosted project's URL + anon key
cp .env.example apps/web/.env.local

# Run
pnpm --filter web dev       # Next.js on localhost:3030 (project standard — not Next default 3000)

# When testing the mobile app
pnpm --filter mobile start  # phone hits the same hosted project; no tunnel, no LAN IP

# When testing the device (Phase 0+)
# Firmware uploads directly to https://<project-ref>.supabase.co/functions/v1/ingest-audio
```

Hosting decision matrix

| Provider | Best for | V0 cost | V0.2 cost | Setup time | Verdict |
|---|---|---|---|---|---|
| Vercel | Next.js | $0 (hobby) | $20–50 (Pro) | 5 min | Default. Use this for V0. |
| Cloudflare Pages + Workers | Next.js at scale | $0 | $5–20 | 30 min | Migrate here at V0.2 if Vercel bill matters. |
| Netlify | Next.js | $0 | $19+ | 10 min | Fine, but no advantage over Vercel or Cloudflare |
| Hostinger / VPS Node.js | Cheapest fixed cost | ~$4 | ~$4–10 | 1–2 days | Skip unless you have a strong reason |
| Self-host on Hetzner / DigitalOcean | Full control | ~$5 | ~$10 | 1–2 days | Skip for V0; consider for local-only variant (V2.x) |
| Fly.io | Container apps | $0–5 | $5–20 | 1 hr | Better for backend than for Next.js |
| Railway / Render | Simple PaaS | $5–10 | $10–30 | 30 min | OK middle ground; no real edge |

Recommendation

  1. Local-first for the first 1–2 weeks of V0. No hosting yet.
  2. Deploy V0 to Vercel. Free hobby tier, 5-minute setup, PR previews, optimized for Next.js. Don’t reconsider until the bill is meaningful (>$50/mo).
  3. At V0.2, evaluate Cloudflare Pages. Same Next.js code, much cheaper at scale, generous Workers free tier (100k req/day). Migration is 1–2 days.
  4. Backend stays on Supabase regardless of where the web app lives. Don’t conflate.
  5. Long-running Pyannote workers go to Modal (pay-per-second GPU, no always-on cost).
  6. Skip cheap VPS hosts (Hostinger, DigitalOcean droplets, Hetzner) for V0. Saving $4/mo is not worth losing 2 days of engineering. Reconsider only when:
    • You’re at 100k+ users and Vercel bill > $500/mo
    • You have data-residency requirements (EU/UK/India local hosting)
    • You’re building the local-only ARCIVE variant (V2.x) — that genuinely needs self-hosted or device-as-server architecture

Why not Hostinger / cheap VPS for V0

  • No CDN — slow first paint for users far from the server
  • No edge runtime — every request hits one region
  • No image optimization — bigger pages, slower loads
  • No PR previews — slows iteration
  • No zero-config Next.js — you maintain Node, PM2, nginx, SSL, deploy scripts
  • ~1–2 days of setup vs. 5 minutes
  • Saves ~$15/mo, costs days of engineering you don’t have at V0

Engineering principle

At V0, optimize for engineering speed, not infrastructure cost. The bill is small at the scale where speed matters most. Cost optimization is a Phase 2+ exercise.


1.8. Latency & Streaming Architecture

This section is the source of truth for what streams where, what’s batch, and what the latency budget is at every realtime touchpoint. It corrects three things that were vague or wrong in earlier sections.

Streaming vs. batch — the two pipelines

ARCIVE has two parallel data pipelines, not one. They use different transcription providers because they have different latency requirements.

```
PIPELINE A — INGESTION (batch, V0+)

capture surface
   ──► chunk to 30s + Opus mono @ 24kbps
   ──► HTTPS POST /ingest-audio
   ──► Groq Whisper-large-v3-turbo (batch)
   ──► queue: diarize → embed → summarize → edges
   ──► Supabase Realtime → app shows new memory

Use: dictation, hardware capture, mobile background capture, group session post-processing
Total user-visible latency: ~2–5s after chunk seal


PIPELINE B — LIVE (streaming, V0.2+)

capture surface
   ──► WebSocket / WebRTC ──► Deepgram Nova-3 streaming
   ──► partial transcripts (~200ms)
   ──► Pipecat orchestrator
   ──► Claude Agent SDK (Haiku 4.5, streaming)
   ──► Cartesia Sonic TTS (sub-100ms first chunk)
   ──► audio back to user

Use: voice talk-back, role-play conversations, group mode interjections, live captions
Total user-visible latency budget: <1.5s end-of-utterance → AI starts speaking
```

Latency budget (Pipeline B, voice talk-back)

| Stage | Component | Target | Realistic |
|---|---|---|---|
| End-of-utterance detection | Pipecat + Deepgram VAD | 200ms | 200–400ms |
| STT final result | Deepgram Nova-3 streaming | +100ms after EOU | 100–300ms |
| LLM first token | Claude Haiku 4.5 streaming | +300ms | 300–500ms |
| TTS first audio chunk | Cartesia Sonic | +100ms | 90–150ms |
| Network RTT | variable | 50ms | 50–200ms |
| **Total** | | ~750ms ideal | ~750ms–1.55s realistic |
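The totals in the budget fall out of straight per-stage sums, which is worth keeping machine-checkable as the stage list evolves:

```typescript
// Per-stage budgets from the latency table (values in milliseconds).
const stages = [
  { name: "EOU detection",   target: 200, worst: 400 },
  { name: "STT final",       target: 100, worst: 300 },
  { name: "LLM first token", target: 300, worst: 500 },
  { name: "TTS first chunk", target: 100, worst: 150 },
  { name: "network RTT",     target: 50,  worst: 200 },
];

const ideal = stages.reduce((ms, s) => ms + s.target, 0); // 750 ms
const worst = stages.reduce((ms, s) => ms + s.worst, 0);  // 1550 ms ≈ 1.55 s
```

The worst-case sum of 1550ms sits right at the 1500ms hard ceiling, which is why the co-location and smaller-LLM escape hatches below exist.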

Phase 2 ships with explicit latency benchmarking against a 1500ms hard ceiling. If we exceed it, options: co-locate Deepgram + Cartesia in the same region as the user, switch to Cartesia’s edge endpoints, or use a smaller LLM (Haiku → 7B model on Groq).

Why Pipecat over OpenAI Realtime / Gemini Live

Both OpenAI Realtime API and Gemini Live offer bundled STT+LLM+TTS in a single WebSocket. Tempting, but rejected because:

  • Vendor lock: you cannot swap STT, LLM, or TTS independently. If Cartesia ships a better voice or Deepgram beats them on latency, you can’t take advantage.
  • Cost: ~$0.06/min for OpenAI Realtime vs. ~$0.02/min for Pipecat-orchestrated Deepgram + Haiku + Cartesia.
  • Tool use parity: Pipecat with Claude Agent SDK has more mature tool-calling for our memory retrieval tools.
  • Group mode: LiveKit Agents (which Pipecat composes with cleanly) is the proven pattern for multi-party WebRTC. OpenAI Realtime has weaker multi-party support.

We keep OpenAI Realtime as a fallback driver in the swappable agent layer — packages/agents/drivers/openai-realtime.ts exists as an option but is not the default.

Realtime / streaming coverage matrix

| Capability | Web (V0) | Mobile (V0.2) | Device (V0) | Device (V0.3 group) |
|---|---|---|---|---|
| Continuous mic capture (foreground) | ✅ getUserMedia | ✅ expo-audio | ✅ XVF3800 → I2S | ✅ |
| Continuous mic capture (background) | ❌ tab suspends | ✅ background-audio mode | ✅ always | ✅ always |
| On-device VAD | ✅ Silero in AudioWorklet | ✅ Silero ONNX or Cobra | ✅ XVF3800 hardware | ✅ XVF3800 hardware |
| Chunked HTTPS upload | ✅ fetch + ReadableStream | ✅ fetch | ✅ esp_https_client | ✅ |
| Resumable upload on network drop | ✅ IndexedDB queue | ✅ SQLite queue | ✅ 30-min circular buffer | ✅ |
| Streaming STT (live transcript) | ✅ Deepgram WS from V0.1 | ✅ from V0.2 | n/a (server-side after upload) | ✅ via backend bridge |
| Voice talk-back loop (Pipecat) | ✅ from V0.2 | ✅ from V0.2 | ✅ via paired phone playback | ✅ via paired phone playback |
| Group mode WebRTC | ✅ LiveKit JS SDK from V0.3 | ✅ LiveKit RN SDK from V0.3 | ⚠️ HTTPS chunked → backend bridge | ⚠️ same |

Validated stack capabilities

| Component | Streams? | Verified |
|---|---|---|
| Supabase Edge Functions (Deno) | ✅ Request.body is ReadableStream | Used in production by audio products |
| Supabase Storage | ✅ resumable uploads via tus | Native support |
| Supabase Realtime | ✅ WebSocket, sub-100ms push | Native |
| Groq Whisper-large-v3-turbo | ❌ batch only | Confirmed by Groq API docs |
| Deepgram Nova-3 | ✅ WebSocket streaming + live diarization | Confirmed |
| Claude Agent SDK | ✅ streaming responses + tool use | Native |
| Cartesia Sonic TTS | ✅ <100ms first-byte | Confirmed by published benchmarks |
| Pipecat | ✅ provider-agnostic STT→LLM→TTS pipeline | Active OSS, used by many |
| LiveKit Cloud + Agents | ✅ multi-party WebRTC + server agent participants | Used by OpenAI Realtime, Character.ai |
| pgvector HNSW | ✅ sub-50ms similarity at our scale | Standard |
| @ricky0123/vad-web | ✅ Silero VAD in AudioWorklet | Standard |
| MCP server | ✅ JSON-RPC over stdio/HTTP, supports streaming responses | Confirmed |

2. Repository Layout (software-only)

```
arcive/
├── apps/
│   ├── web/                        # Next.js 15 PWA
│   │   ├── app/
│   │   │   ├── (auth)/login
│   │   │   ├── (app)/today
│   │   │   ├── (app)/universe
│   │   │   ├── (app)/memory/[id]
│   │   │   ├── (app)/people
│   │   │   ├── (app)/roles
│   │   │   └── (app)/settings
│   │   ├── components/
│   │   ├── lib/
│   │   └── public/manifest.webmanifest
│   │
│   └── mobile/                     # Expo (V0.2+)
│       ├── app/(tabs)/today.tsx
│       ├── app/(tabs)/universe.tsx
│       ├── app/memory/[id].tsx
│       ├── app/pair-device.tsx     # V0.3+
│       └── lib/ble.ts              # V0.3+
├── packages/
│   ├── db/
│   │   ├── migrations/             # Supabase migrations
│   │   ├── schema.ts               # Drizzle schema
│   │   └── types.ts                # Generated
│   │
│   ├── shared/
│   │   ├── zod/                    # Validation schemas (Memory, Person, Role)
│   │   ├── ble-uuids.ts            # ← shared with firmware (HW Plan §6)
│   │   ├── agent-interface.ts      # AgentSession contract
│   │   └── api-contracts.ts        # Edge Function I/O types
│   │
│   └── agents/
│       ├── roles/                  # Built-in role definitions
│       │   ├── reviewer.ts
│       │   ├── tutor.ts
│       │   ├── caregiver.ts
│       │   └── brainstorm.ts
│       ├── tools/
│       │   ├── memory-search.ts
│       │   ├── person-lookup.ts
│       │   └── timeline-window.ts
│       └── drivers/
│           ├── stateless-rag.ts    # V0.1
│           ├── claude-agent-sdk.ts # V0.2
│           └── realtime-voice.ts   # V0.2 (voice) / V0.3 (group)
├── backend/
│   ├── functions/                  # Supabase Edge Functions
│   │   ├── ingest-audio/           # Hardware + app upload entrypoint
│   │   ├── transcribe-step/        # Queue worker
│   │   ├── embed-step/
│   │   ├── identify-speakers-step/
│   │   ├── summarize-step/
│   │   ├── compute-edges-step/
│   │   ├── pair-device/            # V0.3+
│   │   └── revoke-device/
│   │
│   ├── workers/                    # Long-running workers (Modal/Fly)
│   │   └── pyannote-reid/          # Speaker re-ID
│   │
│   └── mcp/                        # MCP server (V0.3+ internal, V1 public)
│       └── arcive-memory-mcp/
└── shared/
    └── ble-uuids.ts                # Mirrored to firmware repo
```

3. Database Schema (V0 — full)

```sql
-- Vector extension
create extension if not exists vector;

-- Subscription enum
create type subscription_tier as enum ('free', 'pro', 'family', 'enterprise');

-- Users handled by Supabase Auth; supplemental table for app-level fields
create table user_profiles (
  id uuid primary key references auth.users,
  display_name text,
  subscription_tier subscription_tier default 'free',
  stripe_customer_id text unique,
  monthly_seconds_used int default 0,
  monthly_seconds_reset_at timestamptz,
  consent_granted_at timestamptz,
  created_at timestamptz default now()
);

-- People in the user's life (including "self")
create table people (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users not null,
  display_name text not null,            -- "Self", "Mom", "Dr. Singh"
  voice_embedding vector(192),           -- Pyannote embedding (V0.1+)
  relationship text,                     -- "self" | "family" | "friend" | "professional"
  notes text,                            -- user-editable context for the agent
  consent_status text default 'pending', -- "granted" | "pending" | "revoked"
  created_at timestamptz default now()
);

-- Devices (hardware + phones; designed for full variant lineup per Master Plan §2.5)
create type device_kind as enum (
  'phone_ios', 'phone_android', 'web', 'watch_apple', 'watch_wearos',
  'arcive_clip', 'arcive_pendant', 'arcive_tabletop',
  'arcive_card', 'arcive_screen', 'arcive_cellular', 'arcive_local'
);

create table devices (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users not null,
  kind device_kind not null,
  name text,
  mac_address text unique,          -- null for phones/web
  firmware_version text,
  capabilities jsonb,               -- {has_mic_array, has_screen, has_cellular, has_local_compute, mic_count, ...}
  connectivity text[],              -- ["wifi", "ble"] | ["cellular", "ble"] | ["local_only"]
  cellular_iccid text,              -- variant-specific, null otherwise
  local_mode boolean default false, -- if true, device does on-device STT/embed and uploads only summaries
  paired_at timestamptz,
  revoked_at timestamptz,
  created_at timestamptz default now()
);

-- Raw recordings
create table recordings (
  id uuid primary key default gen_random_uuid(),
  device_id uuid references devices,
  user_id uuid references auth.users not null,
  storage_path text not null,
  duration_seconds int,
  recorded_at timestamptz not null,
  doa_metadata jsonb,            -- HW only
  status text default 'pending', -- pending | processing | done | error
  error_message text,
  created_at timestamptz default now()
);

-- Processed memories
create table memories (
  id uuid primary key default gen_random_uuid(),
  recording_id uuid references recordings unique,
  user_id uuid references auth.users not null,
  transcript text,
  transcript_tsv tsvector generated always as (to_tsvector('english', coalesce(transcript, ''))) stored,
  summary text,
  topics text[],
  embedding vector(512),         -- Voyage-3-lite is 512-dim
  recorded_at timestamptz,
  created_at timestamptz default now()
);

create index memories_tsv_idx on memories using gin(transcript_tsv);
create index memories_embedding_idx on memories using hnsw (embedding vector_cosine_ops);

-- Speaker segments per memory
create table memory_participants (
  id uuid primary key default gen_random_uuid(),
  memory_id uuid references memories on delete cascade,
  person_id uuid references people,  -- nullable until re-ID resolves
  speaker_label text,                -- "Speaker A", "Speaker B" from diarization
  speaking_time_seconds int,
  segments jsonb                     -- [{start_s, end_s, text}]
);

-- Semantic edges between memories (V0.1+)
create table memory_edges (
  id uuid primary key default gen_random_uuid(),
  memory_a uuid references memories on delete cascade,
  memory_b uuid references memories on delete cascade,
  similarity float not null,
  created_at timestamptz default now(),
  unique(memory_a, memory_b)
);

-- AI Roles (built-in + user-created + marketplace)
create table roles (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users,   -- null for built-in
  name text not null,
  description text,
  system_prompt text not null,
  voice_id text,                        -- Cartesia/ElevenLabs voice
  retrieval_config jsonb,               -- {window_days, person_filter, top_k, ...}
  guardrails jsonb,                     -- {avoid_topics, escalation_triggers}
  is_premium boolean default false,
  is_published boolean default false,   -- marketplace
  price_cents int,                      -- marketplace
  created_at timestamptz default now()
);

-- Conversation sessions with a role
create table role_sessions (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users not null,
  role_id uuid references roles not null,
  started_at timestamptz default now(),
  ended_at timestamptz,
  transcript text,
  memory_id uuid references memories    -- the convo itself becomes a memory
);

-- Family / shared spaces (V0.2+)
create table spaces (
  id uuid primary key default gen_random_uuid(),
  owner_id uuid references auth.users not null,
  name text not null,
  created_at timestamptz default now()
);

create table space_members (
  space_id uuid references spaces on delete cascade,
  user_id uuid references auth.users,
  role text default 'member',  -- "owner" | "member" | "caregiver"
  primary key (space_id, user_id)
);

create table memory_spaces (
  memory_id uuid references memories on delete cascade,
  space_id uuid references spaces on delete cascade,
  primary key (memory_id, space_id)
);

-- Pipeline queue (pgmq table; managed by extension)
-- See pgmq docs

-- RLS policies (selected)
alter table memories enable row level security;
create policy "user owns memories" on memories
  for all using (auth.uid() = user_id);

alter table people enable row level security;
create policy "user owns people" on people
  for all using (auth.uid() = user_id);
```
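For intuition on what the `vector_cosine_ops` index ranks by: pgvector computes cosine distance in-database, but the math is just cosine similarity. A toy TypeScript version (dimensions truncated for the example; the real vectors are 512-dim):

```typescript
// Plain cosine similarity, the quantity pgvector's <=> operator is built on.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// pgvector's cosine distance (what the HNSW index orders by) is 1 - similarity.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

Identical vectors score 1 (distance 0); orthogonal vectors score 0 (distance 1), so "nearest" memories are those with the smallest distance.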

4. Pipeline (Queue Architecture)

```
ingest-audio (POST endpoint)
        │  insert: recordings row (status = pending)
        ▼  enqueue: pipeline.transcribe
transcribe-step           Groq / Deepgram
        ▼  enqueue: pipeline.identify-speakers (if multi-speaker)
identify-speakers         Pyannote on Modal
        ▼  enqueue: pipeline.embed
embed-step                Voyage-3-lite
        ▼  enqueue: pipeline.summarize
summarize-step            Gemini Flash / Haiku
        ▼  enqueue: pipeline.compute-edges
compute-edges-step        pgvector top-K
        ▼  update: recordings.status = 'done'
Realtime push → app
```
  • Each step is a separate Edge Function with a single responsibility.
  • Queue: pgmq (Postgres-native message queue) for V0.1; consider Inngest at V0.2 if observability hurts.
  • Failures retry with exponential backoff up to 5 times, then move to dead-letter table.
  • Idempotency: each step writes its result keyed on recording_id; safe to replay.
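The idempotency pattern in the last bullet can be sketched as follows. The `Map` stands in for the step's result column keyed on `recording_id`; names are illustrative:

```typescript
// Idempotent-step pattern: a worker keys its result on recording_id,
// so replaying the same queue message is a no-op.
type StepFn = (recordingId: string) => Promise<string>;

function makeIdempotentStep(results: Map<string, string>, work: StepFn) {
  return async (recordingId: string): Promise<string> => {
    const existing = results.get(recordingId);
    if (existing !== undefined) return existing; // replayed message: skip the work
    const output = await work(recordingId);
    results.set(recordingId, output);            // persist keyed on recording_id
    return output;
  };
}
```

With this shape, pgmq's at-least-once delivery is safe: a duplicate or retried message returns the stored result instead of re-running the (paid) vendor call.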

5. Agent Interface (Swappable Layer)

`packages/shared/agent-interface.ts`:

```typescript
export interface AgentSession {
  start(input: { userId: string; roleId: string; modality: 'text' | 'voice' }): Promise<void>;
  send(message: string | AudioChunk): AsyncIterable<AgentEvent>;
  interject(): Promise<void>; // V0.3 group mode
  assumeRole(roleId: string): Promise<void>;
  end(): Promise<SessionSummary>;
}

export type AgentEvent =
  | { type: 'partial'; text: string }
  | { type: 'final'; text: string; audio?: ArrayBuffer }
  | { type: 'tool_call'; name: string; args: unknown }
  | { type: 'error'; message: string };

// Drivers
// packages/agents/drivers/stateless-rag.ts      (V0.1)
// packages/agents/drivers/claude-agent-sdk.ts   (V0.2)
// packages/agents/drivers/realtime-voice.ts     (V0.2 voice, V0.3 group)
```

The app code only ever imports AgentSession. Swapping drivers is a config change.
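A minimal stub driver conforming to this interface might look like the following. The echo behavior is purely illustrative, not the real `stateless-rag` driver; the event and summary types are reduced to what the stub emits:

```typescript
// Stub AgentSession driver: streams partials then a final, tracks turn count.
type AgentEvent =
  | { type: "partial"; text: string }
  | { type: "final"; text: string };

interface SessionSummary { turns: number; }

class EchoDriver {
  private turns = 0;

  async start(_input: { userId: string; roleId: string; modality: "text" | "voice" }) {}

  // An async generator satisfies the AsyncIterable<AgentEvent> return type.
  async *send(message: string): AsyncIterable<AgentEvent> {
    this.turns += 1;
    for (const word of message.split(" ")) yield { type: "partial", text: word };
    yield { type: "final", text: message };
  }

  async end(): Promise<SessionSummary> {
    return { turns: this.turns };
  }
}
```

A UI consumes any driver the same way: `for await (const event of session.send(input))`, rendering partials as they arrive, which is what makes the driver swap a config change.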


6. Hardware-Facing API Contracts

(Mirrors §6 of 00_MASTER_PLAN.md. Software is the server side of these contracts.)

6.1 POST /functions/v1/ingest-audio

```typescript
// backend/functions/ingest-audio/index.ts (sketch)
import { serve } from 'https://deno.land/std@0.224.0/http/server.ts';

serve(async (req) => {
  const url = new URL(req.url); // query params come off the request URL
  const auth = req.headers.get('Authorization');
  const deviceId = req.headers.get('X-Device-Id');
  const recordedAt = url.searchParams.get('recorded_at');
  const doaJson = url.searchParams.get('doa_json');
  // 1. Verify device JWT, look up device + user
  // 2. Stream body to Supabase Storage at audio/<user>/<recording_id>.wav
  // 3. Insert recordings row (status=pending)
  // 4. Enqueue pipeline.transcribe
  // 5. Return 202 { recording_id }
});
```

6.2 POST /functions/v1/pair-device

App-initiated. Generates a one-shot device_jwt valid for ingest-audio, returns it bundled into a QR payload.

6.3 POST /functions/v1/revoke-device

User-initiated from app. Marks device revoked, invalidates JWT, notifies device over BLE if connected.

6.4 Realtime channels

  • recordings:user_id=eq.<uid> — new rows + status updates
  • memories:user_id=eq.<uid> — when pipeline completes
  • devices:user_id=eq.<uid> — pairing + revocation events
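The three channel names above share one pattern, which a tiny helper can enforce so clients never hand-build filter strings. The helper name is hypothetical; the table names come from the list above.

```typescript
// Builds the per-user Realtime channel filter, e.g.
// userChannelFilter('recordings', uid) -> "recordings:user_id=eq.<uid>".
// With supabase-js, the part after ':' is what feeds the
// postgres_changes `filter` option.
type RealtimeTable = 'recordings' | 'memories' | 'devices';

function userChannelFilter(table: RealtimeTable, userId: string): string {
  return `${table}:user_id=eq.${userId}`;
}
```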

7. Phase Deliverables — Software Track

Phase 0 — V0 (Wk 1–3)

  • Next.js 15 PWA deployed to Vercel
  • Supabase project provisioned, full schema migrated
  • Magic-link auth + consent screen
  • Phone-mic recording (getUserMedia with VAD via @ricky0123/vad-web)
  • Audio upload to Supabase Storage + recordings row
  • Synchronous Edge Function chain (no queue yet): transcribe → store memory
  • Today (list) + Memory detail views
  • Postgres FTS text search
  • PostHog + Sentry instrumented
  • Stripe customer pre-created on signup
  • Internal getMemories(query, filter, k) retrieval API
  • 20 invited users
  • /ingest-audio endpoint accepts hardware uploads from day one — same endpoint, same recording row, just different device_id. Hardware track uses this to land its first end-to-end demo in the same phase.

Phase 1 — V0.1 (Wk 4–6)

  • Move pipeline to pgmq queue + step workers
  • Diarization (Deepgram Nova-3)
  • Pyannote.audio worker on Modal for speaker re-ID; populates people.voice_embedding
  • Voyage-3-lite embeddings + HNSW index
  • Universe view (react-force-graph in web)
  • First role: Reviewer (text-only, RAG driver)
  • Stripe Pro tier ($12/mo) goes live; PostHog feature flags gate features
  • Export to Markdown
  • App-side BLE pairing UX — required for the Phase 1 integration demo (“Pair dev-kit via QR scan → device joins WiFi → captures meeting”). On Chrome Android: Web Bluetooth in the PWA. On iOS: a developer-only CLI tool (10 dev-kits, internal use only); consumer iOS pairing UX ships with the Expo mobile app at V0.2.
  • App calls POST /functions/v1/pair-device and POST /functions/v1/revoke-device from settings UI

Phase 2 — V0.2 (Wk 7–10)

  • Expo mobile app — feature parity with web
  • pnpm workspace, shared packages/db, packages/shared, packages/agents
  • RevenueCat for mobile billing; Stripe for web
  • Voice talk-back: Pipecat + Deepgram streaming STT + Cartesia Sonic TTS
  • Claude Agent SDK driver replaces stateless RAG
  • Two more roles: Tutor, Brainstorm Partner
  • Family tier launches: spaces, multi-member, caregiver role
  • Offline recording on mobile (queue locally, upload when online)

Phase 3 — V0.3 (Wk 11–14)

  • LiveKit-based group mode (continuous streaming session, server-side)
  • Backend bridge: device → HTTPS chunked upload (1s Opus chunks) → bridge publishes as a LiveKit participant track. Device does not speak WebRTC directly — see 02_HARDWARE_PLAN.md §6.2. Web/mobile clients join the room directly via LiveKit SDKs.
  • Agent interjection capability via interject() — agent can speak into the LiveKit room as a participant
  • Polished pairing/revocation UX in the Expo mobile app (replaces the Phase 1 CLI dev tool for iOS; QR scan + BLE handshake)
  • Hardware status UI (battery, recording state, mute control)
  • MCP server built (internal use only) — agents and roles use it as the retrieval layer
  • Caregiving role with explicit per-person consent flow

Phase 4 — V1.0 (Wk 15–22)

  • DoA fusion: combine XVF3800 azimuth with Pyannote re-ID for higher-confidence speaker labels
  • Public MCP server — Pro users get an endpoint they plug into Claude Desktop / ChatGPT / Cursor
  • Role marketplace UI (browse, install, publish)
  • Stripe Connect for marketplace payouts (70/30 split)
  • B2B admin dashboard (multi-staff, audit log, org-level billing)
  • SOC 2 Type 1 prep started

Phase 5 — V1.1+ (Wk 23+)

  • Vertical packages: Caregiving, Education, Therapy
  • Memory-as-a-Service API tier (usage-billed)
  • Smartwatch app
  • Multi-language support (start with ES, FR, DE)
  • On-device summary cache for fast offline browsing

8. Privacy, Consent & Security

  • All audio encrypted at rest (Supabase Storage default + customer-managed keys for B2B)
  • Per-user “delete everything” flow (cascades to recordings, memories, embeddings, voice prints)
  • Default retention: indefinite for owner, but UI nudge to set retention windows
  • Voice embeddings of non-self people require their consent (people.consent_status)
  • Two-party consent default in legal mode (user can disable for personal use, with a confirmation)
  • Audit log table for B2B: who accessed what memory when
  • Device JWT short-lived (rotated every 30 days, refreshed via BLE when phone is paired)
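The 30-day rotation rule reduces to a one-line check the app can run whenever the phone connects over BLE. A minimal sketch: the function and field names are hypothetical; only the 30-day threshold comes from the plan.

```typescript
// Hypothetical rotation check for device JWTs.
// ROTATION_MS encodes the plan's 30-day rotation window.
const ROTATION_MS = 30 * 24 * 60 * 60 * 1000;

function rotationDue(issuedAt: Date, now: Date): boolean {
  return now.getTime() - issuedAt.getTime() >= ROTATION_MS;
}
```

Checking at BLE-connect time (rather than on a server cron) means a device that sits offline past the window simply gets a fresh token the next time the phone sees it.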

9. Cost Discipline Rules

  1. VAD before upload — never send silence to a paid API
  2. Cheapest model that works — measure quality, not vendor prestige
  3. Cache ruthlessly — embeddings, summaries, role responses (semantic cache for common queries)
  4. Per-user usage tracking in user_profiles.monthly_seconds_used enforced at upload time
  5. Hard cutoffs at tier limits — record locally, queue for upload, but reject if cap exceeded
  6. Free tier Pyannote calls batched — daily, not real-time; Pro gets real-time
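Rules 4 and 5 combine into an upload-time gate against `user_profiles.monthly_seconds_used`. A sketch under stated assumptions: the tier caps below are made-up examples, since this plan does not specify the per-tier limits.

```typescript
// Illustrative per-tier caps in seconds; the real limits are a product
// decision, not taken from this plan.
const TIER_CAPS_SECONDS: Record<string, number> = {
  free: 5 * 3600,   // assumed: 5 h/mo
  pro: 100 * 3600,  // assumed: 100 h/mo
};

/** Hard cutoff check run before accepting an upload (rule 5). */
function canIngest(tier: string, usedSeconds: number, clipSeconds: number): boolean {
  const cap = TIER_CAPS_SECONDS[tier];
  if (cap === undefined) return false; // unknown tier: reject, never default-allow
  return usedSeconds + clipSeconds <= cap;
}
```

A rejected clip stays in the device's local queue (per rule 5), so nothing is lost when the cap resets at the start of the next month.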

10. Open Questions

  • Should free tier get the graph view or only Pro? (Probably free, since it’s the wow moment.)
  • Self-host Pyannote vs. use Speechmatics managed? (Default: self-host on Modal; switch if ops pain exceeds savings.)
  • Web Bluetooth for hardware pairing on Android? (Nice-to-have at V1.0; iOS requires native app anyway.)
  • Should role marketplace allow free roles? (Yes — drives adoption; only premium ones are gated.)
  • Encrypted-at-client option for Pro+ users? (Probably V1.1; complicates retrieval but is a compelling sell for therapy / caregiving B2B.)