
ARCIVE — Master Plan

The single source of truth for the ARCIVE product, across software and hardware. Read this first. Then read 01_SOFTWARE_PLAN.md and 02_HARDWARE_PLAN.md.


1. Product Thesis

ARCIVE is a hardware & software platform that empowers and enriches the lives of its users by lowering the barrier to record, retrieve, interact, and create memories and thoughts — past, present & future — for individuals and/or friends and family.

This is the canonical one-sentence definition. Everything else in this plan derives from it.

  • It is a platform, not a single product — one software stack, multiple hardware form factors over time (clip, pendant, tabletop, watch, card, screen-equipped, cellular, local-only).
  • The four verbs — record, retrieve, interact, create — define every feature decision. If a feature doesn’t serve one of these, it doesn’t ship.
  • Past, present & future: capture (past), companion-in-the-moment (present), and reflective/predictive companion (future) are all in scope. V0 covers past + early present; future scope expands into present + future.
  • Individuals and/or friends and family: solo use cases AND shared spaces are first-class. Schema, pricing, and UX accommodate both from V0.
  • Hardware and software are co-equal tracks, developed in parallel from week 1, integrated continuously via the contract in §6.
  • The app works without any device so adoption is never blocked. The device(s) are how the product is meant to be experienced — frictionless capture, no screen pulling you in.
  • Monetization is device + subscription + platform/marketplace + vertical B2B.

User mental model: “I have ARCIVE. It listens, it remembers, it understands me, it’s there when I need it — past, present, and future.”


2. Strategic Principles

| # | Principle | Why |
|---|---|---|
| 1 | Hardware and software are equal, parallel tracks | The device is core product identity, not a peripheral. Both teams ship every phase. The integration contract (§6) is the rail they run on. |
| 2 | App works standalone; device is the intended experience | Phone-mic input is the safety net for adoption, but the device is the differentiated capture surface (always-on, no screen, distraction-free). |
| 3 | Ship V0 in 3 weeks — software AND a working hardware bring-up | Both tracks must demo end-to-end at every phase boundary. No “hardware later.” |
| 4 | One integration contract, two implementations | The HW↔SW contract (§6) is frozen at start of each phase. Both sides build to it. Changes require a joint sync. |
| 5 | Schema & architecture designed for V1 from V0 | Pivot from dictaphone → companion → platform must not require a rewrite. Bones in place from V0. |
| 6 | Cost-per-user must be sub-$2/mo at free tier | Otherwise growth = bankruptcy. VAD on-device + cheap STT (Groq) + cheap embeddings (Voyage) are non-negotiable. |
| 7 | Privacy & consent are V0 features, not later additions | Always-on mics in shared spaces are a legal minefield. Hardware-level mute, non-overridable LED, consent screen all in V0. |
| 8 | Agent layer is swappable from day one | Stateless RAG → Agent SDK → voice-native realtime. The app code shouldn’t care. |
| 9 | MCP-first memory layer | Memory store doubles as B2B/platform offering. Build it as a service from the start. |
| 10 | Distraction-free is a hardware constraint AND a software constraint | Device has no screen by default. App has no infinite feeds, notifications, or engagement loops. Calm by design at every layer. |

2.5. Hardware Variant Lineup (Platform View)

ARCIVE is a platform. The first device defines the architecture; future variants extend it. All variants share the same software stack, the same /ingest-audio contract, the same BLE GATT schema (where applicable), and the same OTA system. Form factor and connectivity differ.

| Variant | When | Form | Use case | Key constraints |
|---|---|---|---|---|
| Clip / Pendant (V1.0) | Phase 4 | Wearable on lanyard or clip | Always-with-you personal capture | Battery 8–12 hr, BLE+WiFi, no screen |
| Tabletop puck (V1.0 SKU or V1.5) | Phase 4–5 | Desk or conference-table puck | Group conversation, meetings, family dinners, study rooms | Plugged-in option, larger battery, optimized for far-field |
| Pendant variant (V1.5) | Phase 5 | Necklace / pendant form | Always-on capture for users who don’t want a clip | Industrial-design refresh, smaller battery acceptable |
| Watch app companion (V1.1+) | Phase 5+ | Apple Watch / Wear OS app | Intentional dictation when device isn’t worn; quick capture | Software only, no new hardware; uses watch mic |
| Card format (V2.x) | Future | Credit-card-sized device worn in pocket / wallet | Discreet capture, executives / professionals | Slim battery, single MEMS mic likely (sacrifices array for form), BLE-tethered to phone for upload |
| Screen-equipped variant (V2.x) | Future | Tabletop or pendant with e-ink or small OLED | Status, summaries, role selection without phone | Display drives slightly higher BOM; firmware adds UI layer |
| Cellular variant (V2.x) | Future | Any of the above with onboard LTE-M / NB-IoT | Capture without a phone or WiFi nearby; caregiving / kids / safety use cases | LTE module + SIM, eSIM activation flow, higher BOM, recurring connectivity cost passed through to user |
| Local-only variant (V2.x) | Future | Any of the above with on-device summarization | Privacy-maximalist users, regulated industries (legal, healthcare), air-gapped settings | Larger MCU or co-processor (e.g., ESP32-P4 or RK3566), local STT (Whisper-tiny), local embeddings, syncs only to user’s own device or NAS |

Why this list now matters

  • The platform contract must accommodate variants from day one: device.kind enum, capability flags, optional cellular metadata, optional local-mode flag in the schema.
  • Cellular and local-only change the data flow. Cellular adds latency + cost pressure (chunk smaller, upload smarter). Local-only inverts the cloud assumption (the memory store can live on the user’s device). Both must be design considerations even if not built until V2.
  • Card / pendant / watch shift the input surface. Single-mic and watch-mic are degraded inputs; software diarization & speaker re-ID must gracefully fall back.
  • Screen variant changes the firmware UI layer, but does not change the cloud product.
  • These variants are NOT a roadmap commitment — they’re the shape of the platform we’re designing for. We commit to V1.0 (clip/pendant + tabletop) and keep the door open to the rest.
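One way the contract can accommodate variants from day one is a `device.kind` enum plus capability flags that the pipeline branches on. A minimal TypeScript sketch; every field name here is an illustrative assumption, not the committed schema (which lives in packages/db):

```typescript
// Illustrative only: variant-aware device record for the platform contract.
// Field names are assumptions; the committed schema lives in packages/db.
type DeviceKind =
  | "clip" | "pendant" | "tabletop" | "watch_companion"
  | "card" | "screen" | "cellular" | "local_only";

interface DeviceCapabilities {
  micArray: boolean;   // false for single-MEMS card / watch-mic variants
  screen: boolean;     // e-ink / OLED SKU
  cellular: boolean;   // onboard LTE-M / NB-IoT
  localMode: boolean;  // on-device STT + embeddings, no cloud sync
}

interface DeviceRecord {
  id: string;
  kind: DeviceKind;
  capabilities: DeviceCapabilities;
  cellularMeta?: { iccid: string; carrier: string }; // only for cellular variants
}

// The pipeline branches on capabilities instead of hard-coding variants,
// e.g. degraded diarization when there is no mic array:
function diarizationMode(d: DeviceRecord): "array" | "single-mic-fallback" {
  return d.capabilities.micArray ? "array" : "single-mic-fallback";
}
```

A card-format device would register with `micArray: false`, and diarization falls back gracefully, as the third bullet above requires, without any schema migration.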

2.7. Architecture (current state — V0.3 in flight)

Color legend (both diagrams):

  • 🟦 Blue — capture / write path (client → ingest → storage)
  • 🟪 Purple — pipeline processing (internal step-to-step)
  • 🟧 Orange — external API call (Modal worker or AI vendor)
  • 🟩 Green — read / Realtime push / query path
  • 🟥 Red — failure path (dead-letter queue)
  • Gray dotted — planned / not yet wired

2.7.a — High-level (5-box overview)

For a 30-second read of how data flows.

```mermaid
graph LR
  C["Clients<br>Web PWA<br>Mobile<br>HW Device (V0.3+)"]
  B["Supabase Backend<br>Auth, Postgres, Storage,<br>pgmq queue, Realtime"]
  P["Async Pipeline<br>7 steps, per-step<br>Edge Function"]
  AI["AI Vendor Chain<br>Gemini Flash<br>to Anthropic Haiku<br>to Groq Llama 3.3 70B"]
  A["Agent Layer<br>/api/chat<br>+ arcive-memory MCP"]
  C -- "capture audio" --> B
  B -- "enqueue jobs" --> P
  P -- "call models" --> AI
  P -- "write memories + topics" --> B
  B -- "Realtime push" --> C
  C -- "ask questions" --> A
  A -- "retrieve" --> B
  A -- "generate" --> AI
  linkStyle 0 stroke:#3b82f6,stroke-width:2px
  linkStyle 1 stroke:#8b5cf6,stroke-width:2px
  linkStyle 2 stroke:#f59e0b,stroke-width:2px
  linkStyle 3 stroke:#8b5cf6,stroke-width:2px
  linkStyle 4 stroke:#10b981,stroke-width:2px
  linkStyle 5 stroke:#10b981,stroke-width:2px
  linkStyle 6 stroke:#10b981,stroke-width:2px
  linkStyle 7 stroke:#f59e0b,stroke-width:2px
```

2.7.b — Detailed (numbered flow)

Numbered subgraphs follow the data path 1 → 6.

```mermaid
graph TB
  subgraph s1["1. Clients"]
    W["Web PWA<br>Next.js 15"]
    M["Mobile<br>Expo SDK 53"]
    H["HW Device<br>ESP32-S3 + XVF3800<br>V0.3+"]
  end
  subgraph s2["2. Ingest"]
    AUTH["Auth<br>magic-link via Resend"]
    ING["Edge Function<br>POST /ingest-audio"]
  end
  subgraph s3["3. Storage"]
    S[("Audio bucket<br>signed URLs")]
    DB[("Postgres + pgvector<br>memories, topics, edges,<br>spaces, roles, people")]
  end
  subgraph s4["4. Async Pipeline (pgmq, per-step Edge Function)"]
    direction LR
    P1["transcribe"] --> P2["diarize + re-ID"]
    P2 --> P3["summarize + topics"]
    P3 --> P4["embed"]
    P4 --> P5["edges + topic links"]
    DLQ[("pipeline_dead_letters<br>30d TTL")]
  end
  subgraph s5["5. External compute"]
    MOD1["Modal: Whisper"]
    MOD2["Modal: Pyannote<br>diarize + re-ID"]
    MOD3["Modal: Audio transcode<br>webm/Opus to m4a"]
    AI["AI Vendor Chain<br>Gemini to Anthropic to Groq"]
  end
  subgraph s6["6. Agent + Realtime"]
    RT["Supabase Realtime<br>postgres_changes"]
    CHAT["/api/chat<br>Claude Agent SDK<br>+ consent gate (ADR-0007)"]
    MCP["arcive-memory MCP<br>separate Node process"]
  end
  W --> AUTH
  M --> AUTH
  H --> AUTH
  W --> ING
  M --> ING
  H --> ING
  ING --> S
  ING --> P1
  P1 --> MOD1
  P2 --> MOD2
  P3 --> AI
  P5 --> DB
  P1 -. "on max-retry" .-> DLQ
  S -. "webhook" .-> MOD3
  MOD3 --> S
  DB --> RT
  RT --> W
  RT --> M
  W --> CHAT
  M -. "γ.2 planned" .-> CHAT
  CHAT --> MCP
  MCP --> DB
  CHAT --> AI
  %% Edge index ordering (Mermaid counts in source order):
  %% 0-3: pipeline internal arrows (P1->P2->P3->P4->P5) -> purple
  %% 4-6: client->Auth (W,M,H) -> blue
  %% 7-9: client->Ingest (W,M,H) -> blue
  %% 10: ING->S -> blue
  %% 11: ING->P1 -> purple
  %% 12-14: P1->MOD1, P2->MOD2, P3->AI -> orange
  %% 15: P5->DB -> purple
  %% 16: P1->DLQ (dotted) -> red
  %% 17-18: S->MOD3, MOD3->S -> orange
  %% 19: DB->RT -> green
  %% 20-21: RT->W, RT->M -> green
  %% 22: W->CHAT -> green
  %% 23: M->CHAT (dotted, planned) -> gray
  %% 24-25: CHAT->MCP, MCP->DB -> green
  %% 26: CHAT->AI -> orange
  linkStyle 0,1,2,3,11,15 stroke:#8b5cf6,stroke-width:2px
  linkStyle 4,5,6,7,8,9,10 stroke:#3b82f6,stroke-width:2px
  linkStyle 12,13,14,17,18,26 stroke:#f59e0b,stroke-width:2px
  linkStyle 19,20,21,22,24,25 stroke:#10b981,stroke-width:2px
  linkStyle 16 stroke:#ef4444,stroke-width:2px
  linkStyle 23 stroke:#9ca3af,stroke-width:1.5px
```

Not in either diagram (planned / paused):

  • Pipecat voice service — scaffolded then paused per ADR-0010. Resumes when EAS unblocks mobile native deps AND speech-to-speech-vs-Pipecat decision lands.
  • LiveKit group conversation mode — V0.3 deferred with voice talk-back.
  • Public MCP server (Cloudflare Workers) — V1.0.
  • Stripe + RevenueCat billing — V0.1 scaffolded; tiers live by V1.0.

3. Phased Roadmap — Parallel Tracks

Both tracks ship at every phase boundary. Each phase ends with an integrated demo: software working end-to-end with the current hardware build.

```mermaid
gantt
  title ARCIVE — phased roadmap (parallel SW + HW)
  dateFormat YYYY-MM-DD
  axisFormat Wk %V
  section Software
  V0 PWA dictaphone (phone mic)                  :sw0, 2026-01-05, 21d
  V0.1 Diarization + re-ID + Reviewer + Pro      :sw1, after sw0, 21d
  V0.2 Voice talk-back + Family + Mobile (Expo)  :sw2, after sw1, 28d
  V0.3 Group conversation + Caregiver            :sw3, after sw2, 28d
  V1.0 Marketplace + Public MCP + B2B pilot      :sw4, after sw3, 56d
  V1.1+ Vertical packages + MaaS                 :sw5, after sw4, 56d
  section Hardware
  V0 XVF3800 dev-kit (raw I2S + WiFi upload)     :hw0, 2026-01-05, 21d
  V0.1 Firmware (VAD, BLE pairing, LED, OTA)     :hw1, after hw0, 21d
  V0.2 Enclosure proto #1, 10 units, AEC         :hw2, after hw1, 28d
  V0.3 50-unit pilot, DoA fusion, BLE control    :hw3, after hw2, 28d
  V1.0 Enclosure v1 + 200-unit commercial        :hw4, after hw3, 56d
  V1.1+ Industrial refresh + cert path           :hw5, after hw4, 56d
```
| Phase | Weeks | Version | Software | Hardware |
|---|---|---|---|---|
| 0 | 1–3 | V0 | Web PWA dictaphone (phone mic) | XVF3800 dev-kit bring-up; raw I2S → WiFi upload to same backend |
| 1 | 4–6 | V0.1 | Diarization + re-ID + Reviewer role + Pro tier | Full firmware on dev-kit — VAD, BLE pairing, status LED, OTA scaffold |
| 2 | 7–10 | V0.2 | Voice talk-back¹ + Family + mobile (Expo) | Enclosure prototype #1 (3D-printed), 10 internal units, AEC validated for talk-back |
| 3 | 11–14 | V0.3 | Group conversation mode + Caregiver role | 50-unit pilot batch, DoA fusion live in pipeline, BLE control complete |
| 4 | 15–22 | V1.0 | Marketplace + public MCP + B2B pilot | Enclosure v1 production-intent, 200-unit commercial batch, OTA fleet mgmt |
| 5 | 23+ | V1.1+ | Vertical packages + memory-as-a-service | Industrial-design refresh, optional e-ink screen, certification path |

¹ Voice talk-back scaffolded then paused 2026-05-05 — see ADR-0010.

Both tracks integrate continuously. Software contract endpoints exist from Phase 0 so firmware can target them; firmware sends real audio from Phase 0 even if the enclosure is a breadboard.


4. Version Matrix — Capabilities by Release

Software capabilities

| Capability | V0 | V0.1 | V0.2 | V0.3 | V1.0 | V1.1+ |
|---|---|---|---|---|---|---|
| Web app (PWA) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Native mobile (iOS+Android) | | | ✅ | ✅ | ✅ | ✅ |
| Magic-link auth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Consent screen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Capture from phone mic | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Capture from ARCIVE device | ✅ dev-kit | ✅ dev-kit | ✅ proto | ✅ pilot | ✅ retail | ✅ |
| List view + memory detail | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Text search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Semantic search (embeddings) | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Graph / Universe view | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Speaker diarization | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cross-session speaker re-ID | | ✅ | ✅ | ✅ | ✅ | ✅ |
| AI roles (text) | | ✅ Reviewer | ✅ +Tutor | ✅ +Caregiver | ✅ marketplace | ✅ vertical |
| AI talk-back (voice) | | | ✅ | ✅ | ✅ | ✅ |
| Group conversation mode | | | | ✅ | ✅ | ✅ |
| MCP server | | | | internal | ✅ public | ✅ public |
| Stripe / RevenueCat billing | scaffolded | ✅ Pro | ✅ Family | ✅ | ✅ marketplace | ✅ enterprise |
| Free / Pro / Family tiers | scaffolded | Free+Pro | Free+Pro+Family | ✅ | ✅ | ✅ |
| Marketplace (custom roles) | | | | | ✅ | ✅ |
| B2B vertical packages | | | | | pilot | ✅ |
| Export (Markdown / Obsidian) | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Family / shared spaces | | | ✅ | ✅ | ✅ | ✅ |

Hardware capabilities (parallel track)

| Capability | V0 | V0.1 | V0.2 | V0.3 | V1.0 | V1.1+ |
|---|---|---|---|---|---|---|
| Build form | dev-kit (XVF3800+XIAO breadboard) | dev-kit | enclosure proto #1 (3D-print) | pilot enclosure (3D-print refined) | production-intent enclosure (SLA/CNC) | injection-molded |
| Units in field | 5 internal | 10 internal | 10 + 5 design partners | 50 pilot | 200 commercial | 1k+ |
| I2S audio capture (XVF3800 → ESP32-S3) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| WiFi upload to backend | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| On-device VAD (silence not uploaded) | | basic | ✅ tuned | ✅ | ✅ | ✅ |
| BLE provisioning (WiFi creds + JWT) | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Status LED (breathing/solid/error) | basic | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hardware mute (gates I2S clock) | | | wired | ✅ | ✅ | ✅ |
| Battery + USB-C charge | dev-kit | dev-kit | ✅ proto | ✅ pilot | ✅ retail | ✅ |
| 8–12 hr battery life | n/a | n/a | tested | ✅ | ✅ | ✅ |
| DoA azimuth metadata per chunk | ✅ raw | ✅ raw | ✅ raw | ✅ fused with re-ID | ✅ | ✅ |
| AEC validated for talk-back | | | ✅ | ✅ | ✅ | ✅ |
| OTA firmware updates | | scaffold | ✅ signed | ✅ A/B partition | ✅ fleet mgmt | ✅ |
| Memfault / crash telemetry | | | ✅ | ✅ | ✅ | ✅ |
| Local circular buffer (offline) | | basic | ✅ 30-min | ✅ | ✅ | ✅ |
| Optional e-ink/OLED screen | | | | | | ✅ optional SKU |
| FCC/CE certification | skipped (research units) | skipped | skipped | skipped | skipped | ✅ retail |

5. Deliverables by Phase — Both Tracks

Each phase ends with a joint integration demo: software + hardware working end-to-end together.

Phase 0 — V0 (Week 1–3)

Software

  • Next.js 15 PWA deployed to Vercel
  • Magic-link auth + consent screen
  • ingest-audio Edge Function live and accepting uploads from phone-mic AND device dev-kit
  • Phone-mic recording (getUserMedia + @ricky0123/vad-web)
  • Synchronous transcribe → store memory via Groq Whisper
  • Today (list) + Memory detail views
  • Postgres FTS text search
  • Full schema deployed (people / roles / role_sessions / memory_participants / subscription_tier)
  • Stripe customer pre-created on signup
  • PostHog + Sentry instrumented
  • 20 invited users

Hardware

  • XVF3800 + XIAO ESP32-S3 dev-kit assembled (5 units, breadboard / Seeed reference board)
  • I2S firmware flashed on XVF3800 (per Seeed wiki)
  • ESP32-S3 firmware: I2S capture → 30s WAV chunk → WiFi → POST /ingest-audio (hardcoded WiFi creds + dev JWT for now)
  • DoA azimuth queried via I2C, attached as metadata to each chunk
  • LED breathing animation on capture
  • Firmware forks Seeed’s HTTP audio-streaming sample (fastest path to a working build)

Integration demo: dev-kit on a desk records a meeting, transcripts appear in the web app feed in real time alongside phone-mic recordings.


Phase 1 — V0.1 (Week 4–6)

Software

  • Pipeline moved to pgmq queue + step workers
  • Diarization (Deepgram Nova-3)
  • Pyannote.audio worker on Modal for speaker re-ID
  • Voyage-3-lite embeddings + pgvector HNSW
  • Semantic search live
  • Universe / graph view (react-force-graph web)
  • First AI role: Reviewer (text-only)
  • Pro tier ($12/mo) launches
  • Markdown export
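The pgmq-backed step workers follow a read → process → ack-or-dead-letter loop. A hand-rolled sketch with the queue abstracted behind an interface; the method names, retry limit, and payload shape are assumptions for illustration, while the real workers sit on pgmq's SQL API:

```typescript
// Sketch of one pipeline step worker: read a message under a visibility
// timeout, process it, delete on success, dead-letter after max retries.
interface QueueMsg { msgId: number; readCt: number; payload: { recordingId: string } }

interface StepQueue {
  read(): QueueMsg | null;                          // pop one visible message
  delete(msgId: number): void;                      // ack: remove permanently
  deadLetter(msg: QueueMsg): void;                  // move to pipeline_dead_letters
  sendNext(payload: { recordingId: string }): void; // enqueue for the next step
}

const MAX_RETRIES = 3; // assumption; real value is per-step config

async function runStepOnce(
  q: StepQueue,
  process: (recordingId: string) => Promise<void>,
): Promise<boolean> {
  const msg = q.read();
  if (!msg) return false;              // queue empty
  try {
    await process(msg.payload.recordingId);
    q.delete(msg.msgId);               // success: ack...
    q.sendNext(msg.payload);           // ...and hand off to the next step
  } catch {
    if (msg.readCt >= MAX_RETRIES) {   // retries exhausted: dead-letter it
      q.deadLetter(msg);
      q.delete(msg.msgId);
    }                                  // else: visibility timeout re-delivers
  }
  return true;
}
```

The same loop shape serves all five steps (transcribe, diarize, summarize, embed, link); only the `process` callback changes, which is what keeps each Edge Function small.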

Hardware

  • Same dev-kit, full firmware feature set
  • VAD gating: silence never uploaded (uses XVF3800 VAD signal via I2C)
  • BLE GATT server implemented (provisioning + control + status characteristics per §6.3)
  • Pairing flow: app QR → BLE write of WiFi creds + device JWT → device reboots and joins WiFi
  • LED states: idle / recording / muted / uploading / error
  • OTA scaffolding (esp_https_ota wired up, manifest poll daily)
  • 10 dev-kit units in internal use

Integration demo: paired device captures a multi-speaker meeting; diarization labels appear; same speaker recognized across two separate sessions.


Phase 2 — V0.2 (Week 7–10)

Software

  • Expo mobile app (iOS + Android) with feature parity
  • pnpm workspace, packages/db, packages/shared, packages/agents
  • Voice talk-back loop: Pipecat + Deepgram streaming STT + Cartesia Sonic TTS
  • Claude Agent SDK driver replaces stateless RAG
  • Roles: Tutor, Brainstorm Partner
  • Family tier ($25/mo) — spaces, multi-member, caregiver role
  • Offline recording on mobile + queue-and-forward
  • App Store + Play Store submissions

Hardware

  • Enclosure prototype #1: 3D-printed shell housing XVF3800 + XIAO + LiPo + USB-C charge IC + LED + mute button
  • 10 internal units + 5 design-partner units
  • AEC validated for the talk-back use case (XVF3800 onboard AEC; verified that echo from a nearby external speaker, e.g. the paired phone during talk-back, doesn’t pollute uploaded audio, since the device itself has no speaker)
  • Hardware mute button wired to physically gate I2S clock (security-critical)
  • 8-hour battery test passes
  • Local 30-min circular buffer for offline resilience
  • OTA channel dev live; firmware version reported via BLE characteristic
  • Memfault (or equivalent) crash reporting
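The 30-min circular buffer implies a concrete size budget: at ~2 KB/s (the billing floor in §6.1), 30 min is about 3.6 MB, comfortably within ESP32-S3 PSRAM. A sketch of the overwrite-oldest policy, written in TypeScript for illustration only; the real implementation is firmware code:

```typescript
// Overwrite-oldest ring buffer for offline chunks, sized from the §6.1
// numbers: 2 KB/s × 30 min × 60 s = 3,600,000 bytes (~3.6 MB).
const BYTES_PER_SEC = 2_000;
const CAPACITY_BYTES = BYTES_PER_SEC * 30 * 60;

interface Chunk { recordedAt: string; bytes: number }

class OfflineRing {
  private chunks: Chunk[] = [];
  private used = 0;

  push(c: Chunk): void {
    // Drop oldest chunks until the new one fits. Offline for longer than
    // 30 minutes means the earliest audio is lost, by design.
    while (this.used + c.bytes > CAPACITY_BYTES && this.chunks.length > 0) {
      this.used -= this.chunks.shift()!.bytes;
    }
    this.chunks.push(c);
    this.used += c.bytes;
  }

  drain(): Chunk[] {
    // Called when WiFi returns: upload everything in recording order.
    const out = this.chunks;
    this.chunks = [];
    this.used = 0;
    return out;
  }
}
```

At 30 s per VAD-trimmed chunk (~60 KB), the buffer holds roughly 60 chunks before the oldest starts being overwritten.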

Integration demo: user wears device on lanyard for a full work day; offline periods buffer locally; device firmware updates over OTA without user intervention; talk-back works via paired phone with the device as input.


Phase 3 — V0.3 (Week 11–14)

Software

  • LiveKit-based group conversation mode (continuous WebRTC, server-side multi-party room)
  • Backend bridge: device → HTTPS chunked upload (1s Opus chunks) → bridge publishes as a LiveKit participant track. Device → backend speaks HTTP; backend → LiveKit speaks WebRTC. Web/mobile clients join the room directly via LiveKit SDKs.
  • Agent interject() capability — agent can speak into group conversation as a defined role (audio TTS sent into the LiveKit room as a participant)
  • Caregiver role with per-person consent flow
  • DoA metadata fused with diarization in pipeline → higher-confidence speaker labels in multi-person rooms
  • Internal MCP server stood up (used by agents/roles as retrieval layer)

Hardware

  • 50-unit pilot batch of refined enclosure (still 3D-printed, but iterated form factor based on Phase 2 feedback)
  • Industrial-design partner engaged for V1.0 enclosure
  • Continuous-streaming firmware mode (for group conversation): instead of 30s chunks, opens a sustained WebRTC/HTTPS stream
  • Power profile validated for sustained streaming (target ≥4 hr in this mode)
  • BLE notify channels for real-time mute/battery push
  • Pairing flow polished (sub-30s end-to-end)
  • 50 units shipped to design partners + early adopters
  • Telemetry dashboard live: per-device upload volume, battery health, crash rate, daily active devices

Integration demo: device placed on conference table during a 4-person meeting; group mode active; speakers identified by name from the second session onward; AI role (Caregiver / Reviewer) interjects appropriately when invoked.


Phase 4 — V1.0 (Week 15–22)

Software

  • Public MCP server — Pro users plug ARCIVE memory into Claude Desktop / ChatGPT / Cursor
  • Role marketplace UI (browse / install / publish)
  • Stripe Connect for marketplace payouts (70/30)
  • B2B admin dashboard (multi-staff, audit log, org-level billing)
  • First B2B pilot signed (likely caregiving)
  • SOC 2 Type 1 prep started

Hardware

  • Production-intent enclosure (SLA or CNC; injection-molded slated for v1.5)
  • 200-unit commercial batch manufactured
  • OTA with A/B partitions + signed firmware + automatic rollback on failed boot
  • Fleet management telemetry (firmware version distribution, error rate by version, retire-and-replace flow)
  • Hardware mute validation re-tested: cannot be defeated in software (security audit)
  • LED VAD-driven indicator validated (cannot be turned off while mic is hot)
  • Retail packaging design begins
  • App Store listing live with hardware as upsell (“Get the ARCIVE device for $129”)

Integration demo: end-to-end retail experience — user buys device, scans QR in app, pairs in 30s, captures and reviews memories via marketplace role, plugs into Claude Desktop via MCP for cross-tool agent access.


Phase 5 — V1.1+ (Week 23+)

Software

  • Memory-as-a-Service API tier (usage-billed)
  • ARCIVE for Caregiving (per-resident license, ~$50–100/mo)
  • ARCIVE for Education (per-student/school license)
  • ARCIVE for Therapy (per-therapist, ~$50–100/mo)
  • Smartwatch companion (Apple Watch / Wear OS) for intentional dictation when device isn’t worn
  • Multi-language support (start ES / FR / DE)

Hardware

  • Industrial-design refresh based on V1.0 field feedback
  • Optional e-ink or OLED screen SKU (the original V0.3 “screen” idea, deferred to here once we know what users want to see)
  • Replaceable / longer-runtime battery option
  • Injection-molded enclosure
  • FCC / CE certification (required for retail at scale)
  • Retail distribution partnerships (DTC, Amazon, possibly Best Buy)
  • Variants: clip / pendant / tabletop puck — based on V1.0 user preference data

6. Hardware ↔ Software Contract

This is the integration surface. Both teams must respect it. See full detail in plan files.

6.1 Audio Upload (HW → Backend)

```
POST https://<project>.supabase.co/functions/v1/ingest-audio
Headers:
  Authorization: Bearer <device_token>   # JWT signed at pairing time
  X-Device-Id: <uuid>
  Content-Type: audio/wav (or audio/opus)
Body:  raw audio chunk (≤ 30s, VAD-trimmed)
Query: ?recorded_at=<iso8601>&doa_json=<urlencoded>
Response: 202 { recording_id }
```
  • Chunks ≤ 30 seconds (Edge Function timeout safety)
  • Audio format: Opus mono @ 24 kbps (server billing assumes a 16 kbps floor; ~2 KB/s)
  • VAD on-device — silence is never uploaded
  • DoA metadata as compact JSON: [{t_ms: 0, az: 87}, {t_ms: 1200, az: 92}, ...]
  • Retry with exponential backoff if offline; local circular buffer ≥ 30 minutes
  • Raw audio is retained in private Storage (audio/{user_id}/{recording_id}.{ext}) and is replayable from the memory detail page via short-lived signed URL

6.2 Pairing & Provisioning (App ↔ HW)

  • App generates a pairing QR code containing: { pairing_url, pairing_token, supabase_url }
  • User scans QR in app’s pair-device flow
  • HW receives WiFi creds + Supabase device JWT over BLE GATT (one-time, write-only characteristic)
  • HW writes its mac_address back to app over BLE notify
  • App calls POST /devices to register, links to user account
  • BLE characteristic UUIDs live in shared/ble-uuids.ts — single source of truth, imported by both firmware and app
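The QR payload round-trips as plain JSON. A sketch using the three fields listed above; the validation logic is an illustrative assumption:

```typescript
// Pairing QR payload per §6.2: { pairing_url, pairing_token, supabase_url }.
interface PairingQr {
  pairing_url: string;
  pairing_token: string;
  supabase_url: string;
}

export function encodePairingQr(p: PairingQr): string {
  return JSON.stringify(p); // rendered as a QR code by the app
}

export function decodePairingQr(raw: string): PairingQr {
  const v = JSON.parse(raw);
  for (const k of ["pairing_url", "pairing_token", "supabase_url"] as const) {
    if (typeof v[k] !== "string" || v[k].length === 0) {
      throw new Error(`pairing QR missing ${k}`);
    }
  }
  return v as PairingQr;
}
```

After a successful scan, the app proceeds to the BLE write of WiFi creds + device JWT and then calls POST /devices, per the flow above.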

6.3 BLE GATT Schema (V0.3+)

| Service | Characteristic | Direction | Purpose |
|---|---|---|---|
| ARCIVE_PROV | wifi_creds | App → HW (write) | SSID + password JSON |
| ARCIVE_PROV | device_jwt | App → HW (write) | Supabase upload token |
| ARCIVE_PROV | mac | HW → App (read+notify) | Device MAC |
| ARCIVE_CTRL | mute | App ↔ HW (read+write+notify) | 0=record, 1=muted |
| ARCIVE_STATUS | battery | HW → App (read+notify) | 0–100 |
| ARCIVE_STATUS | recording_state | HW → App (notify) | idle/recording/uploading/error |
| ARCIVE_STATUS | firmware_version | HW → App (read) | Semver string |

UUIDs to be generated once and committed to shared/ble-uuids.ts.
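A possible shape for shared/ble-uuids.ts. The UUID values below are explicit placeholders (the generate-once step hasn't happened yet); the point is one frozen map that both firmware and app import:

```typescript
// shared/ble-uuids.ts (sketch). Values are PLACEHOLDERS: generate real
// 128-bit UUIDs once (e.g. with uuidgen) and commit them here.
export const BLE = {
  ARCIVE_PROV: {
    service:    "00000000-0000-0000-0000-00000000a001", // placeholder
    wifi_creds: "00000000-0000-0000-0000-00000000a002", // SSID + password JSON
    device_jwt: "00000000-0000-0000-0000-00000000a003", // Supabase upload token
    mac:        "00000000-0000-0000-0000-00000000a004", // device MAC, read+notify
  },
  ARCIVE_CTRL: {
    service: "00000000-0000-0000-0000-00000000b001",
    mute:    "00000000-0000-0000-0000-00000000b002",    // 0=record, 1=muted
  },
  ARCIVE_STATUS: {
    service:          "00000000-0000-0000-0000-00000000c001",
    battery:          "00000000-0000-0000-0000-00000000c002", // 0-100
    recording_state:  "00000000-0000-0000-0000-00000000c003", // idle/recording/uploading/error
    firmware_version: "00000000-0000-0000-0000-00000000c004", // semver string
  },
} as const;
```

Firmware can consume the same file via a build-time codegen step (e.g. emitting a C header), so the two stacks cannot drift apart.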

6.4 Realtime sync (Backend → App)

  • App subscribes to Supabase Realtime channel for recordings table filtered to current user
  • HW upload arrives → row inserted → app sees it appear in real time
  • Pipeline status updates (pending → processing → done) push the same way
  • App never polls

6.5 Privacy & consent invariants (HW + App)

  • HW LED must be visibly on (breathing) whenever the mic is unmuted — non-overridable from firmware
  • Mute button is hardware-level: cuts the I2S clock to the mic array, not software-bypassable
  • App displays a “currently recording” indicator if any paired device is unmuted
  • First-run consent screen must be acknowledged before any recording is ingested
  • Device token can be revoked from the app → HW receives revoke over BLE → wipes local creds
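The no-polling model in §6.4 maps onto supabase-js v2's postgres_changes subscription. A sketch with the client injected (table name and filter come from the bullets in §6.4; the channel name is an assumption):

```typescript
// Subscribe to the current user's recordings; INSERT = new upload arriving,
// UPDATE = pipeline status change (pending -> processing -> done). No polling.
export function watchRecordings(
  supabase: any, // the app's supabase-js v2 client, injected
  userId: string,
  onRow: (row: Record<string, unknown>) => void,
) {
  return supabase
    .channel("recordings-feed") // channel name is arbitrary
    .on(
      "postgres_changes",
      {
        event: "*",
        schema: "public",
        table: "recordings",
        filter: `user_id=eq.${userId}`,
      },
      (payload: any) => onRow(payload.new),
    )
    .subscribe();
}
```

Injecting the client keeps this testable and lets web and mobile share the one subscription helper from packages/shared.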

7. Cost Model

V0 (50 users, ~1 hr/day each, phone mic + 5 dev-kit HW units)

| Line | Estimate |
|---|---|
| Supabase | $0 (free tier) |
| Vercel | $0 (free tier) |
| Groq Whisper | ~$60/mo (50 × 30 hr × $0.04) |
| Voyage embeddings | ~$2 |
| Gemini Flash summaries | ~$5 |
| Sentry + PostHog | $0 (free tiers) |
| Recurring total | ~$70/mo |
| HW capex (one-time) | ~$270 (5× Seeed XVF3800 + XIAO dev-kits @ $54.50) |
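The Groq line is the only computed estimate; the model is users × hours × rate. A quick sketch that reproduces the table's own numbers:

```typescript
// Monthly STT cost: users × audio-hours per user per month × $/audio-hour.
export function sttMonthlyUsd(users: number, hrPerUserMo: number, usdPerHr: number): number {
  return users * hrPerUserMo * usdPerHr;
}

// One-time dev-kit capex: unit count × unit price.
export function devkitCapexUsd(units: number, unitUsd: number): number {
  return units * unitUsd;
}

// V0 rows above: 50 users × 30 hr × $0.04 ≈ $60/mo; 5 × $54.50 = $272.50 (~$270).
```

The same two functions reprice the later phases: bump users and the Pro-tier hour mix and the V0.1/V0.2 transcription lines fall out directly.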

V0.1 (Wk 4–6, 100 users + 10 HW dev-kits)

| Line | Estimate |
|---|---|
| Supabase | $0–25 |
| Vercel | $0 |
| AI services (Groq + Deepgram + Voyage + Gemini) | ~$150 |
| Pyannote on Modal | ~$30 |
| Recurring total | ~$200/mo |
| HW capex (one-time, marginal) | ~$270 (5 more dev-kits) |

V0.2 (500 users, mix of free + Pro, 15 HW prototype units)

| Line | Estimate |
|---|---|
| Supabase Pro | $25 |
| Vercel Pro | $20 |
| Transcription | ~$300 |
| Embeddings + summaries | ~$50 |
| Voice (Cartesia + Deepgram streaming) for Pro users | ~$200 |
| Pyannote (self-hosted on Modal) | ~$80 |
| Sentry/PostHog | $0–50 |
| Recurring total | ~$700/mo |
| HW capex (one-time) | ~$1,500 (5 more dev-kits + 3D-print materials + LiPo batteries + USB-C + enclosure iteration) |
| Revenue (assume 10% Pro at $12) | $600/mo |
| Net (recurring) | roughly break-even |

V1.0 (5,000 users + 200 HW units in field)

| Line | Estimate |
|---|---|
| Infra (Supabase, Vercel, LiveKit) | ~$500 |
| AI services | ~$3,000 |
| Hardware COGS amortized | ~$1,500/mo (assuming $30 unit cost, 200 units amortized over 4 mo) |
| Total | ~$5,000/mo |
| Revenue (assume 12% Pro, 3% Family, hardware margin) | ~$15,000/mo |
| Gross margin | ~67% |

8. Monetization Tiers

| Tier | Price | Target | Key features |
|---|---|---|---|
| Free | $0 | Trial / casual | 5 hr/mo, 1 default companion role, text-only |
| Pro | $12/mo | Power individual | 50 hr/mo, all built-in roles, voice talk-back (5 hr), graph view, MCP access, exports |
| Family | $25/mo | Households / caregiving | 5 members, shared spaces, caregiver role, group mode, multi-device |
| Marketplace | rev-share | Creators + buyers | Custom roles published by creators, 70/30 split |
| Caregiving B2B | $50–100/resident/mo | Assisted living | Compliance, audit, multi-staff access |
| Education B2B | per-student/school | Schools / tutors | Study companion, syllabus-aware retrieval |
| Therapy B2B | $50–100/therapist/mo | Practitioners | Session capture (with consent), patient role-play for practice |
| API / Memory-as-a-Service | usage-based | AI builders | MCP endpoint, vector + entity APIs |

9. Risk Register

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Always-on recording legal exposure | High | Existential | Consent screen V0, hardware LED non-overridable, device-side mute (gates I2S clock at hardware level), 2-party consent default |
| AI vendor price hikes | Medium | High | Multiple STT providers integrated (Groq + Deepgram), swappable agent layer (Pipecat + Claude Agent SDK + fallback OpenAI Realtime driver) |
| Hardware delays block roadmap | Medium | Medium | Both tracks decoupled at the integration contract (§6); HW running on Seeed reference board through Phase 3 means delays are firmware/enclosure scope, not platform-level. App always works without device. |
| Speaker re-ID quality poor | Medium | High | Self-host Pyannote on Modal, allow user manual labeling, build feedback loop into UX |
| Competitors (Limitless, Plaud, Friend) | High | Medium | Wedge = AI roles + group mode + platform/MCP + variants, not capture alone |
| Free-tier abuse | Medium | Medium | 5-hour cap, device-bound tokens, abuse rate-limits in Edge Functions |
| Privacy breach / data leak | Low | Existential | Encryption at rest, short retention defaults, per-user delete-all flow, SOC 2 Type 1 path planned for V1.0 |
| ESP32-S3 device-side WebRTC at edge of chip capability | Medium | Low | V0.3 ships HTTPS chunked upload + backend LiveKit bridge (proven path). esp-webrtc evaluated as Phase 2 research spike; if not viable, bridge remains permanent — group-mode quality unaffected, only data path changes. |
| Voice talk-back latency exceeds 1.5s budget | Medium | High | Phase 2 ships explicit benchmarking (01_SOFTWARE_PLAN.md §1.8). Mitigations: co-locate Deepgram + Cartesia regions with user; switch to Cartesia edge endpoints; downgrade LLM to 7B-on-Groq for faster first-token. |
| iOS App Store rejection of ambient mic capture | Medium | High | Explicit App Review note: user-initiated recording, persistent visual indicator (LED on HW, banner in app), consent screen on first launch; provide test account with HW in dev mode |
| Variant feature creep (cellular, screen, local) before V1.0 lands | Medium | Medium | V1.0 is clip + tabletop only; variants are architected for, not committed. Schema accommodates them without migration. |

10. Repository Layout

```
arcive/
├── docs/                 # ← you are here
│   ├── 00_MASTER_PLAN.md
│   ├── 01_SOFTWARE_PLAN.md
│   └── 02_HARDWARE_PLAN.md
├── apps/
│   ├── web/              # Next.js 15 PWA (V0 → onwards)
│   └── mobile/           # Expo / React Native (V0.2 → onwards)
├── packages/
│   ├── db/               # Supabase schema, migrations, generated types
│   ├── shared/           # Zod schemas, agent interface, BLE UUIDs
│   └── agents/           # Role definitions, system prompts, tools
├── backend/
│   ├── functions/        # Supabase Edge Functions
│   ├── workers/          # Queue workers (pgmq) for pipeline steps
│   └── mcp/              # MCP server (V0.3+)
├── firmware/             # ESP32-S3 firmware (V0.3+)
│   ├── src/
│   ├── platformio.ini
│   └── tests/
└── shared/
    └── ble-uuids.ts      # Single source of truth for HW + App
```

11. Decision Log

| Date | Decision | Reason |
|---|---|---|
| 2026-05-03 | Build software-first, defer custom HW to Phase 4 | Hardware kills startups; software validates demand cheaply |
| 2026-05-03 | Web PWA before native mobile | 3-week ship, no app review, faster iteration |
| 2026-05-03 | Groq Whisper over AssemblyAI for V0 | ~10x cheaper, no diarization needed yet |
| 2026-05-03 | HNSW over ivfflat in pgvector | Better recall, scales further, default in 0.5+ |
| 2026-05-03 | Supabase as full backend | Single vendor, single dashboard, generous free tier |
| 2026-05-03 | Schema includes people/roles/sessions from V0 | Avoids migration pain when companion features land |
| 2026-05-03 | MCP-first internal retrieval API | Doubles as B2B/platform offering later |
| 2026-05-03 | HW v0 = white-label, HW v1 = custom XVF3800 | De-risk hardware before custom PCB investment |
| 2026-05-03 | LiveKit for group mode media layer | Open source, free tier, OpenAI Realtime uses it |
| 2026-05-03 | Stripe + RevenueCat for billing | RevenueCat handles iOS/Android/web in one SDK |