
ARCIVE — Master Plan

The single source of truth for the ARCIVE product, across software and hardware. Read this first. Then read 01_SOFTWARE_PLAN.md and 02_HARDWARE_PLAN.md.


1. Product Thesis

ARCIVE is a hardware & software platform that empowers and enriches the lives of its users by lowering the barrier to record, retrieve, interact, and create memories and thoughts — past, present & future — for individuals and/or friends and family.

This is the canonical one-sentence definition. Everything else in this plan derives from it.

  • It is a platform, not a single product — one software stack, multiple hardware form factors over time (clip, pendant, tabletop, watch, card, screen-equipped, cellular, local-only).
  • The four verbs — record, retrieve, interact, create — define every feature decision. If a feature doesn’t serve one of these, it doesn’t ship.
  • Past, present & future: capture (past), companion-in-the-moment (present), and reflective/predictive companion (future) are all in scope. V0 covers past + early present; future scope expands into present + future.
  • Individuals and/or friends and family: solo use cases AND shared spaces are first-class. Schema, pricing, and UX accommodate both from V0.
  • Hardware and software are co-equal tracks, developed in parallel from week 1, integrated continuously via the contract in §6.
  • The app works without any device so adoption is never blocked. The device(s) are how the product is meant to be experienced — frictionless capture, no screen pulling you in.
  • Monetization is device + subscription + platform/marketplace + vertical B2B.

User mental model: “I have ARCIVE. It listens, it remembers, it understands me, it’s there when I need it — past, present, and future.”


2. Strategic Principles

| # | Principle | Why |
|---|---|---|
| 1 | Hardware and software are equal, parallel tracks | The device is core product identity, not a peripheral. Both teams ship every phase. The integration contract (§6) is the rail they run on. |
| 2 | App works standalone; device is the intended experience | Phone-mic input is the safety net for adoption, but the device is the differentiated capture surface (always-on, no screen, distraction-free). |
| 3 | Ship V0 in 3 weeks — software AND a working hardware bring-up | Both tracks must demo end-to-end at every phase boundary. No “hardware later.” |
| 4 | One integration contract, two implementations | The HW↔SW contract (§6) is frozen at start of each phase. Both sides build to it. Changes require a joint sync. |
| 5 | Schema & architecture designed for V1 from V0 | Pivot from dictaphone → companion → platform must not require a rewrite. Bones in place from V0. |
| 6 | Cost-per-user must be sub-$2/mo at free tier | Otherwise growth = bankruptcy. VAD on-device + cheap STT (Groq) + cheap embeddings (Voyage) are non-negotiable. |
| 7 | Privacy & consent are V0 features, not later additions | Always-on mics in shared spaces are a legal minefield. Hardware-level mute, non-overridable LED, consent screen all in V0. |
| 8 | Agent layer is swappable from day one | Stateless RAG → Agent SDK → voice-native realtime. The app code shouldn’t care. |
| 9 | MCP-first memory layer | Memory store doubles as B2B/platform offering. Build it as a service from the start. |
| 10 | Distraction-free is a hardware constraint AND a software constraint | Device has no screen by default. App has no infinite feeds, notifications, or engagement loops. Calm by design at every layer. |

2.5. Hardware Variant Lineup (Platform View)

ARCIVE is a platform. The first device defines the architecture; future variants extend it. All variants share the same software stack, the same /ingest-audio contract, the same BLE GATT schema (where applicable), and the same OTA system. Form factor and connectivity differ.

| Variant | When | Form | Use case | Key constraints |
|---|---|---|---|---|
| Clip / Pendant (V1.0) | Phase 4 | Wearable on lanyard or clip | Always-with-you personal capture | Battery 8–12 hr, BLE+WiFi, no screen |
| Tabletop puck (V1.0 SKU or V1.5) | Phase 4–5 | Desk or conference-table puck | Group conversation, meetings, family dinners, study rooms | Plugged-in option, larger battery, optimized for far-field |
| Pendant variant (V1.5) | Phase 5 | Necklace / pendant form | Always-on capture for users who don’t want a clip | Industrial-design refresh, smaller battery acceptable |
| Watch app companion (V1.1+) | Phase 5+ | Apple Watch / Wear OS app | Intentional dictation when device isn’t worn; quick capture | Software only, no new hardware; uses watch mic |
| Card format (V2.x) | Future | Credit-card-sized device worn in pocket / wallet | Discreet capture, executives / professionals | Slim battery, single MEMS mic likely (sacrifices array for form), BLE-tethered to phone for upload |
| Screen-equipped variant (V2.x) | Future | Tabletop or pendant with e-ink or small OLED | Status, summaries, role selection without phone | Display drives slightly higher BOM; firmware adds UI layer |
| Cellular variant (V2.x) | Future | Any of the above with onboard LTE-M / NB-IoT | Capture without a phone or WiFi nearby; caregiving / kids / safety use cases | LTE module + SIM, eSIM activation flow, higher BOM, recurring connectivity cost passed through to user |
| Local-only variant (V2.x) | Future | Any of the above with on-device summarization | Privacy-maximalist users, regulated industries (legal, healthcare), air-gapped settings | Larger MCU or co-processor (e.g., ESP32-P4 or RK3566), local STT (Whisper-tiny), local embeddings, syncs only to user’s own device or NAS |

Why this list now matters

  • The platform contract must accommodate variants from day one: device.kind enum, capability flags, optional cellular metadata, optional local-mode flag in the schema.
  • Cellular and local-only change the data flow. Cellular adds latency + cost pressure (chunk smaller, upload smarter). Local-only inverts the cloud assumption (the memory store can live on the user’s device). Both must be design considerations even if not built until V2.
  • Card / pendant / watch shift the input surface. Single-mic and watch-mic are degraded inputs; software diarization & speaker re-ID must gracefully fall back.
  • Screen variant changes the firmware UI layer, but does not change the cloud product.
  • These variants are NOT a roadmap commitment — they’re the shape of the platform we’re designing for. We commit to V1.0 (clip/pendant + tabletop) and keep the door open to the rest.
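One way the contract can accommodate variants from day one is a `device.kind` enum plus capability flags that the pipeline branches on. A minimal TypeScript sketch; every field name here is an illustrative assumption, not the committed schema (which lives in packages/db):

```typescript
// Illustrative only: variant-aware device record for the platform contract.
// Field names are assumptions; the committed schema lives in packages/db.
type DeviceKind =
  | "clip" | "pendant" | "tabletop" | "watch_companion"
  | "card" | "screen" | "cellular" | "local_only";

interface DeviceCapabilities {
  micArray: boolean;   // false for single-MEMS card / watch-mic variants
  screen: boolean;     // e-ink / OLED SKU
  cellular: boolean;   // onboard LTE-M / NB-IoT
  localMode: boolean;  // on-device STT + embeddings, no cloud sync
}

interface DeviceRecord {
  id: string;
  kind: DeviceKind;
  capabilities: DeviceCapabilities;
  cellularMeta?: { iccid: string; carrier: string }; // only for cellular variants
}

// The pipeline branches on capabilities instead of hard-coding variants,
// e.g. degraded diarization when there is no mic array:
function diarizationMode(d: DeviceRecord): "array" | "single-mic-fallback" {
  return d.capabilities.micArray ? "array" : "single-mic-fallback";
}
```

A card-format device would register with `micArray: false`, and diarization falls back gracefully, as the third bullet above requires, without any schema migration.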

2.7. Architecture (current state — V0.3 in flight)

Color legend (both diagrams):

  • 🟦 Blue — capture / write path (client → ingest → storage)
  • 🟪 Purple — pipeline processing (internal step-to-step)
  • 🟧 Orange — external API call (Modal worker or AI vendor)
  • 🟩 Green — read / Realtime push / query path
  • 🟥 Red — failure path (dead-letter queue)
  • Gray dotted — planned / not yet wired

2.7.a — High-level (5-box overview)

For a 30-second read of how data flows.

```mermaid
graph LR
  C["Clients<br>Web PWA<br>Mobile<br>HW Device (V0.3+)"]
  B["Supabase Backend<br>Auth, Postgres, Storage,<br>pgmq queue, Realtime"]
  P["Async Pipeline<br>7 steps, per-step<br>Edge Function"]
  AI["AI Vendor Chain<br>Gemini Flash<br>to Anthropic Haiku<br>to Groq Llama 3.3 70B"]
  A["Agent Layer<br>/api/chat<br>+ arcive-memory MCP"]
  C -- "capture audio" --> B
  B -- "enqueue jobs" --> P
  P -- "call models" --> AI
  P -- "write memories + topics" --> B
  B -- "Realtime push" --> C
  C -- "ask questions" --> A
  A -- "retrieve" --> B
  A -- "generate" --> AI
  linkStyle 0 stroke:#3b82f6,stroke-width:2px
  linkStyle 1 stroke:#8b5cf6,stroke-width:2px
  linkStyle 2 stroke:#f59e0b,stroke-width:2px
  linkStyle 3 stroke:#8b5cf6,stroke-width:2px
  linkStyle 4 stroke:#10b981,stroke-width:2px
  linkStyle 5 stroke:#10b981,stroke-width:2px
  linkStyle 6 stroke:#10b981,stroke-width:2px
  linkStyle 7 stroke:#f59e0b,stroke-width:2px
```

2.7.b — Detailed (numbered flow)

Numbered subgraphs follow the data path 1 → 6.

```mermaid
graph TB
  subgraph s1["1. Clients"]
    W["Web PWA<br>Next.js 15"]
    M["Mobile<br>Expo SDK 53"]
    H["HW Device<br>ESP32-S3 + XVF3800<br>V0.3+"]
  end
  subgraph s2["2. Ingest"]
    AUTH["Auth<br>magic-link via Resend"]
    ING["Edge Function<br>POST /ingest-audio"]
  end
  subgraph s3["3. Storage"]
    S[("Audio bucket<br>signed URLs")]
    DB[("Postgres + pgvector<br>memories, topics, edges,<br>spaces, roles, people")]
  end
  subgraph s4["4. Async Pipeline (pgmq, per-step Edge Function)"]
    direction LR
    P1["transcribe"] --> P2["diarize + re-ID"]
    P2 --> P3["summarize + topics"]
    P3 --> P4["embed"]
    P4 --> P5["edges + topic links"]
    DLQ[("pipeline_dead_letters<br>30d TTL")]
  end
  subgraph s5["5. External compute"]
    MOD1["Modal: Whisper"]
    MOD2["Modal: Pyannote<br>diarize + re-ID"]
    MOD3["Modal: Audio transcode<br>webm/Opus to m4a"]
    AI["AI Vendor Chain<br>Gemini to Anthropic to Groq"]
  end
  subgraph s6["6. Agent + Realtime"]
    RT["Supabase Realtime<br>postgres_changes"]
    CHAT["/api/chat<br>Claude Agent SDK<br>+ consent gate (ADR-0007)"]
    MCP["arcive-memory MCP<br>separate Node process"]
  end
  W --> AUTH
  M --> AUTH
  H --> AUTH
  W --> ING
  M --> ING
  H --> ING
  ING --> S
  ING --> P1
  P1 --> MOD1
  P2 --> MOD2
  P3 --> AI
  P5 --> DB
  P1 -. "on max-retry" .-> DLQ
  S -. "webhook" .-> MOD3
  MOD3 --> S
  DB --> RT
  RT --> W
  RT --> M
  W --> CHAT
  M -. "γ.2 planned" .-> CHAT
  CHAT --> MCP
  MCP --> DB
  CHAT --> AI
  %% Edge index ordering (Mermaid counts in source order):
  %% 0-3: pipeline internal arrows (P1->P2->P3->P4->P5) -> purple
  %% 4-6: client->Auth (W,M,H) -> blue
  %% 7-9: client->Ingest (W,M,H) -> blue
  %% 10: ING->S -> blue
  %% 11: ING->P1 -> purple
  %% 12-14: P1->MOD1, P2->MOD2, P3->AI -> orange
  %% 15: P5->DB -> purple
  %% 16: P1->DLQ (dotted) -> red
  %% 17-18: S->MOD3, MOD3->S -> orange
  %% 19: DB->RT -> green
  %% 20-21: RT->W, RT->M -> green
  %% 22: W->CHAT -> green
  %% 23: M->CHAT (dotted, planned) -> gray
  %% 24-25: CHAT->MCP, MCP->DB -> green
  %% 26: CHAT->AI -> orange
  linkStyle 0,1,2,3,11,15 stroke:#8b5cf6,stroke-width:2px
  linkStyle 4,5,6,7,8,9,10 stroke:#3b82f6,stroke-width:2px
  linkStyle 12,13,14,17,18,26 stroke:#f59e0b,stroke-width:2px
  linkStyle 19,20,21,22,24,25 stroke:#10b981,stroke-width:2px
  linkStyle 16 stroke:#ef4444,stroke-width:2px
  linkStyle 23 stroke:#9ca3af,stroke-width:1.5px
```

Not in either diagram (planned / paused):

  • Pipecat voice service — scaffolded then paused per ADR-0010. Resumes when EAS unblocks mobile native deps AND speech-to-speech-vs-Pipecat decision lands.
  • LiveKit group conversation mode — V0.3 deferred with voice talk-back.
  • Public MCP server (Cloudflare Workers) — V1.0.
  • Stripe + RevenueCat billing — V0.1 scaffolded; tiers live by V1.0.

3. Phased Roadmap — Parallel Tracks

Both tracks ship at every phase boundary. Each phase ends with an integrated demo: software working end-to-end with the current hardware build.

```mermaid
gantt
  title ARCIVE — phased roadmap (parallel SW + HW)
  dateFormat YYYY-MM-DD
  axisFormat Wk %V
  section Software
  V0 PWA dictaphone (phone mic)                  :sw0, 2026-01-05, 21d
  V0.1 Diarization + re-ID + Reviewer + Pro      :sw1, after sw0, 21d
  V0.2 Voice talk-back + Family + Mobile (Expo)  :sw2, after sw1, 28d
  V0.3 Group conversation + Caregiver            :sw3, after sw2, 28d
  V1.0 Marketplace + Public MCP + B2B pilot      :sw4, after sw3, 56d
  V1.1+ Vertical packages + MaaS                 :sw5, after sw4, 56d
  section Hardware
  V0 XVF3800 dev-kit (raw I2S + WiFi upload)     :hw0, 2026-01-05, 21d
  V0.1 Firmware (VAD, BLE pairing, LED, OTA)     :hw1, after hw0, 21d
  V0.2 Enclosure proto #1, 10 units, AEC         :hw2, after hw1, 28d
  V0.3 50-unit pilot, DoA fusion, BLE control    :hw3, after hw2, 28d
  V1.0 Enclosure v1 + 200-unit commercial        :hw4, after hw3, 56d
  V1.1+ Industrial refresh + cert path           :hw5, after hw4, 56d
```
| Phase | Weeks | Version | Software | Hardware |
|---|---|---|---|---|
| 0 | 1–3 | V0 | Web PWA dictaphone (phone mic) | XVF3800 dev-kit bring-up; raw I2S → WiFi upload to same backend |
| 1 | 4–6 | V0.1 | Diarization + re-ID + Reviewer role + Pro tier | Full firmware on dev-kit — VAD, BLE pairing, status LED, OTA scaffold |
| 2 | 7–10 | V0.2 | Voice talk-back¹ + Family + mobile (Expo) | Enclosure prototype #1 (3D-printed), 10 internal units, AEC validated for talk-back |
| 3 | 11–14 | V0.3 | Group conversation mode + Caregiver role | 50-unit pilot batch, DoA fusion live in pipeline, BLE control complete |
| 4 | 15–22 | V1.0 | Marketplace + public MCP + B2B pilot | Enclosure v1 production-intent, 200-unit commercial batch, OTA fleet mgmt |
| 5 | 23+ | V1.1+ | Vertical packages + memory-as-a-service | Industrial-design refresh, optional e-ink screen, certification path |

¹ Voice talk-back scaffolded then paused 2026-05-05 — see ADR-0010.

Both tracks integrate continuously. Software contract endpoints exist from Phase 0 so firmware can target them; firmware sends real audio from Phase 0 even if the enclosure is a breadboard.


4. Version Matrix — Capabilities by Release

Software capabilities

| Capability | V0 | V0.1 | V0.2 | V0.3 | V1.0 | V1.1+ |
|---|---|---|---|---|---|---|
| Web app (PWA) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Native mobile (iOS+Android) | | | ✅ | ✅ | ✅ | ✅ |
| Magic-link auth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Consent screen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Capture from phone mic | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Capture from ARCIVE device | ✅ dev-kit | ✅ dev-kit | ✅ proto | ✅ pilot | ✅ retail | ✅ |
| List view + memory detail | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Text search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Semantic search (embeddings) | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Graph / Universe view | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Speaker diarization | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cross-session speaker re-ID | | ✅ | ✅ | ✅ | ✅ | ✅ |
| AI roles (text) | | ✅ Reviewer | ✅ +Tutor | ✅ +Caregiver | ✅ marketplace | ✅ vertical |
| AI talk-back (voice) | | | ✅ | ✅ | ✅ | ✅ |
| Group conversation mode | | | | ✅ | ✅ | ✅ |
| MCP server | | | | internal | ✅ public | ✅ public |
| Stripe / RevenueCat billing | scaffolded | ✅ Pro | ✅ Family | ✅ | ✅ marketplace | ✅ enterprise |
| Free / Pro / Family tiers | scaffolded | Free+Pro | Free+Pro+Family | ✅ | ✅ | ✅ |
| Marketplace (custom roles) | | | | | ✅ | ✅ |
| B2B vertical packages | | | | | pilot | ✅ |
| Export (Markdown / Obsidian) | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Family / shared spaces | | | ✅ | ✅ | ✅ | ✅ |

Hardware capabilities (parallel track)

| Capability | V0 | V0.1 | V0.2 | V0.3 | V1.0 | V1.1+ |
|---|---|---|---|---|---|---|
| Build form | dev-kit (XVF3800+XIAO breadboard) | dev-kit | enclosure proto #1 (3D-print) | pilot enclosure (3D-print refined) | production-intent enclosure (SLA/CNC) | injection-molded |
| Units in field | 5 internal | 10 internal | 10 + 5 design partners | 50 pilot | 200 commercial | 1k+ |
| I2S audio capture (XVF3800 → ESP32-S3) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| WiFi upload to backend | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| On-device VAD (silence not uploaded) | | basic | ✅ tuned | ✅ | ✅ | ✅ |
| BLE provisioning (WiFi creds + JWT) | | ✅ | ✅ | ✅ | ✅ | ✅ |
| Status LED (breathing/solid/error) | basic | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hardware mute (gates I2S clock) | | | wired | ✅ | ✅ | ✅ |
| Battery + USB-C charge | dev-kit | dev-kit | ✅ proto | ✅ pilot | ✅ retail | ✅ |
| 8–12 hr battery life | n/a | n/a | tested | ✅ | ✅ | ✅ |
| DoA azimuth metadata per chunk | ✅ raw | ✅ raw | ✅ raw | ✅ fused with re-ID | ✅ | ✅ |
| AEC validated for talk-back | | | ✅ | ✅ | ✅ | ✅ |
| OTA firmware updates | | scaffold | ✅ signed | ✅ A/B partition | ✅ fleet mgmt | ✅ |
| Memfault / crash telemetry | | | ✅ | ✅ | ✅ | ✅ |
| Local circular buffer (offline) | | basic | ✅ 30-min | ✅ | ✅ | ✅ |
| Optional e-ink/OLED screen | | | | | | ✅ optional SKU |
| FCC/CE certification | skipped (research units) | skipped | skipped | skipped | skipped | ✅ retail |

5. Deliverables by Phase — Both Tracks

Each phase ends with a joint integration demo: software + hardware working end-to-end together.

Phase 0 — V0 (Week 1–3)

Software

  • Next.js 15 PWA deployed to Vercel
  • Magic-link auth + consent screen
  • ingest-audio Edge Function live and accepting uploads from phone-mic AND device dev-kit
  • Phone-mic recording (getUserMedia + @ricky0123/vad-web)
  • Synchronous transcribe → store memory via Groq Whisper
  • Today (list) + Memory detail views
  • Postgres FTS text search
  • Full schema deployed (people / roles / role_sessions / memory_participants / subscription_tier)
  • Stripe customer pre-created on signup
  • PostHog + Sentry instrumented
  • 20 invited users

Hardware

  • XVF3800 + XIAO ESP32-S3 dev-kit assembled (5 units, breadboard / Seeed reference board)
  • I2S firmware flashed on XVF3800 (per Seeed wiki)
  • ESP32-S3 firmware: I2S capture → 30s WAV chunk → WiFi → POST /ingest-audio (hardcoded WiFi creds + dev JWT for now)
  • DoA azimuth queried via I2C, attached as metadata to each chunk
  • LED breathing animation on capture
  • Firmware forks Seeed’s HTTP audio-streaming sample (fastest path to a working build)

Integration demo: dev-kit on a desk records a meeting, transcripts appear in the web app feed in real time alongside phone-mic recordings.


Phase 1 — V0.1 (Week 4–6)

Software

  • Pipeline moved to pgmq queue + step workers
  • Diarization (Deepgram Nova-3)
  • Pyannote.audio worker on Modal for speaker re-ID
  • Voyage-3-lite embeddings + pgvector HNSW
  • Semantic search live
  • Universe / graph view (react-force-graph web)
  • First AI role: Reviewer (text-only)
  • Pro tier ($12/mo) launches
  • Markdown export
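The pgmq-backed step workers follow a read → process → ack-or-dead-letter loop. A hand-rolled sketch with the queue abstracted behind an interface; the method names, retry limit, and payload shape are assumptions for illustration, while the real workers sit on pgmq's SQL API:

```typescript
// Sketch of one pipeline step worker: read a message under a visibility
// timeout, process it, delete on success, dead-letter after max retries.
interface QueueMsg { msgId: number; readCt: number; payload: { recordingId: string } }

interface StepQueue {
  read(): QueueMsg | null;                          // pop one visible message
  delete(msgId: number): void;                      // ack: remove permanently
  deadLetter(msg: QueueMsg): void;                  // move to pipeline_dead_letters
  sendNext(payload: { recordingId: string }): void; // enqueue for the next step
}

const MAX_RETRIES = 3; // assumption; real value is per-step config

async function runStepOnce(
  q: StepQueue,
  process: (recordingId: string) => Promise<void>,
): Promise<boolean> {
  const msg = q.read();
  if (!msg) return false;              // queue empty
  try {
    await process(msg.payload.recordingId);
    q.delete(msg.msgId);               // success: ack...
    q.sendNext(msg.payload);           // ...and hand off to the next step
  } catch {
    if (msg.readCt >= MAX_RETRIES) {   // retries exhausted: dead-letter it
      q.deadLetter(msg);
      q.delete(msg.msgId);
    }                                  // else: visibility timeout re-delivers
  }
  return true;
}
```

The same loop shape serves all five steps (transcribe, diarize, summarize, embed, link); only the `process` callback changes, which is what keeps each Edge Function small.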

Hardware

  • Same dev-kit, full firmware feature set
  • VAD gating: silence never uploaded (uses XVF3800 VAD signal via I2C)
  • BLE GATT server implemented (provisioning + control + status characteristics per §6.3)
  • Pairing flow: app QR → BLE write of WiFi creds + device JWT → device reboots and joins WiFi
  • LED states: idle / recording / muted / uploading / error
  • OTA scaffolding (esp_https_ota wired up, manifest poll daily)
  • 10 dev-kit units in internal use

Integration demo: paired device captures a multi-speaker meeting; diarization labels appear; same speaker recognized across two separate sessions.


Phase 2 — V0.2 (Week 7–10)

Software

  • Expo mobile app (iOS + Android) with feature parity
  • pnpm workspace, packages/db, packages/shared, packages/agents
  • Voice talk-back loop: Pipecat + Deepgram streaming STT + Cartesia Sonic TTS
  • Claude Agent SDK driver replaces stateless RAG
  • Roles: Tutor, Brainstorm Partner
  • Family tier ($25/mo) — spaces, multi-member, caregiver role
  • Offline recording on mobile + queue-and-forward
  • App Store + Play Store submissions

Hardware

  • Enclosure prototype #1: 3D-printed shell housing XVF3800 + XIAO + LiPo + USB-C charge IC + LED + mute button
  • 10 internal units + 5 design-partner units
  • AEC validated for the talk-back use case (XVF3800 onboard AEC; verified that echo from a nearby external speaker, e.g. the paired phone during talk-back, doesn’t pollute uploaded audio, since the device itself has no speaker)
  • Hardware mute button wired to physically gate I2S clock (security-critical)
  • 8-hour battery test passes
  • Local 30-min circular buffer for offline resilience
  • OTA channel dev live; firmware version reported via BLE characteristic
  • Memfault (or equivalent) crash reporting
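The 30-min circular buffer implies a concrete size budget: at ~2 KB/s (the billing floor in §6.1), 30 min is about 3.6 MB, comfortably within ESP32-S3 PSRAM. A sketch of the overwrite-oldest policy, written in TypeScript for illustration only; the real implementation is firmware code:

```typescript
// Overwrite-oldest ring buffer for offline chunks, sized from the §6.1
// numbers: 2 KB/s × 30 min × 60 s = 3,600,000 bytes (~3.6 MB).
const BYTES_PER_SEC = 2_000;
const CAPACITY_BYTES = BYTES_PER_SEC * 30 * 60;

interface Chunk { recordedAt: string; bytes: number }

class OfflineRing {
  private chunks: Chunk[] = [];
  private used = 0;

  push(c: Chunk): void {
    // Drop oldest chunks until the new one fits. Offline for longer than
    // 30 minutes means the earliest audio is lost, by design.
    while (this.used + c.bytes > CAPACITY_BYTES && this.chunks.length > 0) {
      this.used -= this.chunks.shift()!.bytes;
    }
    this.chunks.push(c);
    this.used += c.bytes;
  }

  drain(): Chunk[] {
    // Called when WiFi returns: upload everything in recording order.
    const out = this.chunks;
    this.chunks = [];
    this.used = 0;
    return out;
  }
}
```

At 30 s per VAD-trimmed chunk (~60 KB), the buffer holds roughly 60 chunks before the oldest starts being overwritten.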

Integration demo: user wears device on lanyard for a full work day; offline periods buffer locally; device firmware updates over OTA without user intervention; talk-back works via paired phone with the device as input.


Phase 3 — V0.3 (Week 11–14)

Software

  • LiveKit-based group conversation mode (continuous WebRTC, server-side multi-party room)
  • Backend bridge: device → HTTPS chunked upload (1s Opus chunks) → bridge publishes as a LiveKit participant track. Device → backend speaks HTTP; backend → LiveKit speaks WebRTC. Web/mobile clients join the room directly via LiveKit SDKs.
  • Agent interject() capability — agent can speak into group conversation as a defined role (audio TTS sent into the LiveKit room as a participant)
  • Caregiver role with per-person consent flow
  • DoA metadata fused with diarization in pipeline → higher-confidence speaker labels in multi-person rooms
  • Internal MCP server stood up (used by agents/roles as retrieval layer)

Hardware

  • 50-unit pilot batch of refined enclosure (still 3D-printed, but iterated form factor based on Phase 2 feedback)
  • Industrial-design partner engaged for V1.0 enclosure
  • Continuous-streaming firmware mode (for group conversation): instead of 30s chunks, opens a sustained WebRTC/HTTPS stream
  • Power profile validated for sustained streaming (target ≥4 hr in this mode)
  • BLE notify channels for real-time mute/battery push
  • Pairing flow polished (sub-30s end-to-end)
  • 50 units shipped to design partners + early adopters
  • Telemetry dashboard live: per-device upload volume, battery health, crash rate, daily active devices

Integration demo: device placed on conference table during a 4-person meeting; group mode active; speakers identified by name from the second session onward; AI role (Caregiver / Reviewer) interjects appropriately when invoked.


Phase 4 — V1.0 (Week 15–22)

Software

  • Public MCP server — Pro users plug ARCIVE memory into Claude Desktop / ChatGPT / Cursor
  • Role marketplace UI (browse / install / publish)
  • Stripe Connect for marketplace payouts (70/30)
  • B2B admin dashboard (multi-staff, audit log, org-level billing)
  • First B2B pilot signed (likely caregiving)
  • SOC 2 Type 1 prep started

Hardware

  • Production-intent enclosure (SLA or CNC; injection-molded slated for v1.5)
  • 200-unit commercial batch manufactured
  • OTA with A/B partitions + signed firmware + automatic rollback on failed boot
  • Fleet management telemetry (firmware version distribution, error rate by version, retire-and-replace flow)
  • Hardware mute validation re-tested: cannot be defeated in software (security audit)
  • LED VAD-driven indicator validated (cannot be turned off while mic is hot)
  • Retail packaging design begins
  • App Store listing live with hardware as upsell (“Get the ARCIVE device for $129”)

Integration demo: end-to-end retail experience — user buys device, scans QR in app, pairs in 30s, captures and reviews memories via marketplace role, plugs into Claude Desktop via MCP for cross-tool agent access.


Phase 5 — V1.1+ (Week 23+)

Software

  • Memory-as-a-Service API tier (usage-billed)
  • ARCIVE for Caregiving (per-resident license, ~$50–100/mo)
  • ARCIVE for Education (per-student/school license)
  • ARCIVE for Therapy (per-therapist, ~$50–100/mo)
  • Smartwatch companion (Apple Watch / Wear OS) for intentional dictation when device isn’t worn
  • Multi-language support (start ES / FR / DE)

Hardware

  • Industrial-design refresh based on V1.0 field feedback
  • Optional e-ink or OLED screen SKU (the original V0.3 “screen” idea, deferred to here once we know what users want to see)
  • Replaceable / longer-runtime battery option
  • Injection-molded enclosure
  • FCC / CE certification (required for retail at scale)
  • Retail distribution partnerships (DTC, Amazon, possibly Best Buy)
  • Variants: clip / pendant / tabletop puck — based on V1.0 user preference data

6. Hardware ↔ Software Contract

This is the integration surface. Both teams must respect it. See full detail in plan files.

6.1 Audio Upload (HW → Backend)

```
POST https://<project>.supabase.co/functions/v1/ingest-audio
Headers:
  Authorization: Bearer <device_token>   # JWT signed at pairing time
  X-Device-Id: <uuid>
  Content-Type: audio/wav (or audio/opus)
Body:  raw audio chunk (≤ 30s, VAD-trimmed)
Query: ?recorded_at=<iso8601>&doa_json=<urlencoded>
Response: 202 { recording_id }
```
  • Chunks ≤ 30 seconds (Edge Function timeout safety)
  • Audio format: Opus mono @ 24 kbps (server billing assumes a 16 kbps floor; ~2 KB/s)
  • VAD on-device — silence is never uploaded
  • DoA metadata as compact JSON: [{t_ms: 0, az: 87}, {t_ms: 1200, az: 92}, ...]
  • Retry with exponential backoff if offline; local circular buffer ≥ 30 minutes
  • Raw audio is retained in private Storage (audio/{user_id}/{recording_id}.{ext}) and is replayable from the memory detail page via short-lived signed URL

6.2 Pairing & Provisioning (App ↔ HW)

  • App generates a pairing QR code containing: { pairing_url, pairing_token, supabase_url }
  • User scans QR in app’s pair-device flow
  • HW receives WiFi creds + Supabase device JWT over BLE GATT (one-time, write-only characteristic)
  • HW writes its mac_address back to app over BLE notify
  • App calls POST /devices to register, links to user account
  • BLE characteristic UUIDs live in shared/ble-uuids.ts — single source of truth, imported by both firmware and app
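The QR payload round-trips as plain JSON. A sketch using the three fields listed above; the validation logic is an illustrative assumption:

```typescript
// Pairing QR payload per §6.2: { pairing_url, pairing_token, supabase_url }.
interface PairingQr {
  pairing_url: string;
  pairing_token: string;
  supabase_url: string;
}

export function encodePairingQr(p: PairingQr): string {
  return JSON.stringify(p); // rendered as a QR code by the app
}

export function decodePairingQr(raw: string): PairingQr {
  const v = JSON.parse(raw);
  for (const k of ["pairing_url", "pairing_token", "supabase_url"] as const) {
    if (typeof v[k] !== "string" || v[k].length === 0) {
      throw new Error(`pairing QR missing ${k}`);
    }
  }
  return v as PairingQr;
}
```

After a successful scan, the app proceeds to the BLE write of WiFi creds + device JWT and then calls POST /devices, per the flow above.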

6.3 BLE GATT Schema (V0.3+)

| Service | Characteristic | Direction | Purpose |
|---|---|---|---|
| ARCIVE_PROV | wifi_creds | App → HW (write) | SSID + password JSON |
| ARCIVE_PROV | device_jwt | App → HW (write) | Supabase upload token |
| ARCIVE_PROV | mac | HW → App (read+notify) | Device MAC |
| ARCIVE_CTRL | mute | App ↔ HW (read+write+notify) | 0=record, 1=muted |
| ARCIVE_STATUS | battery | HW → App (read+notify) | 0–100 |
| ARCIVE_STATUS | recording_state | HW → App (notify) | idle/recording/uploading/error |
| ARCIVE_STATUS | firmware_version | HW → App (read) | Semver string |

UUIDs to be generated once and committed to shared/ble-uuids.ts.
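A possible shape for shared/ble-uuids.ts. The UUID values below are explicit placeholders (the generate-once step hasn't happened yet); the point is one frozen map that both firmware and app import:

```typescript
// shared/ble-uuids.ts (sketch). Values are PLACEHOLDERS: generate real
// 128-bit UUIDs once (e.g. with uuidgen) and commit them here.
export const BLE = {
  ARCIVE_PROV: {
    service:    "00000000-0000-0000-0000-00000000a001", // placeholder
    wifi_creds: "00000000-0000-0000-0000-00000000a002", // SSID + password JSON
    device_jwt: "00000000-0000-0000-0000-00000000a003", // Supabase upload token
    mac:        "00000000-0000-0000-0000-00000000a004", // device MAC, read+notify
  },
  ARCIVE_CTRL: {
    service: "00000000-0000-0000-0000-00000000b001",
    mute:    "00000000-0000-0000-0000-00000000b002",    // 0=record, 1=muted
  },
  ARCIVE_STATUS: {
    service:          "00000000-0000-0000-0000-00000000c001",
    battery:          "00000000-0000-0000-0000-00000000c002", // 0-100
    recording_state:  "00000000-0000-0000-0000-00000000c003", // idle/recording/uploading/error
    firmware_version: "00000000-0000-0000-0000-00000000c004", // semver string
  },
} as const;
```

Firmware can consume the same file via a build-time codegen step (e.g. emitting a C header), so the two stacks cannot drift apart.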

6.4 Realtime sync (Backend → App)

  • App subscribes to Supabase Realtime channel for recordings table filtered to current user
  • HW upload arrives → row inserted → app sees it appear in real time
  • Pipeline status updates (pending → processing → done) push the same way
  • App never polls

6.5 Privacy & consent invariants (HW + App)

  • HW LED must be visibly on (breathing) whenever the mic is unmuted — non-overridable from firmware
  • Mute button is hardware-level: cuts the I2S clock to the mic array, not software-bypassable
  • App displays a “currently recording” indicator if any paired device is unmuted
  • First-run consent screen must be acknowledged before any recording is ingested
  • Device token can be revoked from the app → HW receives revoke over BLE → wipes local creds
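The no-polling model in §6.4 maps onto supabase-js v2's postgres_changes subscription. A sketch with the client injected (table name and filter come from the bullets in §6.4; the channel name is an assumption):

```typescript
// Subscribe to the current user's recordings; INSERT = new upload arriving,
// UPDATE = pipeline status change (pending -> processing -> done). No polling.
export function watchRecordings(
  supabase: any, // the app's supabase-js v2 client, injected
  userId: string,
  onRow: (row: Record<string, unknown>) => void,
) {
  return supabase
    .channel("recordings-feed") // channel name is arbitrary
    .on(
      "postgres_changes",
      {
        event: "*",
        schema: "public",
        table: "recordings",
        filter: `user_id=eq.${userId}`,
      },
      (payload: any) => onRow(payload.new),
    )
    .subscribe();
}
```

Injecting the client keeps this testable and lets web and mobile share the one subscription helper from packages/shared.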

7. Cost Model

V0 (50 users, ~1 hr/day each, phone mic + 5 dev-kit HW units)

| Line | Estimate |
|---|---|
| Supabase | $0 (free tier) |
| Vercel | $0 (free tier) |
| Groq Whisper | ~$60/mo (50 × 30 hr × $0.04) |
| Voyage embeddings | ~$2 |
| Gemini Flash summaries | ~$5 |
| Sentry + PostHog | $0 (free tiers) |
| Recurring total | ~$70/mo |
| HW capex (one-time) | ~$270 (5× Seeed XVF3800 + XIAO dev-kits @ $54.50) |
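The Groq line is the only computed estimate; the model is users × hours × rate. A quick sketch that reproduces the table's own numbers:

```typescript
// Monthly STT cost: users × audio-hours per user per month × $/audio-hour.
export function sttMonthlyUsd(users: number, hrPerUserMo: number, usdPerHr: number): number {
  return users * hrPerUserMo * usdPerHr;
}

// One-time dev-kit capex: unit count × unit price.
export function devkitCapexUsd(units: number, unitUsd: number): number {
  return units * unitUsd;
}

// V0 rows above: 50 users × 30 hr × $0.04 ≈ $60/mo; 5 × $54.50 = $272.50 (~$270).
```

The same two functions reprice the later phases: bump users and the Pro-tier hour mix and the V0.1/V0.2 transcription lines fall out directly.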

V0.1 (Wk 4–6, 100 users + 10 HW dev-kits)

| Line | Estimate |
|---|---|
| Supabase | $0–25 |
| Vercel | $0 |
| AI services (Groq + Deepgram + Voyage + Gemini) | ~$150 |
| Pyannote on Modal | ~$30 |
| Recurring total | ~$200/mo |
| HW capex (one-time, marginal) | ~$270 (5 more dev-kits) |

V0.2 (500 users, mix of free + Pro, 15 HW prototype units)

| Line | Estimate |
|---|---|
| Supabase Pro | $25 |
| Vercel Pro | $20 |
| Transcription | ~$300 |
| Embeddings + summaries | ~$50 |
| Voice (Cartesia + Deepgram streaming) for Pro users | ~$200 |
| Pyannote (self-hosted on Modal) | ~$80 |
| Sentry/PostHog | $0–50 |
| Recurring total | ~$700/mo |
| HW capex (one-time) | ~$1,500 (5 more dev-kits + 3D-print materials + LiPo batteries + USB-C + enclosure iteration) |
| Revenue (assume 10% Pro at $12) | $600/mo |
| Net (recurring) | roughly break-even |

V1.0 (5,000 users + 200 HW units in field)

| Line | Estimate |
|---|---|
| Infra (Supabase, Vercel, LiveKit) | ~$500 |
| AI services | ~$3,000 |
| Hardware COGS amortized | ~$1,500/mo (assuming $30 unit cost, 200 units amortized over 4 mo) |
| Total | ~$5,000/mo |
| Revenue (assume 12% Pro, 3% Family, hardware margin) | ~$15,000/mo |
| Gross margin | ~67% |

8. Monetization Tiers

| Tier | Price | Target | Key features |
|---|---|---|---|
| Free | $0 | Trial / casual | 5 hr/mo, 1 default companion role, text-only |
| Pro | $12/mo | Power individual | 50 hr/mo, all built-in roles, voice talk-back (5 hr), graph view, MCP access, exports |
| Family | $25/mo | Households / caregiving | 5 members, shared spaces, caregiver role, group mode, multi-device |
| Marketplace | rev-share | Creators + buyers | Custom roles published by creators, 70/30 split |
| Caregiving B2B | $50–100/resident/mo | Assisted living | Compliance, audit, multi-staff access |
| Education B2B | per-student/school | Schools / tutors | Study companion, syllabus-aware retrieval |
| Therapy B2B | $50–100/therapist/mo | Practitioners | Session capture (with consent), patient role-play for practice |
| API / Memory-as-a-Service | usage-based | AI builders | MCP endpoint, vector + entity APIs |

9. Risk Register

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Always-on recording legal exposure | High | Existential | Consent screen V0, hardware LED non-overridable, device-side mute (gates I2S clock at hardware level), 2-party consent default |
| AI vendor price hikes | Medium | High | Multiple STT providers integrated (Groq + Deepgram), swappable agent layer (Pipecat + Claude Agent SDK + fallback OpenAI Realtime driver) |
| Hardware delays block roadmap | Medium | Medium | Both tracks decoupled at the integration contract (§6); HW running on Seeed reference board through Phase 3 means delays are firmware/enclosure scope, not platform-level. App always works without device. |
| Speaker re-ID quality poor | Medium | High | Self-host Pyannote on Modal, allow user manual labeling, build feedback loop into UX |
| Competitors (Limitless, Plaud, Friend) | High | Medium | Wedge = AI roles + group mode + platform/MCP + variants, not capture alone |
| Free-tier abuse | Medium | Medium | 5-hour cap, device-bound tokens, abuse rate-limits in Edge Functions |
| Privacy breach / data leak | Low | Existential | Encryption at rest, short retention defaults, per-user delete-all flow, SOC 2 Type 1 path planned for V1.0 |
| ESP32-S3 device-side WebRTC at edge of chip capability | Medium | Low | V0.3 ships HTTPS chunked upload + backend LiveKit bridge (proven path). esp-webrtc evaluated as Phase 2 research spike; if not viable, bridge remains permanent — group-mode quality unaffected, only data path changes. |
| Voice talk-back latency exceeds 1.5s budget | Medium | High | Phase 2 ships explicit benchmarking (01_SOFTWARE_PLAN.md §1.8). Mitigations: co-locate Deepgram + Cartesia regions with user; switch to Cartesia edge endpoints; downgrade LLM to 7B-on-Groq for faster first-token. |
| iOS App Store rejection of ambient mic capture | Medium | High | Explicit App Review note: user-initiated recording, persistent visual indicator (LED on HW, banner in app), consent screen on first launch; provide test account with HW in dev mode |
| Variant feature creep (cellular, screen, local) before V1.0 lands | Medium | Medium | V1.0 is clip + tabletop only; variants are architected for, not committed. Schema accommodates them without migration. |

10. Repository Layout

```
arcive/
├── docs/                 # ← you are here
│   ├── 00_MASTER_PLAN.md
│   ├── 01_SOFTWARE_PLAN.md
│   └── 02_HARDWARE_PLAN.md
├── apps/
│   ├── web/              # Next.js 15 PWA (V0 → onwards)
│   └── mobile/           # Expo / React Native (V0.2 → onwards)
├── packages/
│   ├── db/               # Supabase schema, migrations, generated types
│   ├── shared/           # Zod schemas, agent interface, BLE UUIDs
│   └── agents/           # Role definitions, system prompts, tools
├── backend/
│   ├── functions/        # Supabase Edge Functions
│   ├── workers/          # Queue workers (pgmq) for pipeline steps
│   └── mcp/              # MCP server (V0.3+)
├── firmware/             # ESP32-S3 firmware (V0.3+)
│   ├── src/
│   ├── platformio.ini
│   └── tests/
└── shared/
    └── ble-uuids.ts      # Single source of truth for HW + App
```

11. Decision Log

| Date | Decision | Reason |
|---|---|---|
| 2026-05-03 | Build software-first, defer custom HW to Phase 4 | Hardware kills startups; software validates demand cheaply |
| 2026-05-03 | Web PWA before native mobile | 3-week ship, no app review, faster iteration |
| 2026-05-03 | Groq Whisper over AssemblyAI for V0 | ~10x cheaper, no diarization needed yet |
| 2026-05-03 | HNSW over ivfflat in pgvector | Better recall, scales further, default in 0.5+ |
| 2026-05-03 | Supabase as full backend | Single vendor, single dashboard, generous free tier |
| 2026-05-03 | Schema includes people/roles/sessions from V0 | Avoids migration pain when companion features land |
| 2026-05-03 | MCP-first internal retrieval API | Doubles as B2B/platform offering later |
| 2026-05-03 | HW v0 = white-label, HW v1 = custom XVF3800 | De-risk hardware before custom PCB investment |
| 2026-05-03 | LiveKit for group mode media layer | Open source, free tier, OpenAI Realtime uses it |
| 2026-05-03 | Stripe + RevenueCat for billing | RevenueCat handles iOS/Android/web in one SDK |