ARCIVE — Hardware Plan

Companion to 00_MASTER_PLAN.md and 01_SOFTWARE_PLAN.md. Hardware is a co-equal track to software, developed in parallel from week 1. The device is core product identity, not a peripheral.

1. Strategic Stance

ARCIVE is a hardware & software platform (per the canonical product definition in 00_MASTER_PLAN §1). Hardware is not “a device” — it’s a family of form factors sharing one firmware platform, one cloud contract, and one user experience.
The first ARCIVE device is the product’s physical identity. Not a phone app trying to be calm — a dedicated, screenless, distraction-free capture surface that lives with the user.
Hardware ships at every phase boundary, integrated continuously with software via the contract in 00_MASTER_PLAN §6.
Form evolves: dev-kit (Phase 0) → 3D-printed proto (Phase 2) → pilot enclosure (Phase 3) → production-intent (Phase 4) → injection-molded retail (Phase 5) → variants (V1.5+).
No white-label detour. We commit to the XVF3800 + XIAO ESP32-S3 platform from day one because: (a) it’s the only path to a true mic-array differentiator, (b) the dev-kit is buyable today for $54.50, (c) iterating on enclosure is cheaper than re-platforming firmware later.
The app works without any device, but the device family is the way ARCIVE is meant to be experienced.

Variant lineup (long view)

The first device is a clip / pendant (V1.0, Phase 4), with the tabletop puck as either a second V1.0 SKU or a V1.5 follow-up. Future variants reuse the same firmware and cloud platform; only enclosure, sensor array, connectivity, and (sometimes) MCU change. Full lineup is in 00_MASTER_PLAN §2.5; summary:

Variant	When	Delta from V1.0
Clip / Pendant	V1.0 (Phase 4)	Baseline
Tabletop puck	V1.0 SKU or V1.5	Larger battery, plugged-in option, far-field tuning
Pendant refresh	V1.5	Industrial-design only
Watch companion (Apple/Wear OS)	V1.1+	Software only, no new HW
Card format	V2.x	Slim form, single MEMS, BLE-tethered to phone
Screen-equipped	V2.x	Adds e-ink/OLED + firmware UI layer
Cellular	V2.x	LTE-M / NB-IoT module, eSIM, no-WiFi flow
Local-only	V2.x	Larger MCU/co-processor, on-device STT + embeddings

Each variant must respect the same HW↔SW contract (§6) — device.kind enum, capability flags, optional cellular/local metadata. We don’t build them now, but the platform is architected to allow them without rework.
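
One way the device.kind enum and capability flags could be modeled on the firmware side is sketched below. The enumerator names, bit assignments, and the `DeviceDescriptor` struct are illustrative assumptions; the authoritative definitions belong to the §6 contract.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical device kinds, loosely one per hardware row of the variant table.
enum class DeviceKind : std::uint8_t {
    ClipPendant, TabletopPuck, Card, ScreenEquipped, Cellular, LocalOnly
};

// Hypothetical capability bits; each variant advertises what it supports.
enum Capability : std::uint32_t {
    kCapMicArray = 1u << 0,
    kCapWifi     = 1u << 1,
    kCapBle      = 1u << 2,
    kCapDisplay  = 1u << 3,
    kCapCellular = 1u << 4,
    kCapLocalStt = 1u << 5,
};

struct DeviceDescriptor {
    DeviceKind kind;
    std::uint32_t capabilities;  // bitwise OR of Capability values
};

constexpr bool hasCapability(const DeviceDescriptor& d, Capability c) {
    return (d.capabilities & c) != 0;
}

// V1.0 baseline: mic array + WiFi + BLE, nothing else.
constexpr DeviceDescriptor kV1Baseline{
    DeviceKind::ClipPendant, kCapMicArray | kCapWifi | kCapBle};
```

With this shape, a later variant only adds bits (for example `kCapCellular`), and backend code can branch on capabilities rather than on the device kind.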

2. Why XVF3800 + XIAO ESP32-S3 from Day One

Reason	Detail
Onboard DSP	AEC, beamforming, DoA, VAD, noise suppression — all on the XVF3800. ESP32-S3 doesn’t have to do signal processing.
Far-field 5m	Phone mic can’t compete in multi-person rooms. This is the wedge.
Buyable today	Seeed sells the dev-kit for $54.50. No sourcing wait.
Reference firmware exists	I2S audio, HTTP streaming, BLE provisioning, VAD, DoA — all sample tutorials at the Seeed wiki.
Single platform across all versions	Phase 0 dev-kit and Phase 5 retail device share the same MCU and DSP. Firmware investment compounds.
Industrial-design freedom	The hardware spec is fixed early; only enclosure changes between phases.

3. Hardware Architecture (single design, all phases)

┌──────────────────────────────────────────────────────┐
│                   ARCIVE DEVICE                      │
│                                                      │
│   4× MEMS mics ─► XVF3800 (DSP)                     │
│                     │                                │
│                     ├─► I2S audio  ─► XIAO ESP32-S3  │
│                     ├─► I2C control ──┘              │
│                     ├─► I2C: DoA azimuth queries     │
│                     └─► I2C: VAD state               │
│                                                      │
│   Mute button ─► HARDWARE GATE on I2S clock          │
│                  (mic data physically cannot reach   │
│                   ESP32-S3 when muted)               │
│                                                      │
│   LED ─◄ ESP32-S3 PWM (driven from XVF3800 VAD +    │
│          firmware mute state, never bypassable)      │
│                                                      │
│   USB-C ─► charging + flashing                      │
│   LiPo 800–1200 mAh                                 │
│                                                      │
│   ESP32-S3 ─► WiFi (audio upload to Supabase)       │
│            └► BLE (pairing + control + status)      │
└──────────────────────────────────────────────────────┘

This same architecture ships at every phase. What changes is enclosure, polish, manufacturing scale.

4. Bill of Materials (target)

Element	Spec	Source
Mic array	ReSpeaker XVF3800	Seeed, ~$40 ref board
MCU	XIAO ESP32-S3	Seeed, included in dev-kit
Battery	LiPo 800–1200 mAh	Standard, ~$3
Charge IC + protection	TP4056 or equivalent	~$1
Indicator	1× soft LED (PWM-driven)	<$0.5
Mute button	Tactile switch wired to I2S clock gate	<$0.5
USB-C connector	Standard	~$1
Enclosure	Phase 0–3: 3D-printed; Phase 4: SLA/CNC; Phase 5: injection-molded	Variable
Audio output	None on device	—

Target unit cost at MOQ 500 (Phase 4): ~$45 BOM. Retail ~$129–149.

5. Phase Deliverables — Hardware Track

Phase 0 — V0 (Wk 1–3): Dev-kit bring-up

5× Seeed XVF3800 + XIAO ESP32-S3 dev-kits assembled (breadboard or reference PCB)
XVF3800 flashed with I2S firmware (not USB) per Seeed wiki
ESP32-S3 firmware: I2S capture → 30s WAV chunk → WiFi → POST /ingest-audio
DoA azimuth read via I2C, attached as JSON metadata
LED breathing animation during capture
Hardcoded WiFi creds + dev JWT (no BLE provisioning yet)
Forks Seeed’s HTTP audio streaming sample for fastest path
Internal use by team only
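
For the Phase 0 WAV path, each 30s chunk needs a RIFF header in front of the PCM samples before it is POSTed. A minimal sketch, assuming 16 kHz, 16-bit, mono PCM (the real sample rate and channel count depend on how the XVF3800 I2S firmware is configured). `makeWavHeader` and `pcmBytes` are illustrative names, not taken from any Seeed sample.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Write `bytes` low-order bytes of `value` at `dst`, little-endian.
static void putLe(std::uint8_t* dst, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        dst[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Build the canonical 44-byte RIFF/WAVE header for uncompressed PCM.
std::array<std::uint8_t, 44> makeWavHeader(std::uint32_t sampleRate,
                                           std::uint16_t channels,
                                           std::uint16_t bitsPerSample,
                                           std::uint32_t dataBytes) {
    assert(channels > 0 && bitsPerSample % 8 == 0);
    std::array<std::uint8_t, 44> h{};
    const auto blockAlign =
        static_cast<std::uint16_t>(channels * (bitsPerSample / 8));
    const std::uint32_t byteRate = sampleRate * blockAlign;
    std::memcpy(&h[0], "RIFF", 4);
    putLe(&h[4], 36 + dataBytes, 4);  // RIFF chunk size = file size - 8
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    putLe(&h[16], 16, 4);             // fmt chunk size for plain PCM
    putLe(&h[20], 1, 2);              // audio format 1 = PCM
    putLe(&h[22], channels, 2);
    putLe(&h[24], sampleRate, 4);
    putLe(&h[28], byteRate, 4);
    putLe(&h[32], blockAlign, 2);
    putLe(&h[34], bitsPerSample, 2);
    std::memcpy(&h[36], "data", 4);
    putLe(&h[40], dataBytes, 4);
    return h;
}

// PCM payload size of one chunk in bytes.
constexpr std::uint32_t pcmBytes(std::uint32_t sampleRate, std::uint16_t channels,
                                 std::uint16_t bitsPerSample, std::uint32_t seconds) {
    return sampleRate * channels * (bitsPerSample / 8u) * seconds;
}
```

Under these assumptions one 30s chunk carries `pcmBytes(16000, 1, 16, 30)` = 960000 bytes of audio, which is one reason the plan moves to Opus in Phase 1.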

Demo: dev-kit on a desk uploads recordings that appear in the same web app feed as phone-mic recordings.

Phase 1 — V0.1 (Wk 4–6): Full firmware feature set

Same 5–10 dev-kits, no enclosure changes
VAD gating: silence never uploaded (XVF3800 VAD signal via I2C)
BLE GATT server implemented: provisioning + control + status (per §6.3)
Pairing flow end-to-end: app QR → BLE write of WiFi creds + JWT → device joins WiFi
LED state machine: idle / recording / muted / uploading / error
OTA scaffolding: esp_https_ota, daily manifest poll
BLE characteristics for firmware_version, battery, recording_state
10 dev-kits in internal use
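
The VAD gating item above could be implemented as a small per-frame gate. The sketch below assumes the firmware already has a boolean VAD flag for each audio frame (how that flag is read from the XVF3800 over I2C is not shown), and it adds a hangover window, which is an assumption of this example rather than something the plan specifies, so that trailing syllables are not clipped.

```cpp
#include <cassert>
#include <cstdint>

// Decides, frame by frame, whether captured audio is kept for upload.
// `hangoverFrames` is a hypothetical tuning knob: the number of silent frames
// still kept after speech ends.
class VadGate {
public:
    explicit VadGate(std::uint32_t hangoverFrames) : hangover_(hangoverFrames) {}

    // `speech` is the VAD flag for the current frame.
    bool keepFrame(bool speech) {
        if (speech) {
            remaining_ = hangover_;  // re-arm the hangover on every speech frame
            return true;
        }
        if (remaining_ > 0) {
            --remaining_;            // silence, but still inside the hangover
            return true;
        }
        return false;                // sustained silence: drop the frame
    }

private:
    std::uint32_t hangover_;
    std::uint32_t remaining_ = 0;
};
```

Frames for which `keepFrame` returns false never enter a chunk, so silence is dropped before the upload path rather than filtered on the server.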

Phase 2 — V0.2 (Wk 7–10): Enclosure prototype #1

3D-printed enclosure housing XVF3800 + XIAO + LiPo + USB-C charge IC + LED + mute button
Form factor decision: clip-on vs. lanyard vs. tabletop puck. Build 2–3 variants, then pick one based on user research
10 internal units + 5 design-partner units
Hardware mute button physically gates I2S clock (security-critical; not bypassable in software)
AEC validated for talk-back (XVF3800 onboard AEC; verified that talk-back via paired phone speaker doesn’t pollute uploaded audio)
8-hour battery test passes under realistic VAD duty cycle
Local 30-min circular buffer for offline resilience
OTA dev channel live; firmware version reported via BLE
Memfault (or equivalent) crash reporting integrated
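
The local 30-minute buffer listed above behaves like a bounded FIFO of sealed chunks: new audio is always accepted, and the oldest audio is dropped once the byte budget is exhausted. A host-testable sketch follows; `ChunkBuffer` and its byte budget are illustrative, and a real device would back this with PSRAM or flash instead of heap-allocated vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using Chunk = std::vector<std::uint8_t>;

// Bounded FIFO of sealed audio chunks. When the byte budget is exceeded, the
// oldest chunks are evicted first, so the buffer holds the most recent audio.
class ChunkBuffer {
public:
    explicit ChunkBuffer(std::size_t maxBytes) : maxBytes_(maxBytes) {}

    // Returns false if the chunk alone could never fit in the budget.
    bool push(Chunk chunk) {
        if (chunk.size() > maxBytes_) return false;
        usedBytes_ += chunk.size();
        chunks_.push_back(std::move(chunk));
        while (usedBytes_ > maxBytes_) {  // evict oldest until within budget
            usedBytes_ -= chunks_.front().size();
            chunks_.pop_front();
        }
        return true;
    }

    // Oldest pending chunk, or nullptr when empty.
    const Chunk* front() const {
        return chunks_.empty() ? nullptr : &chunks_.front();
    }

    // Call only after the upload of front() has been confirmed.
    void pop() {
        assert(!chunks_.empty());
        usedBytes_ -= chunks_.front().size();
        chunks_.pop_front();
    }

    std::size_t size() const { return chunks_.size(); }
    std::size_t usedBytes() const { return usedBytes_; }

private:
    std::size_t maxBytes_;
    std::size_t usedBytes_ = 0;
    std::deque<Chunk> chunks_;
};
```

On sizing: 30 minutes of 16 kHz, 16-bit mono PCM would be 1800 s × 32000 B/s = 57.6 MB, so the buffer presumably stores Opus. At an assumed 24 kbit/s (3000 B/s), 30 minutes is about 5.4 MB.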

Phase 3 — V0.3 (Wk 11–14): 50-unit pilot

Refined 3D-printed enclosure based on Phase 2 user feedback
Industrial-design partner engaged for V1.0 enclosure
Continuous-streaming firmware mode (group conversation): instead of 30s chunks, the device opens a sustained HTTPS upload (1-second Opus chunks over a persistent HTTP/2 connection) to a backend bridge that re-fans them into the LiveKit room. The ESP32-S3 only speaks HTTP; the backend handles WebRTC. (Device-side WebRTC is evaluated as a Phase 2 research spike, see §11 risks, but it sits at the edge of ESP32-S3 capability and is not committed for V0.3.)
Power profile validated for sustained streaming (target ≥4 hr in this mode)
BLE notify channels for real-time mute/battery push
Pairing flow polished (sub-30s end-to-end)
50 units shipped to design partners + early adopters
Telemetry dashboard live: per-device upload volume, battery health, crash rate, DAUs
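
The runtime targets (8 hours under a realistic VAD duty cycle in Phase 2, at least 4 hours of sustained streaming in Phase 3) can be read as ceilings on average current draw for the 800–1200 mAh battery range. A rough worked estimate follows; it ignores regulator losses and battery derating, and the split into active and idle current is an illustrative model, not a measured profile.

```latex
\[
\bar{I} \;\le\; \frac{C}{T},
\qquad
\bar{I} \;=\; d\,I_{\text{active}} + (1-d)\,I_{\text{idle}}
\]
\[
T = 8\,\mathrm{h}:\quad
\frac{800\,\mathrm{mAh}}{8\,\mathrm{h}} = 100\,\mathrm{mA}
\quad\text{to}\quad
\frac{1200\,\mathrm{mAh}}{8\,\mathrm{h}} = 150\,\mathrm{mA}
\]
\[
T = 4\,\mathrm{h}:\quad
\frac{800\,\mathrm{mAh}}{4\,\mathrm{h}} = 200\,\mathrm{mA}
\quad\text{to}\quad
\frac{1200\,\mathrm{mAh}}{4\,\mathrm{h}} = 300\,\mathrm{mA}
\]
```

Here C is battery capacity, T the target runtime, and d the fraction of time the device is actively capturing and uploading.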

Phase 4 — V1.0 (Wk 15–22): Production-intent + commercial batch

Production-intent enclosure (SLA or CNC; injection molding deferred to Phase 5)
200-unit commercial batch manufactured
OTA with A/B partitions + signed firmware + automatic rollback on failed boot
Fleet management telemetry: firmware version distribution, error rate by version, retire-and-replace flow
Security audit on hardware mute — re-validated as software-undefeatable
LED VAD-driven indicator audit — cannot be turned off while mic is hot
Retail packaging design begins
App Store listing live: device as $129 upsell

Phase 5 — V1.1+ (Wk 23+): Retail-grade

Industrial-design refresh based on V1.0 field feedback
Optional SKU with e-ink or OLED screen — the original “screen on device” idea, now informed by real user data on what they want to see
Replaceable or longer-runtime battery option
Injection-molded enclosure
FCC / CE certification (required for retail at scale)
Variants: clip / pendant / tabletop puck — based on V1.0 user preference data
Distribution partnerships (DTC, Amazon, possibly Best Buy)

6. Hardware ↔ Software Contract

Mirrors 00_MASTER_PLAN.md §6. The hardware-side responsibilities:

6.1 Audio Upload

Endpoint: POST https://<project>.supabase.co/functions/v1/ingest-audio
Auth: Authorization: Bearer <device_jwt> + X-Device-Id: <uuid>
Body: VAD-trimmed audio chunk, ≤ 30s, WAV (Phase 0) → Opus (Phase 1+)
Query: recorded_at (ISO 8601), doa_json (urlencoded compact JSON)
Retry: exponential backoff; unsent chunks are held in the local buffer (up to ~30 min of audio)
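
The query parameters and retry rule above can be sketched as two small helpers. The DoA JSON key used in the usage note (`az`), the base delay, and the cap are assumptions made for illustration; the contract only fixes the parameter names `recorded_at` and `doa_json`. Jitter is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Percent-encode everything except RFC 3986 unreserved characters.
std::string urlEncode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '%' + two hex digits + NUL
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Query string appended to POST /functions/v1/ingest-audio.
std::string ingestQuery(const std::string& recordedAtIso8601,
                        const std::string& doaJson) {
    return "?recorded_at=" + urlEncode(recordedAtIso8601) +
           "&doa_json=" + urlEncode(doaJson);
}

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped.
std::uint32_t backoffDelayMs(std::uint32_t attempt, std::uint32_t baseMs,
                             std::uint32_t capMs) {
    const std::uint64_t delay = static_cast<std::uint64_t>(baseMs)
                                << std::min<std::uint32_t>(attempt, 31u);
    return static_cast<std::uint32_t>(std::min<std::uint64_t>(delay, capMs));
}
```

For example, `ingestQuery("2025-01-01T00:00:00Z", "{\"az\":90}")` yields `?recorded_at=2025-01-01T00%3A00%3A00Z&doa_json=%7B%22az%22%3A90%7D`, and with a 1 s base and 60 s cap the retry delays run 1 s, 2 s, 4 s, 8 s and so on up to the cap.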

6.2 Group-mode streaming (Phase 3+)

Device opens a sustained HTTPS upload (1-second Opus chunks over a persistent HTTP/2 connection) to /functions/v1/group-stream, authenticated with a session token issued by /functions/v1/start-group-session
Backend bridge receives chunks and publishes them as a participant track into the LiveKit room — ESP32-S3 itself does not speak WebRTC
AI role interjections (TTS) are played via the paired phone, not on the device (device has no speaker)
Phase 2 research spike: evaluate esp-webrtc for direct device → LiveKit. If viable, swap transport in Phase 4 without API changes.

6.3 BLE GATT Services

Service	Characteristic	Direction	Purpose
`ARCIVE_PROV`	`wifi_creds`	App → HW (write)	SSID + password JSON
`ARCIVE_PROV`	`device_jwt`	App → HW (write)	Supabase upload token
`ARCIVE_PROV`	`mac`	HW → App (read+notify)	Device MAC
`ARCIVE_CTRL`	`mute`	HW → App (read+notify; written only by firmware)	Hardware-driven mute state
`ARCIVE_STATUS`	`battery`	HW → App (read+notify)	0–100
`ARCIVE_STATUS`	`recording_state`	HW → App (notify)	idle/recording/uploading/error
`ARCIVE_STATUS`	`firmware_version`	HW → App (read)	Semver string

UUIDs generated once, stored in shared/ble-uuids.h (firmware) and packages/shared/ble-uuids.ts (software). CI script enforces sync.
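
One way to keep the table above enforceable in firmware is to express it as data and check access rules against it. The sketch below is an assumption about structure, not the actual gatt_server.cpp: UUIDs are omitted, and `CharacteristicSpec` and `appCanWrite` are hypothetical names. Property bits are from the app's point of view.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// What the app may do with a characteristic.
enum GattProp : std::uint8_t { kRead = 1, kWrite = 2, kNotify = 4 };

struct CharacteristicSpec {
    const char* service;
    const char* name;
    std::uint8_t props;  // bitwise OR of GattProp values
};

// The §6.3 table as data (UUIDs omitted; they live in shared/ble-uuids.h).
constexpr CharacteristicSpec kGattTable[] = {
    {"ARCIVE_PROV",   "wifi_creds",       kWrite},
    {"ARCIVE_PROV",   "device_jwt",       kWrite},
    {"ARCIVE_PROV",   "mac",              kRead | kNotify},
    {"ARCIVE_CTRL",   "mute",             kRead | kNotify},
    {"ARCIVE_STATUS", "battery",          kRead | kNotify},
    {"ARCIVE_STATUS", "recording_state",  kNotify},
    {"ARCIVE_STATUS", "firmware_version", kRead},
};

// True if the app is allowed to write the named characteristic.
bool appCanWrite(const char* name) {
    for (const auto& c : kGattTable) {
        if (std::strcmp(c.name, name) == 0) return (c.props & kWrite) != 0;
    }
    return false;  // unknown characteristics are never writable
}
```

A unit test asserting `!appCanWrite("mute")` would then guard the rule that the app can observe the mute state but never change it.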

6.4 Pairing flow

App generates QR with { pairing_url, pairing_token, supabase_url }
User scans → app initiates BLE connection to advertised ARCIVE_PROV service
App writes WiFi creds + device JWT
HW exposes its MAC on the mac characteristic; app reads it and calls POST /devices to register the device
HW reboots, joins WiFi, starts heartbeat (uploads 0-byte status ping)
App shows “Device paired ✓”

6.5 OTA firmware update

Firmware binaries uploaded to Supabase Storage at firmware/<channel>/<version>.bin
HW polls JSON manifest at firmware/<channel>/latest.json daily
Newer version available → download via esp_https_ota, verify signature, reboot
Channels: dev (Phase 1+), beta (Phase 3+), stable (Phase 4+)
A/B partitions + automatic rollback on boot failure (Phase 4+)
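
The "newer version available" decision above needs a numeric comparison of version strings, because plain string comparison would rank 0.10.0 below 0.9.0. A minimal sketch, assuming the manifest carries a bare MAJOR.MINOR.PATCH string; the manifest schema itself is not specified here, and pre-release or build suffixes are rejected rather than parsed.

```cpp
#include <cassert>
#include <cstdio>
#include <tuple>

struct SemVer {
    int majorNum = 0;
    int minorNum = 0;
    int patchNum = 0;
};

// Parses "MAJOR.MINOR.PATCH"; returns false on anything else.
bool parseSemVer(const char* text, SemVer& out) {
    int consumed = 0;
    const int matched = std::sscanf(text, "%d.%d.%d%n", &out.majorNum,
                                    &out.minorNum, &out.patchNum, &consumed);
    if (matched != 3) return false;
    return text[consumed] == '\0' && out.majorNum >= 0 && out.minorNum >= 0 &&
           out.patchNum >= 0;
}

// Strictly-greater comparison, field by field.
bool isNewer(const SemVer& candidate, const SemVer& current) {
    return std::tie(candidate.majorNum, candidate.minorNum, candidate.patchNum) >
           std::tie(current.majorNum, current.minorNum, current.patchNum);
}

// Update only when both versions parse and the manifest is strictly newer.
bool shouldUpdate(const char* manifestVersion, const char* runningVersion) {
    SemVer manifest, running;
    return parseSemVer(manifestVersion, manifest) &&
           parseSemVer(runningVersion, running) && isNewer(manifest, running);
}
```

Treating an unparseable manifest as "no update" keeps a malformed latest.json from triggering a download; signature verification remains a separate step after download.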

7. Privacy & Consent — Hardware Requirements

Non-negotiable, all phases:

Requirement	Implementation
LED visibly on when mic is hot	PWM-driven from firmware AND tied to XVF3800 VAD signal; cannot be turned off in software while mic is unmuted
Hardware mute cuts mic at hardware level	Mute button physically gates the I2S clock to the mic array; not just a software flag
Mute state cannot be overridden remotely	App can only read mute state, never set it to unmute
Optional audible chime on session start	Firmware setting; off by default but available for two-party-consent jurisdictions
On-device data encrypted at rest	ESP32-S3 NVS encryption enabled
Device JWT can be revoked	App sends revoke over BLE → firmware wipes WiFi creds + JWT, factory-resets

These rules make ARCIVE defensibly respectful of bystanders in a way Limitless / Friend / Plaud have struggled with — and they’re a real selling point for Caregiving / Therapy B2B.
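
The LED rule in the table can be captured as a pure function that the LED driver always goes through, so no code path can darken the LED while the mic is hot. The pattern names and the `requestedOff` input are hypothetical; the sketch only illustrates the invariant "unmuted implies LED visible".

```cpp
#include <cassert>

enum class LedPattern { Off, MutedIndicator, IdleGlow, RecordingBreath };

struct MicState {
    bool hardwareMuted;  // state of the physical mute gate
    bool vadActive;      // XVF3800 voice-activity flag
};

// `requestedOff` models any software request to darken the LED (for example a
// user preference). It is honored only while the mic is hardware-muted.
constexpr LedPattern resolveLed(MicState mic, bool requestedOff) {
    if (!mic.hardwareMuted) {
        // Mic is hot: the LED is always visible, whatever software asked for.
        return mic.vadActive ? LedPattern::RecordingBreath : LedPattern::IdleGlow;
    }
    return requestedOff ? LedPattern::Off : LedPattern::MutedIndicator;
}

// The invariant an audit would check: a hot mic never maps to a dark LED.
constexpr bool invariantHolds(MicState mic, bool requestedOff) {
    return mic.hardwareMuted || resolveLed(mic, requestedOff) != LedPattern::Off;
}
```

Because the function is `constexpr` and has only eight input combinations, the invariant can be checked exhaustively at compile time or in a host unit test.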

8. Firmware Repository Layout

firmware/
├── platformio.ini
├── shared/
│   └── ble-uuids.h               # ← mirror of packages/shared/ble-uuids.ts
├── src/
│   ├── main.cpp                  # Entry, state machine, OTA check
│   ├── config.h                  # Endpoints, buffer sizes, timeouts
│   ├── audio/
│   │   ├── i2s_capture.cpp       # XVF3800 → ESP32-S3 I2S
│   │   ├── opus_encoder.cpp      # 30s Opus chunks (Phase 1+)
│   │   └── chunker.cpp           # Chunk sealing, metadata stamping
│   ├── stream/                   # Phase 3+
│   │   └── group_stream.cpp      # Continuous HTTPS upload (1s Opus chunks over a persistent HTTP/2 connection) to backend bridge; device does NOT speak WebRTC
│   ├── upload/
│   │   ├── wifi_manager.cpp
│   │   ├── https_client.cpp
│   │   └── circular_buffer.cpp   # Local store-and-forward
│   ├── ble/
│   │   ├── gatt_server.cpp       # ARCIVE_PROV / ARCIVE_CTRL / ARCIVE_STATUS
│   │   ├── provisioning.cpp
│   │   └── status_notifier.cpp
│   ├── led/
│   │   ├── breathing.cpp
│   │   └── states.cpp
│   ├── doa/
│   │   └── azimuth_reader.cpp
│   ├── vad/
│   │   └── vad_gate.cpp
│   ├── ota/
│   │   └── update_check.cpp
│   └── telemetry/
│       └── crash_reporter.cpp
└── tests/
    ├── audio/
    ├── ble/
    └── upload/

PlatformIO baseline

[env:xiao_esp32s3]
platform = espressif32
board = seeed_xiao_esp32s3
framework = arduino
lib_deps =
    ESP Async WebServer
    ArduinoJson
    NimBLE-Arduino
    https://github.com/respeaker/xvf3800-arduino-driver  ; or vendor SDK
build_flags =
    -DARCIVE_FW_VERSION=\"0.1.0\"
    -DBLE_UUID_HEADER=\"shared/ble-uuids.h\"

9. Hardware Test Matrix

Test	P0 dev-kit	P1 dev-kit	P2 proto	P3 pilot	P4 commercial
Boots cleanly from cold	✅	✅	✅	✅	✅
Pairs via BLE in <30s	—	✅	✅	✅	✅
Joins WiFi after reboot	✅	✅	✅	✅	✅
Survives WiFi outage 30 min	basic	✅	✅	✅	✅
Mute button cuts I2S at hardware level	—	wired	✅	✅	✅
LED matches actual mic state	✅	✅	✅	✅	✅
8-hour battery test	n/a	n/a	✅	✅	✅
Upload retries succeed after disconnect	basic	✅	✅	✅	✅
OTA update completes successfully	—	dev channel	✅	✅	✅ A/B rollback
Crash report reaches Memfault	—	—	✅	✅	✅
Factory reset wipes all creds	—	✅	✅	✅	✅
Multi-speaker capture quality	✅ raw	✅	✅	✅	✅
DoA azimuth attached as metadata	✅ raw	✅	✅	✅	✅
DoA accuracy ±15° (validated against ground truth)	—	—	✅	✅	✅
AEC functional during talk-back	—	—	✅	✅	✅
Group-mode sustained stream ≥4 hr	—	—	—	✅	✅
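
For the DoA accuracy row, the ±15° check has to account for wraparound at 360°: a reading of 350° against a ground truth of 5° is a 15° error, not 345°. A small sketch of that comparison follows; the function names are illustrative, and the default tolerance is the value from the table.

```cpp
#include <cassert>
#include <cmath>

// Smallest absolute difference between two azimuths, in degrees, in [0, 180].
double angularErrorDeg(double measuredDeg, double truthDeg) {
    double diff = std::fmod(measuredDeg - truthDeg, 360.0);  // in (-360, 360)
    if (diff < 0) diff += 360.0;                             // now in [0, 360)
    return diff > 180.0 ? 360.0 - diff : diff;
}

// Pass/fail for a single measurement against ground truth.
bool withinDoaTolerance(double measuredDeg, double truthDeg,
                        double toleranceDeg = 15.0) {
    return angularErrorDeg(measuredDeg, truthDeg) <= toleranceDeg;
}
```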

10. What Hardware Does NOT Do (Permanently)

No on-device transcription — too heavy for ESP32-S3, kills battery, redundant with cloud (the V2.x Local-only variant in §1 would need a larger MCU or co-processor)
No custom wake-word — adds complexity; doesn’t match always-on capture model
No BLE audio streaming — bandwidth ceiling; WiFi only for audio
No on-device speaker identification — voice embeddings happen server-side
No on-device speaker — talk-back uses paired phone for playback (until product feedback says otherwise)
No camera — out of scope; privacy-incompatible

11. Risks & Mitigations

Risk	Mitigation
Industrial design takes longer than expected	3D-printed enclosures used through Phase 3; production-intent only at Phase 4 — buys 14 weeks of design lead time
Custom PCB defects	Phase 0–3 ride on Seeed reference board; only Phase 4 introduces our own PCB (and minimally — same components, different layout)
Battery life shorter than 8 hr in field	Tested with realistic VAD duty cycle from Phase 2 onward
FCC/CE certification delays	Skipped for Phase 0–4 (research/dev units); required only for Phase 5 retail
Firmware bricking devices	OTA includes A/B partitions, rollback on failed boot (Phase 4+)
Mute button defeated in software	Hardware gate on I2S clock makes this physically impossible
Group-mode streaming drains battery too fast	Power profile measured Phase 2; decision in Phase 3 whether to ship group mode as “tabletop, plugged in” only
User doesn’t see why they need the device when phone works	Continuously demo the quality difference (multi-speaker, far-field) and the behavioral difference (no screen, no doom-scroll) at every user touchpoint
Device-side WebRTC at edge of ESP32-S3 capability	V0.3 ships HTTPS chunked upload + backend LiveKit bridge (proven path). `esp-webrtc` evaluated as a Phase 2 research spike; if viable, swap transport in Phase 4 without changing API contract. If not viable, backend bridge remains permanent — group-mode quality is unaffected, only data path changes.

12. Phase-Boundary Joint Demos

Each phase ends with a demo that proves both tracks integrate. These are non-negotiable ship gates.

Phase	Joint demo
0	Dev-kit on desk uploads recording → web app feed shows it next to phone-mic recordings
1	Pair dev-kit via QR scan → device joins WiFi → captures multi-speaker meeting → diarization labels appear → same speaker recognized across two sessions
2	User wears 3D-printed proto for full work day → offline periods buffered locally → OTA update lands without user action → talk-back via paired phone using device as input
3	Pilot device on conference table during 4-person meeting → group mode active → speakers identified by name from session 2 → AI role interjects when invoked
4	Retail experience: buy device → scan QR → paired in 30s → capture & review via marketplace role → plug into Claude Desktop via MCP for cross-tool agent access
5	Pick up retail-packaged device from box → certified, injection-molded, optional screen variant → identical software experience as V1.0
