Companion to 00_MASTER_PLAN.md and 01_SOFTWARE_PLAN.md. Hardware is a co-equal track to software, developed in parallel from week 1. The device is core product identity, not a peripheral.
1. Strategic Stance
- ARCIVE is a hardware & software platform (per the canonical product definition in 00_MASTER_PLAN §1). Hardware is not “a device” — it’s a family of form factors sharing one firmware platform, one cloud contract, and one user experience.
- The first ARCIVE device is the product’s physical identity. Not a phone app trying to be calm — a dedicated, screenless, distraction-free capture surface that lives with the user.
- Hardware ships at every phase boundary, integrated continuously with software via the contract in 00_MASTER_PLAN §6.
- Form evolves: dev-kit (Phase 0) → 3D-printed proto (Phase 2) → pilot enclosure (Phase 3) → production-intent (Phase 4) → injection-molded retail (Phase 5) → variants (V1.5+).
- No white-label detour. We commit to the XVF3800 + XIAO ESP32-S3 platform from day one because: (a) it’s the only path to a true mic-array differentiator, (b) the dev-kit is buyable today for $54.50, (c) iterating on enclosure is cheaper than re-platforming firmware later.
- The app works without any device, but the device family is the way ARCIVE is meant to be experienced.
Variant lineup (long view)
The first device is clip / pendant + tabletop puck (V1.0, Phase 4). Future variants reuse the same firmware and cloud platform — only enclosure, sensor array, connectivity, and (sometimes) MCU change. Full lineup is in 00_MASTER_PLAN §2.5; summary:
| Variant | When | Delta from V1.0 |
|---|---|---|
| Clip / Pendant | V1.0 (Phase 4) | Baseline |
| Tabletop puck | V1.0 SKU or V1.5 | Larger battery, plugged-in option, far-field tuning |
| Pendant refresh | V1.5 | Industrial-design only |
| Watch companion (Apple/Wear OS) | V1.1+ | Software only, no new HW |
| Card format | V2.x | Slim form, single MEMS, BLE-tethered to phone |
| Screen-equipped | V2.x | Adds e-ink/OLED + firmware UI layer |
| Cellular | V2.x | LTE-M / NB-IoT module, eSIM, no-WiFi flow |
| Local-only | V2.x | Larger MCU/co-processor, on-device STT + embeddings |
Each variant must respect the same HW↔SW contract (§6) — device.kind enum, capability flags, optional cellular/local metadata. We don’t build them now, but the platform is architected to allow them without rework.
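A minimal sketch of how the `device.kind` enum and capability flags mentioned above might look on the firmware side. Every name here (`DeviceKind`, the `kCap*` bits, `descriptor_for`) and the per-variant capability mapping are illustrative assumptions, not the contract defined in 00_MASTER_PLAN §6.

```cpp
#include <cstdint>

// Hypothetical device kinds, one per hardware row of the variant table.
enum class DeviceKind : uint8_t {
    ClipPendant, TabletopPuck, Card, ScreenEquipped, Cellular, LocalOnly
};

// Hypothetical capability bits; a variant ORs together what it supports.
enum Capability : uint32_t {
    kCapMicArray = 1u << 0,  // XVF3800 multi-mic array with DoA
    kCapWifi     = 1u << 1,  // direct WiFi upload
    kCapBle      = 1u << 2,  // BLE pairing, control, status
    kCapScreen   = 1u << 3,  // e-ink / OLED UI layer
    kCapCellular = 1u << 4,  // LTE-M / NB-IoT path
    kCapLocalStt = 1u << 5,  // on-device STT + embeddings
};

struct DeviceDescriptor {
    DeviceKind kind;
    uint32_t capabilities;
};

// Capability set advertised by each kind (illustrative mapping only).
inline DeviceDescriptor descriptor_for(DeviceKind kind) {
    const uint32_t baseline = kCapMicArray | kCapWifi | kCapBle;
    switch (kind) {
        case DeviceKind::Card:           return {kind, kCapBle};
        case DeviceKind::ScreenEquipped: return {kind, baseline | kCapScreen};
        case DeviceKind::Cellular:       return {kind, baseline | kCapCellular};
        case DeviceKind::LocalOnly:      return {kind, baseline | kCapLocalStt};
        case DeviceKind::ClipPendant:
        case DeviceKind::TabletopPuck:   return {kind, baseline};
    }
    return {kind, 0};
}

// Software branches on capabilities rather than on the kind itself.
inline bool has_capability(const DeviceDescriptor& d, Capability c) {
    return (d.capabilities & c) != 0;
}
```

Checking capabilities instead of switching on the kind is what would let a new variant appear without app-side rework: the app only asks whether a feature is present.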
2. Why XVF3800 + XIAO ESP32-S3 from Day One
| Reason | Detail |
|---|---|
| Onboard DSP | AEC, beamforming, DoA, VAD, noise suppression — all on the XVF3800. ESP32-S3 doesn’t have to do signal processing. |
| Far-field 5m | Phone mic can’t compete in multi-person rooms. This is the wedge. |
| Buyable today | Seeed sells the dev-kit for $54.50. No sourcing wait. |
| Reference firmware exists | I2S audio, HTTP streaming, BLE provisioning, VAD, DoA — all sample tutorials at the Seeed wiki. |
| Single platform across all versions | Phase 0 dev-kit and Phase 5 retail device share the same MCU and DSP. Firmware investment compounds. |
| Industrial-design freedom | The hardware spec is fixed early; only enclosure changes between phases. |
3. Hardware Architecture (single design, all phases)

    ┌──────────────────────────────────────────────────────┐
    │  ARCIVE DEVICE                                       │
    │                                                      │
    │  4× MEMS mics ─► XVF3800 (DSP)                       │
    │                    │                                 │
    │                    ├─► I2S audio ─► XIAO ESP32-S3    │
    │                    ├─► I2C control ──┘               │
    │                    ├─► I2C: DoA azimuth queries      │
    │                    └─► I2C: VAD state                │
    │                                                      │
    │  Mute button ─► HARDWARE GATE on I2S clock           │
    │                 (mic data physically cannot reach    │
    │                  ESP32-S3 when muted)                │
    │                                                      │
    │  LED ◄─ ESP32-S3 PWM (driven from XVF3800 VAD +      │
    │         firmware mute state, never bypassable)       │
    │                                                      │
    │  USB-C ─► charging + flashing                        │
    │  LiPo 800–1200 mAh                                   │
    │                                                      │
    │  ESP32-S3 ─► WiFi (audio upload to Supabase)         │
    │           └► BLE (pairing + control + status)        │
    └──────────────────────────────────────────────────────┘

This same architecture ships at every phase. What changes is enclosure, polish, manufacturing scale.
4. Bill of Materials (target)
| Element | Spec | Source |
|---|---|---|
| Mic array | ReSpeaker XVF3800 | Seeed, ~$40 ref board |
| MCU | XIAO ESP32-S3 | Seeed, included in dev-kit |
| Battery | LiPo 800–1200 mAh | Standard, ~$3 |
| Charge IC + protection | TP4056 or equivalent | ~$1 |
| Indicator | 1× soft LED (PWM-driven) | <$0.5 |
| Mute button | Tactile switch wired to I2S clock gate | <$0.5 |
| USB-C connector | Standard | ~$1 |
| Enclosure | Phase 0–3: 3D-printed; Phase 4: SLA/CNC; Phase 5: injection-molded | Variable |
| Audio output | None on device | — |
Target unit cost at MOQ 500 (Phase 4): ~$45 BOM. Retail ~$129–149.
5. Phase Deliverables — Hardware Track
Phase 0 — V0 (Wk 1–3): Dev-kit bring-up
- 5× Seeed XVF3800 + XIAO ESP32-S3 dev-kits assembled (breadboard or reference PCB)
- XVF3800 flashed with I2S firmware (not USB) per Seeed wiki
- ESP32-S3 firmware: I2S capture → 30s WAV chunk → WiFi → `POST /ingest-audio`
- DoA azimuth read via I2C, attached as JSON metadata
- LED breathing animation during capture
- Hardcoded WiFi creds + dev JWT (no BLE provisioning yet)
- Forks Seeed’s HTTP audio streaming sample for fastest path
- Internal use by team only
Demo: dev-kit on a desk uploads recordings that appear in the same web app feed as phone-mic recordings.
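To make the Phase 0 "30s WAV chunk" step concrete, here is a rough sketch of building the 44-byte PCM WAV header for one chunk. The audio format (16 kHz, 16-bit, mono) is an assumption for illustration; the plan does not state the capture format, and the function names are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Assumed capture format; not specified in the plan.
constexpr uint32_t kSampleRateHz  = 16000;
constexpr uint16_t kBitsPerSample = 16;
constexpr uint16_t kChannels      = 1;
constexpr uint32_t kChunkSeconds  = 30;  // chunk length from the Phase 0 deliverables

// PCM payload size of one full chunk, in bytes.
constexpr uint32_t chunk_pcm_bytes() {
    return kSampleRateHz * kChunkSeconds * kChannels * (kBitsPerSample / 8);
}

// WAV fields are little-endian regardless of host byte order.
inline void put_le16(uint8_t* p, uint16_t v) {
    p[0] = static_cast<uint8_t>(v & 0xFF);
    p[1] = static_cast<uint8_t>(v >> 8);
}

inline void put_le32(uint8_t* p, uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = static_cast<uint8_t>((v >> (8 * i)) & 0xFF);
}

// Builds a canonical 44-byte RIFF/WAVE header for uncompressed PCM.
inline std::array<uint8_t, 44> make_wav_header(uint32_t pcm_bytes) {
    std::array<uint8_t, 44> h{};
    const uint16_t block_align = static_cast<uint16_t>(kChannels * (kBitsPerSample / 8));
    const uint32_t byte_rate = kSampleRateHz * block_align;
    std::memcpy(&h[0], "RIFF", 4);
    put_le32(&h[4], 36 + pcm_bytes);   // RIFF chunk size = file size minus 8
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    put_le32(&h[16], 16);              // fmt chunk size for PCM
    put_le16(&h[20], 1);               // audio format 1 = PCM
    put_le16(&h[22], kChannels);
    put_le32(&h[24], kSampleRateHz);
    put_le32(&h[28], byte_rate);
    put_le16(&h[32], block_align);
    put_le16(&h[34], kBitsPerSample);
    std::memcpy(&h[36], "data", 4);
    put_le32(&h[40], pcm_bytes);       // data chunk size
    return h;
}
```

Under these assumed parameters a full chunk carries 960,000 bytes of PCM, which is one reason the plan moves to Opus from Phase 1 onward.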
Phase 1 — V0.1 (Wk 4–6): Full firmware feature set
- Same 5–10 dev-kits, no enclosure changes
- VAD gating: silence never uploaded (XVF3800 VAD signal via I2C)
- BLE GATT server implemented: provisioning + control + status (per §6.3)
- Pairing flow end-to-end: app QR → BLE write of WiFi creds + JWT → device joins WiFi
- LED state machine: idle / recording / muted / uploading / error
- OTA scaffolding: `esp_https_ota`, daily manifest poll
- BLE characteristic for `firmware_version`, `battery`, `recording_state`
- 10 dev-kits in internal use
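The VAD gate and the five-state LED machine listed above could be modelled roughly as below. The input struct, the function names, and especially the priority order (error, then muted, then recording, then uploading, then idle) are assumptions made for this sketch; the plan only names the states.

```cpp
// LED states named in the Phase 1 deliverables.
enum class LedState { Idle, Recording, Muted, Uploading, Error };

// Hypothetical snapshot of the signals the firmware would sample each tick.
struct DeviceInputs {
    bool hw_muted;          // mute button gate engaged
    bool vad_active;        // XVF3800 reports voice activity (read over I2C)
    bool upload_in_flight;  // a chunk is currently being sent
    bool fault;             // unrecoverable error; assumed to halt capture
};

// VAD gating: audio is only captured (and later uploaded) while speech is present.
inline bool should_capture(const DeviceInputs& in) {
    return !in.fault && !in.hw_muted && in.vad_active;
}

// Resolves the single LED state to show. Recording outranks uploading so the
// indicator always reflects a hot mic first (assumed priority).
inline LedState resolve_led_state(const DeviceInputs& in) {
    if (in.fault) return LedState::Error;
    if (in.hw_muted) return LedState::Muted;
    if (should_capture(in)) return LedState::Recording;
    if (in.upload_in_flight) return LedState::Uploading;
    return LedState::Idle;
}
```

Keeping the resolution in one pure function makes the "LED matches actual mic state" test in the hardware test matrix straightforward to unit-test off-device.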
Phase 2 — V0.2 (Wk 7–10): Enclosure prototype #1
- 3D-printed enclosure housing XVF3800 + XIAO + LiPo + USB-C charge IC + LED + mute button
- Form factor decision: clip-on vs. lanyard vs. tabletop puck. Build 2–3 variants, then pick one based on user research
- 10 internal units + 5 design-partner units
- Hardware mute button physically gates I2S clock (security-critical; not bypassable in software)
- AEC validated for talk-back (XVF3800 onboard AEC; verified that talk-back via paired phone speaker doesn’t pollute uploaded audio)
- 8-hour battery test passes under realistic VAD duty cycle
- Local 30-min circular buffer for offline resilience
- OTA `dev` channel live; firmware version reported via BLE
- Memfault (or equivalent) crash reporting integrated
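One possible shape for the local 30-minute circular buffer is a fixed-capacity ring that overwrites the oldest chunk when full. This is a sketch under assumptions: the class name `ChunkRing` is hypothetical, and the slot count assumes 30-second chunks (60 slots for 30 minutes).

```cpp
#include <array>
#include <cstddef>

// 30 minutes of 30-second chunks (assumed chunk length from the upload contract).
constexpr std::size_t kOfflineSlots = (30 * 60) / 30;

// Fixed-capacity FIFO that drops the oldest item when full, so the newest
// audio always survives a long outage.
template <typename T, std::size_t N>
class ChunkRing {
public:
    // Stores an item; returns true if the oldest item had to be dropped.
    bool push(const T& item) {
        bool dropped = false;
        if (count_ == N) {             // full: advance head past the oldest
            head_ = (head_ + 1) % N;
            --count_;
            dropped = true;
        }
        slots_[(head_ + count_) % N] = item;
        ++count_;
        return dropped;
    }

    // Removes the oldest item into `out`; returns false if the ring is empty.
    bool pop(T& out) {
        if (count_ == 0) return false;
        out = slots_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, N> slots_{};
    std::size_t head_ = 0;   // index of the oldest item
    std::size_t count_ = 0;  // number of stored items
};
```

On the device, `T` would more likely be a handle to a chunk in flash or PSRAM than the audio bytes themselves; that choice is left open here.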
Phase 3 — V0.3 (Wk 11–14): 50-unit pilot
- Refined 3D-printed enclosure based on Phase 2 user feedback
- Industrial-design partner engaged for V1.0 enclosure
- Continuous-streaming firmware mode (group conversation): instead of 30s chunks, opens sustained HTTPS chunked upload (1-second Opus chunks via HTTP/2 keep-alive) to a backend bridge that re-fans into the LiveKit room. The ESP32-S3 only speaks HTTP; the backend handles WebRTC. (Device-side WebRTC was evaluated as a Phase 2 research spike — see §11 risks — but is at the edge of ESP32-S3 capability and is not committed for V0.3.)
- Power profile validated for sustained streaming (target ≥4 hr in this mode)
- BLE notify channels for real-time mute/battery push
- Pairing flow polished (sub-30s end-to-end)
- 50 units shipped to design partners + early adopters
- Telemetry dashboard live: per-device upload volume, battery health, crash rate, DAUs
Phase 4 — V1.0 (Wk 15–22): Production-intent + commercial batch
- Production-intent enclosure (SLA or CNC; injection molding slated for Phase 5)
- 200-unit commercial batch manufactured
- OTA with A/B partitions + signed firmware + automatic rollback on failed boot
- Fleet management telemetry: firmware version distribution, error rate by version, retire-and-replace flow
- Security audit on hardware mute — re-validated as software-undefeatable
- LED VAD-driven indicator audit — cannot be turned off while mic is hot
- Retail packaging design begins
- App Store listing live: device as $129 upsell
Phase 5 — V1.1+ (Wk 23+): Retail-grade
- Industrial-design refresh based on V1.0 field feedback
- Optional SKU with e-ink or OLED screen — the original “screen on device” idea, now informed by real user data on what they want to see
- Replaceable or longer-runtime battery option
- Injection-molded enclosure
- FCC / CE certification (required for retail at scale)
- Variants: clip / pendant / tabletop puck — based on V1.0 user preference data
- Distribution partnerships (DTC, Amazon, possibly Best Buy)
6. Hardware ↔ Software Contract
Mirrors 00_MASTER_PLAN.md §6. The hardware-side responsibilities:
6.1 Audio Upload
- Endpoint: `POST https://<project>.supabase.co/functions/v1/ingest-audio`
- Auth: `Authorization: Bearer <device_jwt>` + `X-Device-Id: <uuid>`
- Body: VAD-trimmed audio chunk, ≤ 30s, WAV (Phase 0) → Opus (Phase 1+)
- Query: `recorded_at` (ISO 8601), `doa_json` (urlencoded compact JSON)
- Retry: exponential backoff, up to ~30 min local buffer
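The query-string and retry rules above can be sketched as two small helpers. The helper names, the `az` field used in the example, and the backoff constants (1 s base, 60 s cap) are assumptions; the contract only specifies the parameter names, URL encoding, and "exponential backoff".

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// RFC 3986 unreserved characters pass through unescaped.
inline bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~';
}

// Percent-encodes every byte that is not unreserved.
inline std::string percent_encode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (is_unreserved(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Builds the query string appended to the ingest-audio endpoint.
inline std::string ingest_query(const std::string& recorded_at,
                                const std::string& doa_json) {
    return "?recorded_at=" + percent_encode(recorded_at) +
           "&doa_json=" + percent_encode(doa_json);
}

// Exponential backoff: doubles from an assumed 1 s base up to an assumed 60 s cap.
inline uint32_t backoff_delay_ms(unsigned attempt) {
    const uint32_t base_ms = 1000;
    const uint32_t cap_ms = 60000;
    if (attempt >= 6) return cap_ms;  // 1000 << 6 would already exceed the cap
    return base_ms << attempt;
}
```

For example, `ingest_query("2025-01-01T00:00:00Z", "{\"az\":90}")` yields `?recorded_at=2025-01-01T00%3A00%3A00Z&doa_json=%7B%22az%22%3A90%7D`. A production version would likely add jitter to the delay so a fleet does not retry in lockstep after an outage.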
6.2 Group-mode streaming (Phase 3+)
- Device opens sustained HTTPS chunked upload (1-second Opus chunks via HTTP/2 keep-alive) to `/functions/v1/group-stream` with a session token issued by `/functions/v1/start-group-session`
- Backend bridge receives chunks and publishes them as a participant track into the LiveKit room; the ESP32-S3 itself does not speak WebRTC
- AI role interjections (TTS) are played via the paired phone, not on the device (device has no speaker)
- Phase 2 research spike: evaluate `esp-webrtc` for direct device → LiveKit. If viable, swap transport in Phase 4 without API changes.
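One way to keep the "swap transport without API changes" option open is to hide the data path behind a small interface, so the streaming code never knows whether chunks leave via HTTPS or WebRTC. The sketch below is an assumption about structure only: `AudioTransport`, `GroupStreamer`, and the stand-in `RecordingTransport` are hypothetical names, and no real network code is shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Abstract data path for encoded audio. An HTTPS chunked uploader would
// implement this in Phase 3; a device-side WebRTC path could implement it later.
class AudioTransport {
public:
    virtual ~AudioTransport() = default;
    // Sends one encoded chunk; returns false on failure.
    virtual bool send_chunk(const uint8_t* data, std::size_t len) = 0;
};

// Feeds 1-second encoded chunks into whichever transport it was given.
class GroupStreamer {
public:
    explicit GroupStreamer(AudioTransport& transport) : transport_(transport) {}

    bool on_opus_chunk(const std::vector<uint8_t>& chunk) {
        if (chunk.empty()) return true;  // nothing to send
        if (!transport_.send_chunk(chunk.data(), chunk.size())) {
            ++failed_;
            return false;
        }
        ++sent_;
        return true;
    }

    unsigned sent() const { return sent_; }
    unsigned failed() const { return failed_; }

private:
    AudioTransport& transport_;
    unsigned sent_ = 0;
    unsigned failed_ = 0;
};

// Stand-in transport for host-side tests: counts bytes instead of sending them.
class RecordingTransport : public AudioTransport {
public:
    bool send_chunk(const uint8_t*, std::size_t len) override {
        if (fail_next) { fail_next = false; return false; }
        bytes_seen += len;
        return true;
    }
    std::size_t bytes_seen = 0;
    bool fail_next = false;
};
```

With this split, the Phase 2 research spike would only need to supply a second `AudioTransport` implementation; `GroupStreamer` and everything above it would stay unchanged.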
6.3 BLE GATT Services
| Service | Characteristic | Direction | Purpose |
|---|---|---|---|
| `ARCIVE_PROV` | `wifi_creds` | App → HW (write) | SSID + password JSON |
| `ARCIVE_PROV` | `device_jwt` | App → HW (write) | Supabase upload token |
| `ARCIVE_PROV` | `mac` | HW → App (read+notify) | Device MAC |
| `ARCIVE_CTRL` | `mute` | App ↔ HW (read+notify; HW write only) | Hardware-driven mute state |
| `ARCIVE_STATUS` | `battery` | HW → App (read+notify) | 0–100 |
| `ARCIVE_STATUS` | `recording_state` | HW → App (notify) | idle/recording/uploading/error |
| `ARCIVE_STATUS` | `firmware_version` | HW → App (read) | Semver string |
UUIDs generated once, stored in shared/ble-uuids.h (firmware) and packages/shared/ble-uuids.ts (software). CI script enforces sync.
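The table above can be captured as data so that access rules are checked in one place. This is a minimal sketch: the `Prop` flags, `CharacteristicSpec`, and helper names are hypothetical, and UUIDs are deliberately omitted because the real values live in `shared/ble-uuids.h`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Access properties as seen from the app side.
enum Prop : uint8_t { kRead = 1, kNotify = 2, kAppWrite = 4 };

struct CharacteristicSpec {
    const char* service;
    const char* name;
    uint8_t props;
};

// Mirrors the GATT table row by row.
constexpr CharacteristicSpec kGattTable[] = {
    {"ARCIVE_PROV",   "wifi_creds",       kAppWrite},
    {"ARCIVE_PROV",   "device_jwt",       kAppWrite},
    {"ARCIVE_PROV",   "mac",              kRead | kNotify},
    {"ARCIVE_CTRL",   "mute",             kRead | kNotify},  // never app-writable
    {"ARCIVE_STATUS", "battery",          kRead | kNotify},
    {"ARCIVE_STATUS", "recording_state",  kNotify},
    {"ARCIVE_STATUS", "firmware_version", kRead},
};

constexpr std::size_t kGattCount = sizeof(kGattTable) / sizeof(kGattTable[0]);

// Looks up a characteristic by name; returns nullptr if it is not in the table.
inline const CharacteristicSpec* find_characteristic(const char* name) {
    for (std::size_t i = 0; i < kGattCount; ++i) {
        if (std::strcmp(kGattTable[i].name, name) == 0) return &kGattTable[i];
    }
    return nullptr;
}

// Write requests from the app are rejected unless the table allows them.
inline bool app_may_write(const char* name) {
    const CharacteristicSpec* spec = find_characteristic(name);
    return spec != nullptr && (spec->props & kAppWrite) != 0;
}
```

A GATT write callback could call `app_may_write` before touching any state, which keeps the rule that `mute` is hardware-driven enforced by the same table the documentation is generated from.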
6.4 Pairing flow
- App generates QR with `{ pairing_url, pairing_token, supabase_url }`
- User scans → app initiates BLE connection to advertised `ARCIVE_PROV` service
- App writes WiFi creds + device JWT
- HW reads back its MAC, writes to `mac` characteristic; app calls `POST /devices`
- HW reboots, joins WiFi, starts heartbeat (uploads 0-byte status ping)
- App shows “Device paired ✓”
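The pairing steps above form a strictly ordered sequence, which could be modelled as a small state machine. The state and event names below are hypothetical, and the handling of timeouts and out-of-order events (fail on timeout, ignore anything unexpected) is an assumption rather than specified behaviour.

```cpp
// One state per completed step of the pairing flow.
enum class PairingStep {
    AwaitingScan, BleConnected, CredsWritten, MacReported, Rebooting, Paired, Failed
};

// Events that move the flow forward, plus a timeout.
enum class PairingEvent {
    QrScanned,           // user scanned QR, BLE link to ARCIVE_PROV established
    CredsAndJwtWritten,  // app wrote WiFi creds + device JWT
    MacPublished,        // HW wrote its MAC to the mac characteristic
    DeviceRegistered,    // app's POST /devices succeeded
    HeartbeatSeen,       // first status ping arrived after reboot
    Timeout
};

// Returns the next step. Terminal states absorb all events; events that do not
// match the current step are ignored (assumed policy).
inline PairingStep next_step(PairingStep step, PairingEvent event) {
    if (step == PairingStep::Paired || step == PairingStep::Failed) return step;
    if (event == PairingEvent::Timeout) return PairingStep::Failed;
    switch (step) {
        case PairingStep::AwaitingScan:
            return event == PairingEvent::QrScanned ? PairingStep::BleConnected : step;
        case PairingStep::BleConnected:
            return event == PairingEvent::CredsAndJwtWritten ? PairingStep::CredsWritten : step;
        case PairingStep::CredsWritten:
            return event == PairingEvent::MacPublished ? PairingStep::MacReported : step;
        case PairingStep::MacReported:
            return event == PairingEvent::DeviceRegistered ? PairingStep::Rebooting : step;
        case PairingStep::Rebooting:
            return event == PairingEvent::HeartbeatSeen ? PairingStep::Paired : step;
        default:
            return step;
    }
}
```

Feeding the five forward events in order takes the flow from `AwaitingScan` to `Paired`; a `Timeout` at any earlier point lands in `Failed`, which is where the app would offer a retry.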
6.5 OTA firmware update
- Firmware binaries uploaded to Supabase Storage at `firmware/<channel>/<version>.bin`
- HW polls JSON manifest at `firmware/<channel>/latest.json` daily
- Newer version available → download via `esp_https_ota`, verify signature, reboot
- Channels: `dev` (Phase 1+), `beta` (Phase 3+), `stable` (Phase 4+)
- A/B partitions + automatic rollback on boot failure (Phase 4+)
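The "newer version available" check in the daily manifest poll comes down to a semver comparison. A minimal sketch, assuming the manifest exposes a plain `major.minor.patch` string; the struct and function names are hypothetical, and pre-release suffixes are simply ignored here.

```cpp
#include <cstdio>

struct SemVer {
    int major_num;
    int minor_num;
    int patch_num;
};

// Parses "major.minor.patch"; anything after the patch number is ignored.
inline bool parse_semver(const char* text, SemVer& out) {
    int a = 0, b = 0, c = 0;
    if (text == nullptr || std::sscanf(text, "%d.%d.%d", &a, &b, &c) != 3) return false;
    if (a < 0 || b < 0 || c < 0) return false;
    out = {a, b, c};
    return true;
}

// Numeric, field-by-field comparison (so 0.10.0 is newer than 0.9.9).
inline bool is_newer(const SemVer& candidate, const SemVer& running) {
    if (candidate.major_num != running.major_num) return candidate.major_num > running.major_num;
    if (candidate.minor_num != running.minor_num) return candidate.minor_num > running.minor_num;
    return candidate.patch_num > running.patch_num;
}

// Decides whether to start a download; unparseable versions never trigger one.
inline bool should_update(const char* running_version, const char* manifest_version) {
    SemVer running{}, candidate{};
    if (!parse_semver(running_version, running)) return false;
    if (!parse_semver(manifest_version, candidate)) return false;
    return is_newer(candidate, running);
}
```

Refusing to update on a malformed manifest is the conservative choice: a bad `latest.json` then leaves the fleet on its current firmware instead of triggering downloads.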
7. Privacy & Consent — Hardware Requirements
Non-negotiable, all phases:
| Requirement | Implementation |
|---|---|
| LED visibly on when mic is hot | PWM-driven from firmware AND tied to XVF3800 VAD signal; cannot be turned off in software while mic is unmuted |
| Hardware mute cuts mic at hardware level | Mute button physically gates the I2S clock to the mic array; not just a software flag |
| Mute state cannot be overridden remotely | App can only read mute state, never set it to unmute |
| Optional audible chime on session start | Firmware setting; off by default but available for two-party-consent jurisdictions |
| On-device data encrypted at rest | ESP32-S3 NVS encryption enabled |
| Device JWT can be revoked | App sends revoke over BLE → firmware wipes WiFi creds + JWT, factory-resets |
These rules make ARCIVE defensibly respectful of bystanders in a way Limitless / Friend / Plaud have struggled with — and they’re a real selling point for Caregiving / Therapy B2B.
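Two of the rules in the table lend themselves to small, testable invariants in firmware: the LED can never go dark while the mic is hot, and only the hardware button can change mute state. The sketch below is illustrative; the minimum duty value and all names are assumptions, and the real guarantee for mute comes from the I2S clock gate in hardware, not from this code.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed minimum 8-bit PWM duty that is still clearly visible.
constexpr uint8_t kMinVisibleDuty = 32;

// Clamps any software-requested brightness while the mic is hot, so animation
// code or a settings bug cannot dim the indicator to zero.
inline uint8_t led_duty(bool mic_hot, uint8_t requested_duty) {
    if (!mic_hot) return requested_duty;
    return std::max(requested_duty, kMinVisibleDuty);
}

enum class MuteSource { HardwareButton, App };

// Returns the mute state after a change request. Requests from the app are
// ignored in both directions; the app can only observe the state.
inline bool apply_mute_request(bool currently_muted, bool requested_muted,
                               MuteSource source) {
    if (source == MuteSource::HardwareButton) return requested_muted;
    return currently_muted;
}
```

Routing every PWM write through `led_duty` would give the Phase 4 LED audit a single function to review.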
8. Firmware Repository Layout

    firmware/
    ├── platformio.ini
    ├── shared/
    │   └── ble-uuids.h              # ← mirror of packages/shared/ble-uuids.ts
    ├── src/
    │   ├── main.cpp                 # Entry, state machine, OTA check
    │   ├── config.h                 # Endpoints, buffer sizes, timeouts
    │   ├── audio/
    │   │   ├── i2s_capture.cpp      # XVF3800 → ESP32-S3 I2S
    │   │   ├── opus_encoder.cpp     # 30s Opus chunks (Phase 1+)
    │   │   └── chunker.cpp          # Chunk sealing, metadata stamping
    │   ├── stream/                  # Phase 3+
    │   │   └── group_stream.cpp     # Continuous HTTPS chunked upload (1s Opus chunks via HTTP/2 keep-alive) to backend bridge; device does NOT speak WebRTC
    │   ├── upload/
    │   │   ├── wifi_manager.cpp
    │   │   ├── https_client.cpp
    │   │   └── circular_buffer.cpp  # Local store-and-forward
    │   ├── ble/
    │   │   ├── gatt_server.cpp      # ARCIVE_PROV / ARCIVE_CTRL / ARCIVE_STATUS
    │   │   ├── provisioning.cpp
    │   │   └── status_notifier.cpp
    │   ├── led/
    │   │   ├── breathing.cpp
    │   │   └── states.cpp
    │   ├── doa/
    │   │   └── azimuth_reader.cpp
    │   ├── vad/
    │   │   └── vad_gate.cpp
    │   ├── ota/
    │   │   └── update_check.cpp
    │   └── telemetry/
    │       └── crash_reporter.cpp
    └── tests/
        ├── audio/
        ├── ble/
        └── upload/

PlatformIO baseline

    [env:xiao_esp32s3]
    platform = espressif32
    board = seeed_xiao_esp32s3
    framework = arduino
    lib_deps =
        ESP Async WebServer
        ArduinoJson
        NimBLE-Arduino
        https://github.com/respeaker/xvf3800-arduino-driver  ; or vendor SDK
    build_flags =
        -DARCIVE_FW_VERSION=\"0.1.0\"
        -DBLE_UUID_HEADER=\"shared/ble-uuids.h\"

9. Hardware Test Matrix
| Test | P0 dev-kit | P1 dev-kit | P2 proto | P3 pilot | P4 commercial |
|---|---|---|---|---|---|
| Boots cleanly from cold | ✅ | ✅ | ✅ | ✅ | ✅ |
| Pairs via BLE in <30s | — | ✅ | ✅ | ✅ | ✅ |
| Joins WiFi after reboot | ✅ | ✅ | ✅ | ✅ | ✅ |
| Survives WiFi outage 30 min | basic | ✅ | ✅ | ✅ | ✅ |
| Mute button cuts I2S at hardware level | — | wired | ✅ | ✅ | ✅ |
| LED matches actual mic state | ✅ | ✅ | ✅ | ✅ | ✅ |
| 8-hour battery test | n/a | n/a | ✅ | ✅ | ✅ |
| Upload retries succeed after disconnect | basic | ✅ | ✅ | ✅ | ✅ |
| OTA update completes successfully | — | dev channel | ✅ | ✅ | ✅ A/B rollback |
| Crash report reaches Memfault | — | — | ✅ | ✅ | ✅ |
| Factory reset wipes all creds | — | ✅ | ✅ | ✅ | ✅ |
| Multi-speaker capture quality | ✅ raw | ✅ | ✅ | ✅ | ✅ |
| DoA azimuth attached as metadata | ✅ raw | ✅ | ✅ | ✅ | ✅ |
| DoA accuracy ±15° (validated against ground truth) | — | — | ✅ | ✅ | ✅ |
| AEC functional during talk-back | — | — | ✅ | ✅ | ✅ |
| Group-mode sustained stream ≥4 hr | — | — | — | ✅ | ✅ |
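The "DoA accuracy ±15°" row needs an error metric that handles wraparound, since 350° and 10° are only 20° apart. A small sketch of that check follows; the function names are hypothetical, and the 15° default simply restates the tolerance from the matrix.

```cpp
#include <cmath>

// Smallest absolute angle between two azimuths, in degrees (result in 0..180).
inline double angular_error_deg(double measured_deg, double truth_deg) {
    double diff = std::fmod(measured_deg - truth_deg, 360.0);  // in (-360, 360)
    if (diff > 180.0) {
        diff -= 360.0;
    } else if (diff < -180.0) {
        diff += 360.0;
    }
    return std::fabs(diff);
}

// Pass/fail for one ground-truth measurement.
inline bool doa_within_tolerance(double measured_deg, double truth_deg,
                                 double tolerance_deg = 15.0) {
    return angular_error_deg(measured_deg, truth_deg) <= tolerance_deg;
}
```

A validation run would apply `doa_within_tolerance` to each measured/ground-truth pair and report the pass rate; how many positions and what pass rate count as "validated" is not defined in the plan.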
10. What Hardware Does NOT Do (Permanently)
- No on-device transcription — too heavy for ESP32-S3, kills battery, redundant with cloud
- No custom wake-word — adds complexity; doesn’t match always-on capture model
- No BLE audio streaming — bandwidth ceiling; WiFi only for audio
- No on-device speaker identification — voice embeddings happen server-side
- No on-device speaker — talk-back uses paired phone for playback (until product feedback says otherwise)
- No camera — out of scope; privacy-incompatible
11. Risks & Mitigations
| Risk | Mitigation |
|---|---|
| Industrial design takes longer than expected | 3D-printed enclosures used through Phase 3; production-intent only at Phase 4 — buys 14 weeks of design lead time |
| Custom PCB defects | Phase 0–3 ride on Seeed reference board; only Phase 4 introduces our own PCB (and minimally — same components, different layout) |
| Battery life shorter than 8 hr in field | Tested with realistic VAD duty cycle from Phase 2 onward |
| FCC/CE certification delays | Skipped for Phase 0–4 (research/dev units); required only for Phase 5 retail |
| Firmware bricking devices | OTA includes A/B partitions, rollback on failed boot (Phase 4+) |
| Mute button defeated in software | Hardware gate on I2S clock makes this physically impossible |
| Group-mode streaming drains battery too fast | Power profile measured Phase 2; decision in Phase 3 whether to ship group mode as “tabletop, plugged in” only |
| User doesn’t see why they need the device when phone works | Continuously demo the quality difference (multi-speaker, far-field) and the behavioral difference (no screen, no doom-scroll) at every user touchpoint |
| Device-side WebRTC at edge of ESP32-S3 capability | V0.3 ships HTTPS chunked upload + backend LiveKit bridge (proven path). esp-webrtc evaluated as a Phase 2 research spike; if viable, swap transport in Phase 4 without changing API contract. If not viable, backend bridge remains permanent — group-mode quality is unaffected, only data path changes. |
12. Phase-Boundary Joint Demos
Each phase ends with a demo that proves both tracks integrate. These are non-negotiable ship gates.
| Phase | Joint demo |
|---|---|
| 0 | Dev-kit on desk uploads recording → web app feed shows it next to phone-mic recordings |
| 1 | Pair dev-kit via QR scan → device joins WiFi → captures multi-speaker meeting → diarization labels appear → same speaker recognized across two sessions |
| 2 | User wears 3D-printed proto for full work day → offline periods buffered locally → OTA update lands without user action → talk-back via paired phone using device as input |
| 3 | Pilot device on conference table during 4-person meeting → group mode active → speakers identified by name from session 2 → AI role interjects when invoked |
| 4 | Retail experience: buy device → scan QR → paired in 30s → capture & review via marketplace role → plug into Claude Desktop via MCP for cross-tool agent access |
| 5 | Pick up retail-packaged device from box → certified, injection-molded, optional screen variant → identical software experience as V1.0 |