ARCIVE — Hardware Plan

Companion to 00_MASTER_PLAN.md and 01_SOFTWARE_PLAN.md. Hardware is a co-equal track to software, developed in parallel from week 1. The device is core product identity, not a peripheral.


1. Strategic Stance

  • ARCIVE is a hardware & software platform (per the canonical product definition in 00_MASTER_PLAN §1). Hardware is not “a device” — it’s a family of form factors sharing one firmware platform, one cloud contract, and one user experience.
  • The first ARCIVE device is the product’s physical identity. Not a phone app trying to be calm — a dedicated, screenless, distraction-free capture surface that lives with the user.
  • Hardware ships at every phase boundary, integrated continuously with software via the contract in 00_MASTER_PLAN §6.
  • Form evolves: dev-kit (Phase 0) → 3D-printed proto (Phase 2) → pilot enclosure (Phase 3) → production-intent (Phase 4) → injection-molded retail (Phase 5) → variants (V1.5+).
  • No white-label detour. We commit to the XVF3800 + XIAO ESP32-S3 platform from day one because: (a) it’s the only path to a true mic-array differentiator, (b) the dev-kit is buyable today for $54, (c) iterating on enclosure is cheaper than re-platforming firmware later.
  • The app works without any device, but the device family is the way ARCIVE is meant to be experienced.

Variant lineup (long view)

The first device is clip / pendant + tabletop puck (V1.0, Phase 4). Future variants reuse the same firmware and cloud platform — only enclosure, sensor array, connectivity, and (sometimes) MCU change. Full lineup is in 00_MASTER_PLAN §2.5; summary:

Variant | When | Delta from V1.0
Clip / Pendant | V1.0 (Phase 4) | Baseline
Tabletop puck | V1.0 SKU or V1.5 | Larger battery, plugged-in option, far-field tuning
Pendant refresh | V1.5 | Industrial-design only
Watch companion (Apple/Wear OS) | V1.1+ | Software only, no new HW
Card format | V2.x | Slim form, single MEMS, BLE-tethered to phone
Screen-equipped | V2.x | Adds e-ink/OLED + firmware UI layer
Cellular | V2.x | LTE-M / NB-IoT module, eSIM, no-WiFi flow
Local-only | V2.x | Larger MCU/co-processor, on-device STT + embeddings

Each variant must respect the same HW↔SW contract (§6) — device.kind enum, capability flags, optional cellular/local metadata. We don’t build them now, but the platform is architected to allow them without rework.


2. Why XVF3800 + XIAO ESP32-S3 from Day One

Reason | Detail
Onboard DSP | AEC, beamforming, DoA, VAD, and noise suppression all run on the XVF3800. The ESP32-S3 doesn’t have to do signal processing.
Far-field (5 m) | Phone mic can’t compete in multi-person rooms. This is the wedge.
Buyable today | Seeed sells the dev-kit for $54.50. No sourcing wait.
Reference firmware exists | I2S audio, HTTP streaming, BLE provisioning, VAD, DoA: all available as sample tutorials on the Seeed wiki.
Single platform across all versions | Phase 0 dev-kit and Phase 5 retail device share the same MCU and DSP. Firmware investment compounds.
Industrial-design freedom | The hardware spec is fixed early; only enclosure changes between phases.

3. Hardware Architecture (single design, all phases)

┌──────────────────────────────────────────────────────┐
│ ARCIVE DEVICE │
│ │
│ 4× MEMS mics ─► XVF3800 (DSP) │
│ │ │
│ ├─► I2S audio ─► XIAO ESP32-S3 │
│ ├─► I2C control ──┘ │
│ ├─► I2C: DoA azimuth queries │
│ └─► I2C: VAD state │
│ │
│ Mute button ─► HARDWARE GATE on I2S clock │
│ (mic data physically cannot reach │
│ ESP32-S3 when muted) │
│ │
│ LED ─◄ ESP32-S3 PWM (driven from XVF3800 VAD + │
│ firmware mute state, never bypassable) │
│ │
│ USB-C ─► charging + flashing │
│ LiPo 800–1200 mAh │
│ │
│ ESP32-S3 ─► WiFi (audio upload to Supabase) │
│ └► BLE (pairing + control + status) │
└──────────────────────────────────────────────────────┘

This same architecture ships at every phase. What changes is enclosure, polish, manufacturing scale.


4. Bill of Materials (target)

Element | Spec | Source
Mic array | ReSpeaker XVF3800 | Seeed, ~$40 ref board
MCU | XIAO ESP32-S3 | Seeed, included in dev-kit
Battery | LiPo 800–1200 mAh | Standard, ~$3
Charge IC + protection | TP4056 or equivalent | ~$1
Indicator | 1× soft LED (PWM-driven) | <$0.5
Mute button | Tactile switch wired to I2S clock gate | <$0.5
USB-C connector | Standard | ~$1
Enclosure | Phase 0–3: 3D-printed; Phase 4: SLA/CNC; Phase 5: injection-molded | Variable
Audio output | None on device |

Target unit cost at MOQ 500 (Phase 4): ~$45 BOM. Retail ~$129–149.


5. Phase Deliverables — Hardware Track

Phase 0 — V0 (Wk 1–3): Dev-kit bring-up

  • 5× Seeed XVF3800 + XIAO ESP32-S3 dev-kits assembled (breadboard or reference PCB)
  • XVF3800 flashed with I2S firmware (not USB) per Seeed wiki
  • ESP32-S3 firmware: I2S capture → 30s WAV chunk → WiFi → POST /ingest-audio
  • DoA azimuth read via I2C, attached as JSON metadata
  • LED breathing animation during capture
  • Hardcoded WiFi creds + dev JWT (no BLE provisioning yet)
  • Forks Seeed’s HTTP audio streaming sample for fastest path
  • Internal use by team only

Demo: dev-kit on a desk uploads recordings that appear in the same web app feed as phone-mic recordings.
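
The Phase 0 capture path (I2S capture → 30s WAV chunk → POST) needs a WAV header in front of each PCM buffer. The sketch below builds the standard 44-byte PCM header. The 16 kHz, 16-bit, mono format used in the usage note is an assumption for illustration; the plan does not fix the capture format, and the function names are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Write the low `n` bytes of `v` at `p` in little-endian order.
static void put_le(uint8_t* p, uint32_t v, int n) {
    for (int i = 0; i < n; ++i) p[i] = static_cast<uint8_t>((v >> (8 * i)) & 0xFF);
}

// Size in bytes of `seconds` of uncompressed PCM in the given format.
constexpr uint32_t pcm_bytes(uint32_t sample_rate, uint16_t channels,
                             uint16_t bits_per_sample, uint32_t seconds) {
    return sample_rate * channels * (bits_per_sample / 8u) * seconds;
}

// Build the canonical 44-byte RIFF/WAVE header for a PCM payload.
std::array<uint8_t, 44> make_wav_header(uint32_t sample_rate, uint16_t channels,
                                        uint16_t bits_per_sample, uint32_t data_bytes) {
    std::array<uint8_t, 44> h{};
    const uint32_t block_align = channels * bits_per_sample / 8u;
    const uint32_t byte_rate = sample_rate * block_align;

    std::memcpy(&h[0], "RIFF", 4);
    put_le(&h[4], 36 + data_bytes, 4);    // RIFF chunk size = header remainder + data
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    put_le(&h[16], 16, 4);                // fmt chunk size for plain PCM
    put_le(&h[20], 1, 2);                 // audio format 1 = PCM
    put_le(&h[22], channels, 2);
    put_le(&h[24], sample_rate, 4);
    put_le(&h[28], byte_rate, 4);
    put_le(&h[32], block_align, 2);
    put_le(&h[34], bits_per_sample, 2);
    std::memcpy(&h[36], "data", 4);
    put_le(&h[40], data_bytes, 4);
    return h;
}
```

Under the assumed format, `pcm_bytes(16000, 1, 16, 30)` comes to 960,000 bytes per 30s chunk, which is one reason the contract moves to Opus from Phase 1 onward.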

Phase 1 — V0.1 (Wk 4–6): Full firmware feature set

  • Same 5–10 dev-kits, no enclosure changes
  • VAD gating: silence never uploaded (XVF3800 VAD signal via I2C)
  • BLE GATT server implemented: provisioning + control + status (per §6.3)
  • Pairing flow end-to-end: app QR → BLE write of WiFi creds + JWT → device joins WiFi
  • LED state machine: idle / recording / muted / uploading / error
  • OTA scaffolding: esp_https_ota, daily manifest poll
  • BLE characteristic for firmware_version, battery, recording_state
  • 10 dev-kits in internal use
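
A minimal sketch of how the LED state machine (idle / recording / muted / uploading / error) could be resolved from device inputs. The input struct and the priority order are assumptions, not part of the plan; the one rule taken from the plan is that a live capture must never be hidden behind another state.

```cpp
#include <cstdint>

enum class LedState : uint8_t { Idle, Recording, Muted, Uploading, Error };

// Hypothetical snapshot of the signals the firmware would sample each tick.
struct DeviceInputs {
    bool mute_engaged;    // hardware mute switch position
    bool vad_active;      // XVF3800 reports speech, capture in progress
    bool upload_pending;  // chunks waiting or in flight
    bool fault;           // e.g. WiFi or storage failure
};

// Assumed priority: muted > recording > error > uploading > idle.
// Recording outranks error and uploading so the LED always reflects a hot mic.
constexpr LedState resolve_led_state(const DeviceInputs& in) {
    if (in.mute_engaged) return LedState::Muted;
    if (in.vad_active) return LedState::Recording;
    if (in.fault) return LedState::Error;
    if (in.upload_pending) return LedState::Uploading;
    return LedState::Idle;
}

// Compile-time check: a fault or pending upload cannot mask an active capture.
static_assert(resolve_led_state({false, true, true, true}) == LedState::Recording,
              "recording must win over uploading and error");
```

Keeping the resolver a pure function of its inputs makes the "LED matches actual mic state" row of the test matrix checkable on a host machine without hardware.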

Phase 2 — V0.2 (Wk 7–10): Enclosure prototype #1

  • 3D-printed enclosure housing XVF3800 + XIAO + LiPo + USB-C charge IC + LED + mute button
  • Form factor decision: clip-on vs. lanyard vs. tabletop puck. Build 2–3 candidate variants, then pick one based on user research
  • 10 internal units + 5 design-partner units
  • Hardware mute button physically gates I2S clock (security-critical; not bypassable in software)
  • AEC validated for talk-back (XVF3800 onboard AEC; verified that talk-back via paired phone speaker doesn’t pollute uploaded audio)
  • 8-hour battery test passes under realistic VAD duty cycle
  • Local 30-min circular buffer for offline resilience
  • OTA dev channel live; firmware version reported via BLE
  • Memfault (or equivalent) crash reporting integrated
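
The local 30-minute circular buffer can be sketched as a fixed-capacity ring of sealed chunks that overwrites the oldest entry when full. This is an illustration only: the class name is hypothetical, the element type stands in for whatever chunk handle the firmware uses, and 60 slots assumes 30s chunks.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Fixed-capacity FIFO that drops the oldest item when a push finds it full.
template <typename T, std::size_t N>
class ChunkRing {
public:
    // Store an item; returns true if the oldest entry was overwritten.
    bool push(const T& item) {
        const bool overwrote = (count_ == N);
        // When full, (head_ + count_) % N equals head_, i.e. the oldest slot.
        buf_[(head_ + count_) % N] = item;
        if (overwrote) {
            head_ = (head_ + 1) % N;
        } else {
            ++count_;
        }
        return overwrote;
    }

    // Remove and return the oldest item, or nullopt when empty.
    std::optional<T> pop() {
        if (count_ == 0) return std::nullopt;
        T item = buf_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return item;
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, N> buf_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

With 30s chunks, `ChunkRing<ChunkHandle, 60>` would cover the 30-minute window; the uploader pops from the front once WiFi returns, so audio is delivered in recording order.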

Phase 3 — V0.3 (Wk 11–14): 50-unit pilot

  • Refined 3D-printed enclosure based on Phase 2 user feedback
  • Industrial-design partner engaged for V1.0 enclosure
  • Continuous-streaming firmware mode (group conversation): instead of 30s chunks, the device opens a sustained HTTPS chunked upload (1-second Opus chunks over a persistent HTTP/2 connection) to a backend bridge that fans the audio into the LiveKit room. The ESP32-S3 only speaks HTTP; the backend handles WebRTC. (Device-side WebRTC is evaluated as a Phase 2 research spike, see §11 risks, but it is at the edge of ESP32-S3 capability and is not committed for V0.3.)
  • Power profile validated for sustained streaming (target ≥4 hr in this mode)
  • BLE notify channels for real-time mute/battery push
  • Pairing flow polished (sub-30s end-to-end)
  • 50 units shipped to design partners + early adopters
  • Telemetry dashboard live: per-device upload volume, battery health, crash rate, DAUs

Phase 4 — V1.0 (Wk 15–22): Production-intent + commercial batch

  • Production-intent enclosure (SLA or CNC; injection molding deferred to Phase 5)
  • 200-unit commercial batch manufactured
  • OTA with A/B partitions + signed firmware + automatic rollback on failed boot
  • Fleet management telemetry: firmware version distribution, error rate by version, retire-and-replace flow
  • Security audit on hardware mute — re-validated as software-undefeatable
  • LED VAD-driven indicator audit — cannot be turned off while mic is hot
  • Retail packaging design begins
  • App Store listing live: device as $129 upsell

Phase 5 — V1.1+ (Wk 23+): Retail-grade

  • Industrial-design refresh based on V1.0 field feedback
  • Optional SKU with e-ink or OLED screen — the original “screen on device” idea, now informed by real user data on what they want to see
  • Replaceable or longer-runtime battery option
  • Injection-molded enclosure
  • FCC / CE certification (required for retail at scale)
  • Variants: clip / pendant / tabletop puck — based on V1.0 user preference data
  • Distribution partnerships (DTC, Amazon, possibly Best Buy)

6. Hardware ↔ Software Contract

Mirrors 00_MASTER_PLAN.md §6. The hardware-side responsibilities:

6.1 Audio Upload

  • Endpoint: POST https://<project>.supabase.co/functions/v1/ingest-audio
  • Auth: Authorization: Bearer <device_jwt> + X-Device-Id: <uuid>
  • Body: VAD-trimmed audio chunk, ≤ 30s, WAV (Phase 0) → Opus (Phase 1+)
  • Query: recorded_at (ISO 8601), doa_json (urlencoded compact JSON)
  • Retry: exponential backoff; unsent chunks are held in the local buffer (up to ~30 min of audio)
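
As a sketch of the upload contract above, the helpers below build the request URL with percent-encoded `recorded_at` and `doa_json` query parameters and compute a capped exponential backoff delay. The function names, the 1s base delay, the 60s cap, and the `{"az":42}` DoA shape in the usage note are assumptions; only the endpoint path and parameter names come from the contract.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Percent-encode everything except RFC 3986 unreserved characters.
std::string url_encode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                                c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Compose the ingest URL; `base` is the project origin without a trailing slash.
std::string ingest_url(const std::string& base, const std::string& recorded_at,
                       const std::string& doa_json) {
    return base + "/functions/v1/ingest-audio?recorded_at=" + url_encode(recorded_at) +
           "&doa_json=" + url_encode(doa_json);
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
constexpr uint32_t backoff_ms(uint32_t attempt, uint32_t base_ms = 1000,
                              uint32_t cap_ms = 60000) {
    uint32_t delay = base_ms;
    for (uint32_t i = 0; i < attempt && delay < cap_ms; ++i) delay *= 2;
    return delay < cap_ms ? delay : cap_ms;
}
```

For example, `url_encode("{\"az\":42}")` yields `%7B%22az%22%3A42%7D`. The `Authorization` and `X-Device-Id` headers are added by whichever HTTP client the firmware uses and are not shown here.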

6.2 Group-mode streaming (Phase 3+)

  • Device opens a sustained HTTPS chunked upload (1-second Opus chunks over a persistent HTTP/2 connection) to /functions/v1/group-stream with a session token issued by /functions/v1/start-group-session
  • Backend bridge receives chunks and publishes them as a participant track into the LiveKit room — ESP32-S3 itself does not speak WebRTC
  • AI role interjections (TTS) are played via the paired phone, not on the device (device has no speaker)
  • Phase 2 research spike: evaluate esp-webrtc for direct device → LiveKit. If viable, swap transport in Phase 4 without API changes.

6.3 BLE GATT Services

Service | Characteristic | Direction | Purpose
ARCIVE_PROV | wifi_creds | App → HW (write) | SSID + password JSON
ARCIVE_PROV | device_jwt | App → HW (write) | Supabase upload token
ARCIVE_PROV | mac | HW → App (read+notify) | Device MAC
ARCIVE_CTRL | mute | HW → App (read+notify; only the hardware writes) | Hardware-driven mute state
ARCIVE_STATUS | battery | HW → App (read+notify) | 0–100
ARCIVE_STATUS | recording_state | HW → App (notify) | idle/recording/uploading/error
ARCIVE_STATUS | firmware_version | HW → App (read) | Semver string

UUIDs generated once, stored in shared/ble-uuids.h (firmware) and packages/shared/ble-uuids.ts (software). CI script enforces sync.
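
One way to keep the GATT table above honest in firmware is to declare it as data and assert the access rules at compile time. The sketch below is illustrative: the type and function names are hypothetical, UUIDs are deliberately omitted (they live in shared/ble-uuids.h), and it does not use any BLE library.

```cpp
#include <cstdint>
#include <string_view>

// Access granted to the app (GATT client) for each characteristic.
enum AppAccess : uint8_t { kRead = 1, kWrite = 2, kNotify = 4 };

struct CharacteristicSpec {
    const char* service;
    const char* name;
    uint8_t app_access;  // bitmask of AppAccess
};

// Mirrors the service table in this section.
constexpr CharacteristicSpec kGattTable[] = {
    {"ARCIVE_PROV",   "wifi_creds",       kWrite},
    {"ARCIVE_PROV",   "device_jwt",       kWrite},
    {"ARCIVE_PROV",   "mac",              kRead | kNotify},
    {"ARCIVE_CTRL",   "mute",             kRead | kNotify},
    {"ARCIVE_STATUS", "battery",          kRead | kNotify},
    {"ARCIVE_STATUS", "recording_state",  kNotify},
    {"ARCIVE_STATUS", "firmware_version", kRead},
};

// True if the app may write the named characteristic; unknown names are denied.
constexpr bool app_can_write(std::string_view name) {
    for (const auto& c : kGattTable) {
        if (name == c.name) return (c.app_access & kWrite) != 0;
    }
    return false;
}

// The build fails if anyone makes mute writable from the app side.
static_assert(!app_can_write("mute"), "mute must never be app-writable");
```

The GATT server setup code would iterate `kGattTable` when registering characteristics, so the table and the compile-time check stay the single source of truth for permissions.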

6.4 Pairing flow

  1. App generates QR with { pairing_url, pairing_token, supabase_url }
  2. User scans → app initiates BLE connection to advertised ARCIVE_PROV service
  3. App writes WiFi creds + device JWT
  4. HW reads back its MAC, writes to mac characteristic; app calls POST /devices
  5. HW reboots, joins WiFi, starts heartbeat (uploads 0-byte status ping)
  6. App shows “Device paired ✓”
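
The device side of the pairing steps above can be sketched as a small event-driven state tracker. State and method names are hypothetical; the sketch only encodes the ordering implied by the flow (both credentials must arrive over BLE before the device is considered provisioned, and it is online only after WiFi joins).

```cpp
#include <cstdint>

// Device-side pairing progress; names are illustrative, not from the plan.
enum class PairState : uint8_t { Advertising, AwaitingCreds, Provisioned, Online };

class PairingSession {
public:
    // Step 2: the app connects to the advertised ARCIVE_PROV service.
    void on_ble_connected() {
        if (state_ == PairState::Advertising) state_ = PairState::AwaitingCreds;
    }

    // Step 3: the two writes may arrive in either order.
    void on_wifi_creds_written() {
        if (state_ == PairState::AwaitingCreds) { have_wifi_ = true; advance(); }
    }
    void on_device_jwt_written() {
        if (state_ == PairState::AwaitingCreds) { have_jwt_ = true; advance(); }
    }

    // Step 5: called after reboot once WiFi association succeeds.
    void on_wifi_joined() {
        if (state_ == PairState::Provisioned) state_ = PairState::Online;
    }

    PairState state() const { return state_; }

private:
    // Provisioning completes only when both credentials are present.
    void advance() {
        if (have_wifi_ && have_jwt_) state_ = PairState::Provisioned;
    }

    PairState state_ = PairState::Advertising;
    bool have_wifi_ = false;
    bool have_jwt_ = false;
};
```

Events that arrive out of order are ignored rather than treated as errors, which keeps a retried or duplicated BLE write harmless.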

6.5 OTA firmware update

  • Firmware binaries uploaded to Supabase Storage at firmware/<channel>/<version>.bin
  • HW polls JSON manifest at firmware/<channel>/latest.json daily
  • Newer version available → download via esp_https_ota, verify signature, reboot
  • Channels: dev (Phase 1+), beta (Phase 3+), stable (Phase 4+)
  • A/B partitions + automatic rollback on boot failure (Phase 4+)
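
The "newer version available" check against the manifest reduces to a semver comparison. A minimal sketch, assuming plain `major.minor.patch` strings (pre-release and build suffixes are ignored) and hypothetical function names:

```cpp
#include <cstdio>
#include <tuple>

struct SemVer {
    int major = 0;
    int minor = 0;
    int patch = 0;
};

// Parse "major.minor.patch"; returns false if fewer than three numbers are found.
bool parse_semver(const char* s, SemVer& out) {
    return std::sscanf(s, "%d.%d.%d", &out.major, &out.minor, &out.patch) == 3;
}

// True only if both strings parse and `candidate` is strictly newer than `running`.
bool is_newer(const char* candidate, const char* running) {
    SemVer a, b;
    if (!parse_semver(candidate, a) || !parse_semver(running, b)) return false;
    return std::tie(a.major, a.minor, a.patch) > std::tie(b.major, b.minor, b.patch);
}
```

Comparing numerically rather than as strings matters once a field reaches two digits: `is_newer("0.10.0", "0.9.0")` is true, whereas a lexical comparison would say otherwise. A malformed manifest version returns false, so the device stays on its current firmware.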

Non-negotiable, all phases:

Requirement | Implementation
LED visibly on when mic is hot | PWM-driven from firmware AND tied to XVF3800 VAD signal; cannot be turned off in software while mic is unmuted
Hardware mute cuts mic at hardware level | Mute button physically gates the I2S clock to the mic array; not just a software flag
Mute state cannot be overridden remotely | App can only read mute state, never set it to unmute
Optional audible chime on session start | Firmware setting; off by default but available for two-party-consent jurisdictions
On-device data encrypted at rest | ESP32-S3 NVS encryption enabled
Device JWT can be revoked | App sends revoke over BLE → firmware wipes WiFi creds + JWT, factory-resets

These rules make ARCIVE defensibly respectful of bystanders in a way Limitless / Friend / Plaud have struggled with — and they’re a real selling point for Caregiving / Therapy B2B.


8. Firmware Repository Layout

firmware/
├── platformio.ini
├── shared/
│   └── ble-uuids.h              # ← mirror of packages/shared/ble-uuids.ts
├── src/
│   ├── main.cpp                 # Entry, state machine, OTA check
│   ├── config.h                 # Endpoints, buffer sizes, timeouts
│   ├── audio/
│   │   ├── i2s_capture.cpp      # XVF3800 → ESP32-S3 I2S
│   │   ├── opus_encoder.cpp     # 30s Opus chunks (Phase 1+)
│   │   └── chunker.cpp          # Chunk sealing, metadata stamping
│   ├── stream/                  # Phase 3+
│   │   └── group_stream.cpp     # Continuous HTTPS chunked upload (1s Opus chunks over a persistent HTTP/2 connection) to backend bridge; device does NOT speak WebRTC
│   ├── upload/
│   │   ├── wifi_manager.cpp
│   │   ├── https_client.cpp
│   │   └── circular_buffer.cpp  # Local store-and-forward
│   ├── ble/
│   │   ├── gatt_server.cpp      # ARCIVE_PROV / ARCIVE_CTRL / ARCIVE_STATUS
│   │   ├── provisioning.cpp
│   │   └── status_notifier.cpp
│   ├── led/
│   │   ├── breathing.cpp
│   │   └── states.cpp
│   ├── doa/
│   │   └── azimuth_reader.cpp
│   ├── vad/
│   │   └── vad_gate.cpp
│   ├── ota/
│   │   └── update_check.cpp
│   └── telemetry/
│       └── crash_reporter.cpp
└── tests/
    ├── audio/
    ├── ble/
    └── upload/

PlatformIO baseline

[env:xiao_esp32s3]
platform = espressif32
board = seeed_xiao_esp32s3
framework = arduino
lib_deps =
    ESP Async WebServer
    ArduinoJson
    NimBLE-Arduino
    https://github.com/respeaker/xvf3800-arduino-driver ; or vendor SDK
build_flags =
    -DARCIVE_FW_VERSION=\"0.1.0\"
    -DBLE_UUID_HEADER=\"shared/ble-uuids.h\"

9. Hardware Test Matrix

Test | P0 dev-kit | P1 dev-kit | P2 proto | P3 pilot | P4 commercial
Boots cleanly from cold
Pairs via BLE in <30s
Joins WiFi after reboot
Survives WiFi outage 30 min (basic)
Mute button cuts I2S at hardware level (wired)
LED matches actual mic state
8-hour battery test (n/a, n/a)
Upload retries succeed after disconnect (basic)
OTA update completes successfully (dev channel; ✅ A/B rollback)
Crash report reaches Memfault
Factory reset wipes all creds
Multi-speaker capture quality (✅ raw)
DoA azimuth attached as metadata (✅ raw)
DoA accuracy ±15° (validated against ground truth)
AEC functional during talk-back
Group-mode sustained stream ≥4 hr
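
Checking the DoA accuracy ±15° row needs an angular error that handles wraparound at 0°/360°. A minimal sketch, with hypothetical function names and azimuths assumed to be in degrees:

```cpp
#include <cmath>

// Smallest absolute difference between two azimuths, in degrees (0 to 180).
double angular_error_deg(double measured, double truth) {
    double d = std::fmod(measured - truth, 360.0);  // result lies in (-360, 360)
    if (d > 180.0) d -= 360.0;
    if (d < -180.0) d += 360.0;
    return std::fabs(d);
}

// True when the measured azimuth is within `tolerance_deg` of ground truth.
bool doa_within_tolerance(double measured, double truth, double tolerance_deg = 15.0) {
    return angular_error_deg(measured, truth) <= tolerance_deg;
}
```

Without the wraparound step, a reading of 350° against a true bearing of 5° would be scored as a 345° error instead of 15°.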

10. What Hardware Does NOT Do (Permanently)

  • No on-device transcription — too heavy for ESP32-S3, kills battery, redundant with cloud
  • No custom wake-word — adds complexity; doesn’t match always-on capture model
  • No BLE audio streaming — bandwidth ceiling; WiFi only for audio
  • No on-device speaker identification — voice embeddings happen server-side
  • No on-device speaker — talk-back uses paired phone for playback (until product feedback says otherwise)
  • No camera — out of scope; privacy-incompatible

11. Risks & Mitigations

Risk | Mitigation
Industrial design takes longer than expected | 3D-printed enclosures used through Phase 3; production-intent only at Phase 4, which buys 14 weeks of design lead time
Custom PCB defects | Phase 0–3 ride on Seeed reference board; only Phase 4 introduces our own PCB (and minimally: same components, different layout)
Battery life shorter than 8 hr in field | Tested with realistic VAD duty cycle from Phase 2 onward
FCC/CE certification delays | Skipped for Phase 0–4 (research/dev units); required only for Phase 5 retail
Firmware bricking devices | OTA includes A/B partitions, rollback on failed boot (Phase 4+)
Mute button defeated in software | Hardware gate on I2S clock makes this physically impossible
Group-mode streaming drains battery too fast | Power profile measured Phase 2; decision in Phase 3 whether to ship group mode as “tabletop, plugged in” only
User doesn’t see why they need the device when phone works | Continuously demo the quality difference (multi-speaker, far-field) and the behavioral difference (no screen, no doom-scroll) at every user touchpoint
Device-side WebRTC at edge of ESP32-S3 capability | V0.3 ships HTTPS chunked upload + backend LiveKit bridge (proven path). esp-webrtc evaluated as a Phase 2 research spike; if viable, swap transport in Phase 4 without changing API contract. If not viable, backend bridge remains permanent: group-mode quality is unaffected, only the data path changes.

12. Phase-Boundary Joint Demos

Each phase ends with a demo that proves both tracks integrate. These are non-negotiable ship gates.

Phase | Joint demo
0 | Dev-kit on desk uploads recording → web app feed shows it next to phone-mic recordings
1 | Pair dev-kit via QR scan → device joins WiFi → captures multi-speaker meeting → diarization labels appear → same speaker recognized across two sessions
2 | User wears 3D-printed proto for full work day → offline periods buffered locally → OTA update lands without user action → talk-back via paired phone using device as input
3 | Pilot device on conference table during 4-person meeting → group mode active → speakers identified by name from session 2 → AI role interjects when invoked
4 | Retail experience: buy device → scan QR → paired in 30s → capture & review via marketplace role → plug into Claude Desktop via MCP for cross-tool agent access
5 | Pick up retail-packaged device from box → certified, injection-molded, optional screen variant → identical software experience as V1.0