- Status: Accepted
- Date: 2026-05-07
- Deciders: Sujith
- Affected: every layer that incurs per-user variable cost — `supabase/functions/transcribe-step/`, `supabase/functions/summarize-step/`, `supabase/functions/embed-step/`, `supabase/functions/diarize-step/`, `supabase/functions/reid-step/`, `supabase/functions/compute-edges-step/`, `supabase/functions/link-topics-step/`, `supabase/functions/backfill-summaries/`, `supabase/functions/pipeline-tick/`, `apps/web/app/api/chat/`, `apps/web/lib/`, `supabase/migrations/`, future `apps/web/app/(admin)/economics/`, future `apps/web/app/(app)/account/usage/`
## Context
Three things forced this question on 2026-05-07:
- Pricing decisions are blocked on a number we don’t have. ADR-0011 fixed the vendor architecture but said nothing about per-user economics. The README is stakeholder-facing (README.md) and the natural follow-up question — “what does an active ARCIVE user cost us?” — is unanswerable today. Every plan-tier conversation, free-tier limit, and runway projection depends on it.
- Token-only telemetry is the wrong shape. The obvious first instinct (“log Anthropic/Gemini/Voyage `usage` per call”) covers maybe a third of marginal cost. The other two thirds are compute (Supabase Edge Function invocations, future Modal GPU-minutes per ADR-0010 resumption), storage rent (raw audio in Supabase Storage, pgvector HNSW pages, Postgres rows), and auxiliaries (Resend, Sentry, PostHog). A user who records heavily for six months then goes quiet still costs money indefinitely; a token-only meter shows them as free.
- Today the only thing tracked is recording seconds. `20260503000005_usage_helpers.sql` implements `consume_recording_seconds`, writing to `user_profiles.monthly_seconds_used`. That’s a billing cap, not a cost view. The AI vendor calls in `summarize-step/index.ts`, `embed-step/index.ts`, `transcribe-step/index.ts`, `diarize-step/index.ts`, `reid-step/index.ts`, and `apps/web/app/api/chat/route.ts` all read the response `usage`/`usageMetadata` fields and discard them.
The pre-existing constraints that scoped the answer:
- ADR-0011 — AI vendor strategy: the canonical layer model. Cost telemetry must compose with the layer model, not invent a parallel taxonomy.
- ADR-0001 — pgmq + pg_cron: each pipeline step already has a `job_id` and a `memory_id` — variable-cost attribution is mostly free at the join layer.
- ADR-0007 — Consent gate on agent retrieval: cost telemetry must never log content (prompts, transcripts, summaries) — only counts and identifiers.
- Existing `dev_pass` + tier-cap mechanism (ADR-0005, `20260503000011_dev_pass.sql`): the new telemetry must not double-meter or interfere with the existing seconds cap.
## Options considered

### Option A — Token-only `ai_usage_events` table
One table, one row per AI vendor call, with usage from each provider’s response captured. Dashboards roll up by user / vendor / layer.
- Pros: ~150 lines of code; covers Anthropic, Gemini, Voyage, Groq cleanly; easy to wire in a week; matches what most YC-stage teams ship first.
- Cons:
- Misses 60–70% of variable cost. Edge Function invocations, Modal GPU-mins (when voice resumes), Supabase Storage GB-months for raw audio, pgvector HNSW page footprint, Resend/Sentry/PostHog event-volume — none of these are tokens, all are real lines on the bill.
- Storage rent is invisible. Per ADR-0010 voice deferral, audio storage is the dominant fixed-per-user cost ARCIVE will accumulate before voice ever ships. A token-only meter shows a heavy historical user as cheap.
- Forces a second migration in three months when the gap becomes obvious. Worse: the second migration will overlap and partially supersede the first, leaving a confusing historical schema.
### Option B — Rely on vendor billing portals; build no internal telemetry
Use Anthropic / Gemini / Supabase / Modal / Resend admin dashboards directly. Pull monthly totals into a spreadsheet. Set tier caps based on aggregate cost / MAU.
- Pros: zero implementation cost; vendor dashboards already exist and are accurate.
- Cons:
- No per-user attribution at all. Every vendor reports per-API-key, not per-end-user. You can know “we spent $X on Anthropic last month” but not “Alice cost $0.79, Bob cost $14.20.”
- Therefore can’t answer the actual business questions — which users are unprofitable on the free tier? what’s the right per-tier limit? is the chat-heavy power-user costing more than they pay?
- Therefore can’t surface a user-facing `/account/usage` page either, because the data doesn’t exist at user grain.
- Therefore can’t price transparently — only aggregate-fairly.
### Option C — Unified `usage_events` + daily `usage_storage_snapshots` + versioned `cost_prices` + monthly `cost_overhead_monthly` (chosen)
One event stream for variable cost (AI calls, compute jobs, emails, analytics events, payment events) and one daily snapshot for storage rent (the stock problem that doesn’t fit an event stream). A versioned price book turns counts into dollars at write time. A small overhead table allocates fixed costs into a fully-loaded view. Two SQL views — variable-only and fully-loaded — feed an internal `/admin/economics` page first and a user-facing `/account/usage` page later.
- Pros:
- Covers all five cost buckets (AI, compute, storage, auxiliaries, allocated fixed) under one schema. No second migration in three months.
- Storage rent is first-class. Daily snapshot answers “the heavy-historical user with no recent activity” question that token-only telemetry can’t.
- Versioned prices. When Anthropic drops Haiku 5× (ADR-0011’s explicit example), retroactive cost views stay honest — old rows stay priced at the old rate, new rows at the new rate. No destructive price update.
- Layer-aware. Each event row carries the ADR-0011 layer number, so the dashboards roll up along the same taxonomy already in use everywhere else.
- Two attribution lenses, side-by-side. The variable-only view answers “are we losing money on the margin?”; the fully-loaded view answers “is the business profitable at this scale?” The gap between them is the fixed-cost dilution curve that tells you when each plan tier breaks even.
- Privacy-clean by construction. Schema is counts and IDs; no `prompt`, `transcript`, `summary`, or `body` column. Aligns with ADR-0007’s “minimum information sufficient for the function” framing.
- Composes with the existing seconds cap. `consume_recording_seconds` stays — it’s a cap, not a cost view; both are needed and they don’t overlap.
- Cons:
- One non-trivial migration; one shared `logUsage()` helper to add and call from ~10 call sites; one nightly cron for storage snapshots.
- `cost_overhead_monthly` has to be entered by hand monthly until vendor admin APIs are wired (deferred). Acceptable: ~5 minutes/month for 6–8 line items.
- Dashboard work isn’t free (deferred to a follow-up — this ADR commits to the data shape, not the UI).
## Decision
Adopt Option C — unified `usage_events` + daily `usage_storage_snapshots` + versioned `cost_prices` + monthly `cost_overhead_monthly`, with two views (variable / fully-loaded) feeding an internal `/admin/economics` page first and a user-facing `/account/usage` page after V0.2 ships.

Cost telemetry is layered on top of the existing pipeline, never inside it. No call site has its primary logic changed; each gains a single `logUsage()` line near the response handler. The schema is the immutable commitment of this ADR; the UIs and the user-facing pricing copy are the visible follow-up work.
### Two framings that anchor the rest
- Audio minutes are the primary meter. They drive transcribe + summarize + embed + (eventually) voice. Every adjacent product (Otter, Granola, Limitless, Plaud, Bee) meters minutes for the same reason — minutes correlate with cost. ARCIVE already tracks minutes via `consume_recording_seconds`. Chat tokens are the secondary meter and behave differently — see “Chat is a separate axis” below.
- Storage is rent, expressed as a history horizon — not a GB cap. Free tier keeps N days of audio history; Pro keeps everything. “History horizon” is a story users grasp; “5 GB cap” is a story they don’t. It also gives ARCIVE a clean reclamation path for dormant free-tier users without silently deleting their data — old recordings tier to cold storage or are pruned per the horizon, with the user’s consent baked into the plan.
## What ships now (V0.2 — bundled in one feature branch)
These are small, reinforcing changes. None are speculative.
- Schema migration — four tables, all new. No changes to existing tables.
  - `usage_events` — per-call variable cost. One row per AI call, compute job, email send, analytics event, or payment event. Columns: `id`, `user_id` (nullable for system jobs), `space_id`, `memory_id`, `job_id`, `layer int` (per ADR-0011), `event_type text` (`'ai_call' | 'compute_job' | 'email' | 'analytics' | 'payment'`), `vendor text`, `service text`, `sku text` (model id, function name, plan), `qty_in numeric`, `qty_out numeric`, `unit text` (`'tokens' | 'audio_sec' | 'gb_sec' | 'invocations' | 'emails' | 'events' | 'usd_charged'`), `cost_usd numeric(12,6)`, `latency_ms int`, `status text` (`'ok' | 'fallback' | 'error'`), `attempt int` (which provider in the fallback chain succeeded), `created_at timestamptz`. RLS: `service_role` writes; users select their own rows for the `/account/usage` surface. Indexes on `(user_id, created_at)`, `(layer, created_at)`, `(vendor, created_at)`. Partition by month is deferred until row count justifies it.
  - `usage_storage_snapshots` — daily stock per user. Columns: `day date`, `user_id uuid`, `audio_bytes bigint`, `embedding_rows int`, `embedding_bytes_est bigint`, `memory_rows int`, `topic_rows int`, `edge_rows int`, `egress_bytes_today bigint`, `storage_cost_usd numeric`. PK `(day, user_id)`. ~365 rows/user/year — trivially small.
  - `cost_prices` — versioned price book. Columns: `vendor text`, `sku text`, `unit text`, `usd_per_unit numeric(12,8)`, `effective_at timestamptz`, `expires_at timestamptz`. Lookup is “find the row where `now() between effective_at and coalesce(expires_at, 'infinity')`.” Vendor price changes are a new row, never a destructive update.
  - `cost_overhead_monthly` — fixed-cost allocation. Columns: `month date`, `vendor text`, `amount_usd numeric`, `allocation_rule text` (`'equal_per_mau' | 'weighted_by_variable' | 'unallocated'`), `note text`. Manually entered each month against the actual vendor bill until vendor admin APIs are wired (deferred).
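  A minimal DDL sketch of the two load-bearing tables, matching the columns enumerated above (constraint and index names fall back to Postgres defaults; the real migration may differ in detail):

  ```sql
  -- usage_events: one row per metered call; cost_usd is resolved at write time.
  create table usage_events (
    id          bigint generated always as identity primary key,
    user_id     uuid,                -- nullable for system jobs
    space_id    uuid,
    memory_id   uuid,
    job_id      bigint,
    layer       int not null,       -- ADR-0011 layer number
    event_type  text not null check (event_type in ('ai_call','compute_job','email','analytics','payment')),
    vendor      text not null,
    service     text not null,
    sku         text not null,      -- model id / function name / plan
    qty_in      numeric,
    qty_out     numeric,
    unit        text not null,
    cost_usd    numeric(12,6) not null,
    latency_ms  int,
    status      text not null default 'ok',
    attempt     int not null default 1,
    created_at  timestamptz not null default now()
  );
  create index on usage_events (user_id, created_at);
  create index on usage_events (layer, created_at);
  create index on usage_events (vendor, created_at);

  -- cost_prices: versioned price book; a change is a new row, never an update in place.
  create table cost_prices (
    vendor       text not null,
    sku          text not null,
    unit         text not null,
    usd_per_unit numeric(12,8) not null,
    effective_at timestamptz not null,
    expires_at   timestamptz        -- null = still current
  );
  ```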
- Shared `logUsage()` helper in `supabase/functions/_shared/log-usage.ts` (Edge Functions side) and `apps/web/lib/log-usage.ts` (Next.js side). Single function: `logUsage({ user_id, layer, event_type, vendor, service, sku, qty_in, qty_out, unit, status, attempt, latency_ms, job_id?, memory_id?, space_id? })`. Looks up the price from `cost_prices` and writes the `cost_usd` at insert time so historical rows stay honest across price changes. Failure to log is non-fatal — it logs `console.error` and returns, never blocks the actual response.
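  A sketch of the helper’s shape under the signature above, with the pure pricing step split out. The `db` parameter is a stand-in for the Supabase client; `priceFor` and the single-blended-rate assumption (one `usd_per_unit` per sku covering both directions) are illustrative, not the committed API:

  ```typescript
  // Sketch only — names beyond the ADR's logUsage() signature are illustrative.
  type Unit = "tokens" | "audio_sec" | "gb_sec" | "invocations" | "emails" | "events" | "usd_charged";

  interface UsageEvent {
    user_id?: string;
    layer: number;
    event_type: "ai_call" | "compute_job" | "email" | "analytics" | "payment";
    vendor: string;
    service: string;
    sku: string;
    qty_in: number;
    qty_out: number;
    unit: Unit;
    status?: "ok" | "fallback" | "error";
    attempt?: number;
    latency_ms?: number;
  }

  // Pure pricing step: assumes one blended usd_per_unit per sku; vendors that
  // price input and output differently would use two skus (or two rates).
  export function priceEvent(e: UsageEvent, usdPerUnit: number): number {
    return (e.qty_in + e.qty_out) * usdPerUnit;
  }

  export async function logUsage(
    db: {
      priceFor: (vendor: string, sku: string, unit: Unit) => Promise<number>;
      insert: (row: object) => Promise<void>;
    },
    e: UsageEvent,
  ): Promise<void> {
    try {
      // cost_usd is written at insert time so historical rows survive price changes.
      const usdPerUnit = await db.priceFor(e.vendor, e.sku, e.unit);
      await db.insert({ ...e, status: e.status ?? "ok", attempt: e.attempt ?? 1, cost_usd: priceEvent(e, usdPerUnit) });
    } catch (err) {
      // Non-fatal by design: telemetry must never block the actual response.
      console.error("logUsage failed", err);
    }
  }
  ```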
- Wire the highest-cost layers first. Order matters: chat and summarize are 80%+ of variable cost.
  - Layer 6 — Reason (`apps/web/app/api/chat/route.ts`) — capture Anthropic `usage` including `cache_read_input_tokens` and `cache_creation_input_tokens`. The cache columns are non-negotiable here; without them you’ll over-estimate chat cost by 5–10×.
  - Layer 3 — Understand (`summarize-step/index.ts`) — capture Anthropic `usage`, Gemini `usageMetadata`, Groq `usage`. Tag the event with `attempt` so the fallback-rate metric drops out of the same data.
  - Layer 2 — Transcribe (`transcribe-step/index.ts`) — capture audio-seconds (the meter Groq Whisper is billed against), not tokens.
  - Layer 4 — Embed (`embed-step/index.ts`) — capture Voyage `usage.total_tokens`.
  - Layer 5/10 — Compute (`pipeline-tick`, `diarize-step`, `reid-step`, `compute-edges-step`, `link-topics-step`, `backfill-summaries`) — capture invocation count + duration as `event_type='compute_job'`. Modal worker invocations follow the same pattern when audio-transcode runs.
  - Auxiliaries (Resend magic-link sends, future Stripe webhooks) — small; capture as `event_type='email'` / `event_type='payment'` with the literal Resend / Stripe IDs in `sku`.
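  The `attempt` tagging can be sketched as follows — the chain shape and usage fields are assumptions; only the `attempt` / `status` semantics come from this ADR:

  ```typescript
  // Illustrative fallback runner (per ADR-0011 chains, e.g. Gemini → Anthropic → Groq).
  // The returned attempt/status fields are what logUsage() records per event.
  type Attempt = { vendor: string; call: () => Promise<{ qtyIn: number; qtyOut: number }> };

  export async function runWithFallback(chain: Attempt[]) {
    for (let i = 0; i < chain.length; i++) {
      try {
        const usage = await chain[i].call();
        return {
          ...usage,
          vendor: chain[i].vendor,
          attempt: i + 1,                       // which provider actually billed
          status: i === 0 ? "ok" : "fallback",  // fallback-rate metric drops out for free
        };
      } catch {
        // fall through to the next provider; rethrow only when none are left
        if (i === chain.length - 1) throw new Error("all providers failed");
      }
    }
    throw new Error("empty fallback chain");
  }
  ```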
- Nightly cron → `usage_storage_snapshots`. New pgmq job or pg_cron entry, runs at 03:00 UTC. Reads `storage.objects` (recordings bucket) summed by owner, counts memories/topics/edges/embeddings per user, multiplies by current `cost_prices` rates for `supabase_storage` and `supabase_postgres_gb_day`, and writes one row per active user. Runs in ~seconds at V0 scale.
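  A sketch of the nightly write, assuming audio sizes live in `storage.objects.metadata->>'size'` and a per-GB `supabase_storage` price row (both illustrative; only the table shapes come from this ADR):

  ```sql
  insert into usage_storage_snapshots (day, user_id, audio_bytes, storage_cost_usd)
  select
    current_date,
    o.owner,
    sum((o.metadata->>'size')::bigint)                        as audio_bytes,
    sum((o.metadata->>'size')::bigint) / 1e9 * p.usd_per_unit as storage_cost_usd
  from storage.objects o
  join cost_prices p
    on  p.vendor = 'supabase'
    and p.sku = 'supabase_storage'
    and now() between p.effective_at and coalesce(p.expires_at, 'infinity')
  where o.bucket_id = 'recordings'
  group by o.owner, p.usd_per_unit;
  ```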
Two SQL views — the only thing dashboards consume.
user_cost_variable_30d:sum(usage_events.cost_usd)+ 30 × avg dailyusage_storage_snapshots.storage_cost_usd, grouped byuser_id, last 30d. Answers “is this user profitable on the margin?”user_cost_loaded_30d:user_cost_variable_30d+ per-user share ofcost_overhead_monthlyper the chosenallocation_rule. Answers “is the business profitable?”
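  The variable-only view might be sketched as follows (the loaded view adds the per-user `cost_overhead_monthly` share on top):

  ```sql
  create view user_cost_variable_30d as
  with events as (
    select user_id, sum(cost_usd) as events_usd
    from usage_events
    where created_at > now() - interval '30 days'
    group by user_id
  ),
  snap as (
    select user_id, avg(storage_cost_usd) as daily_storage_usd
    from usage_storage_snapshots
    where day > current_date - 30
    group by user_id
  )
  select
    user_id,
    coalesce(e.events_usd, 0) + coalesce(s.daily_storage_usd, 0) * 30 as cost_usd_30d
  from events e
  full outer join snap s using (user_id);
  ```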
- Internal `/admin/economics` page. Sujith-only (gated on the existing admin-flag pattern). Three tables and one chart: top 50 users by loaded cost, cost by layer/vendor over 30 days, fallback rate by service, and a 30-day spend-by-layer line chart. No fancy framework — same Next.js + Supabase pattern the rest of the app uses.
## What’s deferred (re-evaluate after V0.2 ships)
- User-facing `/account/usage` page. Shown in units (minutes, summaries, chats, history horizon, memory count), not dollars — see “On transparency” below for the rationale. Build after the internal dashboard has run for 30 days and the unit definitions stabilize.
- Public `/pricing` page and plan-tier definitions. Pricing math is the output of running the loaded view for 30 days. Setting tiers before the data exists is fiction. The price-page work is its own follow-up; this ADR commits to the measurement infrastructure, not to specific dollar amounts.
- Public `/how-we-price` explainer. The strong version of transparency — explaining the cost shape in user-facing language, no exact numbers — composes well with ADR-0011’s layer model. Defer until pricing tiers are committed.
- Vendor admin-API ingest. Anthropic, Gemini, Modal, Supabase, Resend, Sentry, PostHog, and Stripe all expose admin/usage APIs. Pulling them into `cost_overhead_monthly` automatically beats hand entry. Cheap to add later; not blocking.
- History-horizon enforcement on the storage side. Once the snapshot table shows the cost shape clearly, the cold-tier-or-prune logic for free-tier users past horizon can be designed against real data rather than guessed at. The plan-aware retention job lands here.
- Modal GPU telemetry. The Pipecat / voice-talkback worker is paused per ADR-0010. When voice resumes, the worker emits `event_type='compute_job'` with `unit='gpu_sec'` — same table, same helper. Wire at resumption, not now.
- Plan-aware caps in addition to the existing seconds cap. Today only audio-seconds is capped. Future caps may be: chat tokens/day, embed calls/day, summarize calls/day. Only enable a cap once the loaded view shows a need.
- Per-tenant cost views. When B2B / Spaces become a real pricing axis, roll the same data up by `space_id` instead of `user_id`. Schema already carries `space_id`; the view is a one-liner.
## Consequences

### Easier
- Pricing decisions become data-driven rather than vibes-driven. The smallest answer (“does the average MAU cost more or less than $X”) is in the loaded view within ~30 days of wiring.
- Fallback-rate-by-service drops out of the same data — `count(*) where status='fallback' / count(*)` per service. Per ADR-0011’s commodity-class layers, the fallback rate is the early warning that a primary vendor is degrading.
- Cache-hit-rate on chat (`apps/web/app/api/chat/route.ts`) becomes visible — `sum(cache_read) / sum(cache_read + input)` per user. Currently invisible; likely the highest-leverage cost reduction available.
- Stakeholder updates can quote a real per-user cost number in the README’s stakeholder-facing rollup. Currently the README has no economics column at all.
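The fallback-rate rollup, written against the `usage_events` shape committed above (Postgres aggregate `filter` syntax; the 30-day window is illustrative):

```sql
select
  service,
  (count(*) filter (where status = 'fallback'))::numeric / count(*) as fallback_rate
from usage_events
where event_type = 'ai_call'
  and created_at > now() - interval '30 days'
group by service
order by fallback_rate desc;
```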
### Harder / new responsibilities
- The `cost_prices` table needs to be seeded with current rates per vendor and updated when rates change. Treat it as documentation: a price change requires a row, no exceptions.
- The monthly overhead entry is a calendar item. Five minutes a month, but if it’s missed the loaded view diverges from reality.
- The `unit` column has to be applied honestly — STT measured in `audio_sec`, LLMs in `tokens`, storage in `gb_sec`, etc. Mixing units silently destroys the cost rollup.
- The `/account/usage` page (when built) requires committed unit definitions that don’t change shape later. Picking units conservatively now (minutes, history-horizon days, count of memories) is cheaper than re-stating them after launch.
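Treating `cost_prices` as documentation looks like this in practice — a hypothetical price drop is one expiry plus one new row (sku and rates invented for illustration):

```sql
-- Close the current price row (expires_at set, rate untouched)...
update cost_prices
   set expires_at = '2026-06-01'
 where vendor = 'anthropic' and sku = 'claude-haiku-input' and expires_at is null;

-- ...and add the new one. Events logged before 2026-06-01 keep their old cost_usd.
insert into cost_prices (vendor, sku, unit, usd_per_unit, effective_at)
values ('anthropic', 'claude-haiku-input', 'tokens', 0.00000020, '2026-06-01');
```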
### Impossible / explicitly out of scope
- Logging request/response content into telemetry. Token counts only. The schema has no column for it; the helper has no parameter for it; the ADR-0007 consent posture forbids it.
- Showing users their own raw dollar cost on Free. See “On transparency” below — units, not dollars, on the user-facing page. Internal-only views show dollars.
- Replacing the existing seconds cap. `consume_recording_seconds` is a cap (a gate on ingest); the new tables are a cost view (read-only). Both stay.
### What this enables that wasn’t possible before
- A real “is the business profitable at this scale” answer, ahead of the first paid tier.
- Per-user profitability ranking — the input to free-tier abuse policy, plan-cap design, and feature-gating decisions.
- A defensible answer to “why is your free tier so limited?” — because the loaded view shows the bound, not because of an arbitrary product call.
- A path to the Apple-clean public pricing posture: published units, published limits, no clutter, but with the underlying numbers honest enough to defend the page.
## Notes

### Why not lean harder on PostHog for cost telemetry
PostHog is already in the stack and would be the obvious “send each AI call as an event” path. Two reasons it’s not the right primary:
- PostHog event volume is itself a cost line in the auxiliaries bucket. Dual-firing every AI call to PostHog at high volume creates a feedback loop where the telemetry becomes a non-trivial fraction of the cost it’s measuring.
- PostHog is shaped for product analytics, not financial rollups. SQL aggregation against a Postgres table is the right tool for “sum cost by user by month”; PostHog insights are the right tool for “which features correlate with retention.” Both are needed; they’re different jobs.
PostHog still gets one event per AI call eventually, but as a secondary fan-out, sampled and shaped for retention/cohort analysis — not as the primary cost ledger. Wiring this is part of the deferred user-facing page work.
### Chat is a separate axis and will misbehave
Per ADR-0011 Layer 6, the chat brain is Claude Agent SDK. At any non-trivial scale, chat is likely to be the single largest variable line per active user — orders of magnitude above transcribe/summarize/embed combined. Three implications:
- The Anthropic prompt-caching telemetry must be captured from day one. `cache_read_input_tokens` and `cache_creation_input_tokens` are separate fields on the response, separately priced; treating them as plain `input_tokens` produces a number that’s wrong by 5–10×.
- Chat will likely be the first feature with a hard per-tier cap. Not yet — caps before data is fiction — but the structure (`event_type='ai_call'` + `service='chat'` filter) is ready.
- Cache-hit-rate is the highest-leverage cost lever ARCIVE has. Surfacing it on the admin dashboard is part of the “what ships now” work, not deferred.
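To make the over-estimate concrete, here is a sketch of cache-aware versus naive pricing. The rates are placeholders, not real Anthropic prices (the real ones live in `cost_prices`); the field names are Anthropic’s response `usage` fields:

```typescript
// Anthropic bills cache reads far below fresh input tokens; pricing everything
// at the full input rate wildly over-states the cost of a cache-heavy chat user.
// RATES are illustrative USD-per-1M-token numbers, not real prices.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

const RATES = { input: 1.0, cacheRead: 0.1, cacheWrite: 1.25, output: 5.0 };

export function chatCostUsd(u: AnthropicUsage): number {
  return (
    (u.input_tokens * RATES.input +
      u.cache_read_input_tokens * RATES.cacheRead +
      u.cache_creation_input_tokens * RATES.cacheWrite +
      u.output_tokens * RATES.output) / 1_000_000
  );
}

// The naive meter: every input-side token priced at the full input rate.
export function naiveCostUsd(u: AnthropicUsage): number {
  return (
    ((u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens) * RATES.input +
      u.output_tokens * RATES.output) / 1_000_000
  );
}
```

On a cache-heavy sample (e.g. 90k cache reads against 1k fresh input tokens) the naive number comes out roughly 7× the true cost — squarely inside the 5–10× error the bullet above warns about.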
### Why include `attempt` and `status` in the schema
The ADR-0011 fallback chains (Gemini → Anthropic → Groq for summarize, etc.) are a quality / cost / availability tradeoff. Without `attempt` and `status` columns, the cost rollup hides which provider is actually billing — a 4% Anthropic-fallback rate on summarize can dominate Layer-3 cost while the Gemini “primary” looks free in the dashboard. Encoding the fallback story in the schema is what turns the cost view into an operational signal.
### On transparency — units to users, dollars to ops
The strong version of “transparent” is not showing each user their own raw dollar cost. That cuts both ways: it’s honest, but it makes “I cost you $0.79 last month, why are you charging me $19” a recurring complaint. Most products avoid it. ARCIVE’s posture is closer to:
- `/account/usage` shows units — minutes recorded, summaries generated, chats sent, history horizon used, memories stored. Language users grasp.
- `/pricing` shows the same units honestly, per tier, with no jargon.
- `/how-we-price` (the docs explainer) describes the cost shape in plain language — “we pay vendors per minute of audio and per chat message, plus storage rent” — without exact numbers. Apple-clean.
Internal-only `/admin/economics` shows dollars, layers, vendors, fallback rates. Two audiences, two surfaces, one underlying schema. Showing dollars on Pro+ as a power-user opt-in is a future call, not committed by this ADR.
### How this maps to the ADR-0011 layer model
| Layer | Cost shape | `usage_events.unit` |
|---|---|---|
| 1 — Capture | client-side; storage write only | snapshot via `usage_storage_snapshots.audio_bytes` |
| 2 — Transcribe | per audio-second | `audio_sec` |
| 3 — Understand | per token | `tokens` |
| 4 — Embed | per token | `tokens` |
| 5 — Retrieve | per query (cheap; pgvector) | `invocations` |
| 6 — Reason | per token + cache columns | `tokens` (with cache_read/cache_write split) |
| 7 — Hear (paused) | per audio-second | `audio_sec` |
| 8 — Speak (paused) | per character | `characters` |
| 9 — Voice orchestration (paused) | per GPU-second | `gpu_sec` |
| 10 — Auto-correlate | per compute job | `invocations` |
| 11 — Surfaces | hosting, mostly fixed | snapshot via `cost_overhead_monthly` |
The `layer` column on every event row is what makes this table a useful pivot. It also means that when ADR-0011’s layer model is extended (a new layer is added, an existing one is re-classified), this telemetry doesn’t need a schema change — the new layer just shows up as new rows.
## TL;DR for someone walking in cold
Tokens are ~30% of variable cost; storage rent and compute time are most of the rest. ARCIVE meters all five buckets (AI / compute / storage / auxiliaries / fixed) under one schema, with versioned prices so retroactive views stay honest across vendor price changes. Audio minutes are the primary user-facing meter; storage is expressed as a history horizon, not a GB cap; chat is a separate axis where prompt-caching telemetry is non-negotiable. Two views — variable-only and fully-loaded — feed an internal dashboard first and a user-facing units page later. Users see units; ops sees dollars. The ADR commits the schema; the dashboards and pricing follow once the data has run for 30 days.