- Status: Accepted
- Date: 2026-05-07
- Deciders: Sujith
- Affected: every layer that incurs per-user variable cost — `supabase/functions/transcribe-step/`, `supabase/functions/summarize-step/`, `supabase/functions/embed-step/`, `supabase/functions/diarize-step/`, `supabase/functions/reid-step/`, `supabase/functions/compute-edges-step/`, `supabase/functions/link-topics-step/`, `supabase/functions/backfill-summaries/`, `supabase/functions/pipeline-tick/`, `apps/web/app/api/chat/`, `apps/web/lib/`, `supabase/migrations/`, future `apps/web/app/(admin)/economics/`, future `apps/web/app/(app)/account/usage/`
## Context
Three things forced this question on 2026-05-07:
- Pricing decisions are blocked on a number we don’t have. ADR-0011 fixed the vendor architecture but said nothing about per-user economics. The README is stakeholder-facing (README.md) and the natural follow-up question — “what does an active ARCIVE user cost us?” — is unanswerable today. Every plan-tier conversation, free-tier limit, and runway projection depends on it.
- Token-only telemetry is the wrong shape. The obvious first instinct (“log Anthropic/Gemini/Voyage `usage` per call”) covers maybe a third of marginal cost. The other two thirds are compute (Supabase Edge Function invocations, future Modal GPU-minutes per ADR-0010 resumption), storage rent (raw audio in Supabase Storage, pgvector HNSW pages, Postgres rows), and auxiliaries (Resend, Sentry, PostHog). A user who records heavily for six months then goes quiet still costs money indefinitely; a token-only meter shows them as free.
- Today the only thing tracked is recording seconds. `20260503000005_usage_helpers.sql` implements `consume_recording_seconds`, writing to `user_profiles.monthly_seconds_used`. That’s a billing cap, not a cost view. The AI vendor calls in `summarize-step/index.ts`, `embed-step/index.ts`, `transcribe-step/index.ts`, `diarize-step/index.ts`, `reid-step/index.ts`, and `apps/web/app/api/chat/route.ts` all read the response `usage`/`usageMetadata` fields and discard them.
The pre-existing constraints that scoped the answer:
- ADR-0011 — AI vendor strategy: the canonical layer model. Cost telemetry must compose with the layer model, not invent a parallel taxonomy.
- ADR-0001 — pgmq + pg_cron: each pipeline step already has a `job_id` and a `memory_id` — variable-cost attribution is mostly free at the join layer.
- ADR-0007 — Consent gate on agent retrieval: cost telemetry must never log content (prompts, transcripts, summaries) — only counts and identifiers.
- Existing `dev_pass` + tier-cap mechanism (ADR-0005, `20260503000011_dev_pass.sql`): the new telemetry must not double-meter or interfere with the existing seconds cap.
## Options considered

### Option A — Token-only `ai_usage_events` table
One table, one row per AI vendor call, with usage from each provider’s response captured. Dashboards roll up by user / vendor / layer.
- Pros: ~150 lines of code; covers Anthropic, Gemini, Voyage, Groq cleanly; easy to wire in a week; matches what most YC-stage teams ship first.
- Cons:
- Misses 60–70% of variable cost. Edge Function invocations, Modal GPU-mins (when voice resumes), Supabase Storage GB-months for raw audio, pgvector HNSW page footprint, Resend/Sentry/PostHog event-volume — none of these are tokens, all are real lines on the bill.
- Storage rent is invisible. Per ADR-0010 voice deferral, audio storage is the dominant fixed-per-user cost ARCIVE will accumulate before voice ever ships. A token-only meter shows a heavy historical user as cheap.
- Forces a second migration in three months when the gap becomes obvious. Worse: the second migration will overlap and partially supersede the first, leaving a confusing historical schema.
### Option B — Rely on vendor billing portals; build no internal telemetry
Use Anthropic / Gemini / Supabase / Modal / Resend admin dashboards directly. Pull monthly totals into a spreadsheet. Set tier caps based on aggregate cost / MAU.
- Pros: zero implementation cost; vendor dashboards already exist and are accurate.
- Cons:
- No per-user attribution at all. Every vendor reports per-API-key, not per-end-user. You can know “we spent $X on Anthropic last month” but not “Alice cost $0.79, Bob cost $14.20.”
- Therefore can’t answer the actual business questions — which users are unprofitable on the free tier? what’s the right per-tier limit? is the chat-heavy power-user costing more than they pay?
- Therefore can’t surface a user-facing `/account/usage` page either, because the data doesn’t exist at user grain.
- Therefore can’t price transparently — only aggregate-fairly.
### Option C — Unified `usage_events` + daily `usage_storage_snapshots` + versioned `cost_prices` + monthly `cost_overhead_monthly` (chosen)
One event stream for variable cost (AI calls, compute jobs, emails, analytics events, payment events) and one daily snapshot for storage rent (the stock problem that doesn’t fit an event stream). A versioned price book turns counts into dollars at write time. A small overhead table allocates fixed costs into a fully-loaded view. Two SQL views — variable-only and fully-loaded — feed an internal `/admin/economics` page first and a user-facing `/account/usage` page later.
- Pros:
- Covers all five cost buckets (AI, compute, storage, auxiliaries, allocated fixed) under one schema. No second migration in three months.
- Storage rent is first-class. Daily snapshot answers “the heavy-historical user with no recent activity” question that token-only telemetry can’t.
- Versioned prices. When Anthropic drops Haiku 5× (ADR-0011’s explicit example), retroactive cost views stay honest — old rows stay priced at the old rate, new rows at the new rate. No destructive price update.
- Layer-aware. Each event row carries the ADR-0011 layer number, so the dashboards roll up along the same taxonomy already in use everywhere else.
- Two attribution lenses, side-by-side. The variable-only view answers “are we losing money on the margin?”; the fully-loaded view answers “is the business profitable at this scale?” The gap between them is the fixed-cost dilution curve that tells you when each plan tier breaks even.
- Privacy-clean by construction. Schema is counts and IDs; no `prompt`, `transcript`, `summary`, or `body` column. Aligns with ADR-0007’s “minimum information sufficient for the function” framing.
- Composes with the existing seconds cap. `consume_recording_seconds` stays — it’s a cap, not a cost view; both are needed and they don’t overlap.
- Cons:
- One non-trivial migration; one shared `logUsage()` helper to add and call from ~10 call sites; one nightly cron for storage snapshots.
- `cost_overhead_monthly` has to be entered by hand monthly until vendor admin APIs are wired (deferred). Acceptable: ~5 minutes/month for 6–8 line items.
- Dashboard work isn’t free (deferred to a follow-up — this ADR commits to the data shape, not the UI).
## Decision
Adopt Option C — unified `usage_events` + daily `usage_storage_snapshots` + versioned `cost_prices` + monthly `cost_overhead_monthly`, with two views (variable / fully-loaded) feeding an internal `/admin/economics` page first and a user-facing `/account/usage` page after V0.2 ships.

Cost telemetry is layered on top of the existing pipeline, never inside it. No call site has its primary logic changed; each gains a single `logUsage()` line near the response handler. The schema is the immutable commitment of this ADR; the UIs and the user-facing pricing copy are the visible follow-up work.
### Two framings that anchor the rest
- Audio minutes are the primary meter. They drive transcribe + summarize + embed + (eventually) voice. Every adjacent product (Otter, Granola, Limitless, Plaud, Bee) meters minutes for the same reason — minutes correlate with cost. ARCIVE already tracks minutes via `consume_recording_seconds`. Chat tokens are the secondary meter and behave differently — see “Chat is a separate axis” below.
- Storage is rent, expressed as a history horizon — not a GB cap. Free tier keeps N days of audio history; Pro keeps everything. “History horizon” is a story users grasp; “5 GB cap” is a story they don’t. It also gives ARCIVE a clean reclamation path for dormant free-tier users without silently deleting their data — old recordings tier to cold storage or are pruned per the horizon, with the user’s consent baked into the plan.
## What ships now (V0.2 — bundled in one feature branch)
These are small, reinforcing changes. None are speculative.
- Schema migration — four tables, all new. No changes to existing tables.
  - `usage_events` — per-call variable cost. One row per AI call, compute job, email send, analytics event, or payment event. Columns: `id`, `user_id` (nullable for system jobs), `space_id`, `memory_id`, `job_id`, `layer int` (per ADR-0011), `event_type text` (`'ai_call' | 'compute_job' | 'email' | 'analytics' | 'payment'`), `vendor text`, `service text`, `sku text` (model id, function name, plan), `qty_in numeric`, `qty_out numeric`, `unit text` (`'tokens' | 'audio_sec' | 'gb_sec' | 'invocations' | 'emails' | 'events' | 'usd_charged'`), `cost_usd numeric(12,6)`, `latency_ms int`, `status text` (`'ok' | 'fallback' | 'error'`), `attempt int` (which provider in the fallback chain succeeded), `created_at timestamptz`. RLS: `service_role` writes; users select their own rows for the `/account/usage` surface. Indexes on `(user_id, created_at)`, `(layer, created_at)`, `(vendor, created_at)`. Partition by month is deferred until row count justifies it.
  - `usage_storage_snapshots` — daily stock per user. Columns: `day date`, `user_id uuid`, `audio_bytes bigint`, `embedding_rows int`, `embedding_bytes_est bigint`, `memory_rows int`, `topic_rows int`, `edge_rows int`, `egress_bytes_today bigint`, `storage_cost_usd numeric`. PK `(day, user_id)`. ~365 rows/user/year — trivially small.
  - `cost_prices` — versioned price book. Columns: `vendor text`, `sku text`, `unit text`, `usd_per_unit numeric(12,8)`, `effective_at timestamptz`, `expires_at timestamptz`. Lookup is “find the row where `now() between effective_at and coalesce(expires_at, 'infinity')`.” Vendor price changes are a new row, never a destructive update.
  - `cost_overhead_monthly` — fixed-cost allocation. Columns: `month date`, `vendor text`, `amount_usd numeric`, `allocation_rule text` (`'equal_per_mau' | 'weighted_by_variable' | 'unallocated'`), `note text`. Manually entered each month against the actual vendor bill until vendor admin APIs are wired (deferred).
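  A minimal DDL sketch of the two load-bearing tables, matching the columns enumerated above (constraint and index names fall back to Postgres defaults; the real migration may differ in detail):

  ```sql
  -- usage_events: one row per metered call; cost_usd is resolved at write time.
  create table usage_events (
    id          bigint generated always as identity primary key,
    user_id     uuid,                -- nullable for system jobs
    space_id    uuid,
    memory_id   uuid,
    job_id      bigint,
    layer       int not null,       -- ADR-0011 layer number
    event_type  text not null check (event_type in ('ai_call','compute_job','email','analytics','payment')),
    vendor      text not null,
    service     text not null,
    sku         text not null,      -- model id / function name / plan
    qty_in      numeric,
    qty_out     numeric,
    unit        text not null,
    cost_usd    numeric(12,6) not null,
    latency_ms  int,
    status      text not null default 'ok',
    attempt     int not null default 1,
    created_at  timestamptz not null default now()
  );
  create index on usage_events (user_id, created_at);
  create index on usage_events (layer, created_at);
  create index on usage_events (vendor, created_at);

  -- cost_prices: versioned price book; a change is a new row, never an update in place.
  create table cost_prices (
    vendor       text not null,
    sku          text not null,
    unit         text not null,
    usd_per_unit numeric(12,8) not null,
    effective_at timestamptz not null,
    expires_at   timestamptz        -- null = still current
  );
  ```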
- Shared `logUsage()` helper in `supabase/functions/_shared/log-usage.ts` (Edge Functions side) and `apps/web/lib/log-usage.ts` (Next.js side). Single function: `logUsage({ user_id, layer, event_type, vendor, service, sku, qty_in, qty_out, unit, status, attempt, latency_ms, job_id?, memory_id?, space_id? })`. Looks up the price from `cost_prices` and writes the `cost_usd` at insert time so historical rows stay honest across price changes. Failure to log is non-fatal — it logs `console.error` and returns, never blocks the actual response.
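  A sketch of the helper’s shape under the signature above, with the pure pricing step split out. The `db` parameter is a stand-in for the Supabase client; `priceFor` and the single-blended-rate assumption (one `usd_per_unit` per sku covering both directions) are illustrative, not the committed API:

  ```typescript
  // Sketch only — names beyond the ADR's logUsage() signature are illustrative.
  type Unit = "tokens" | "audio_sec" | "gb_sec" | "invocations" | "emails" | "events" | "usd_charged";

  interface UsageEvent {
    user_id?: string;
    layer: number;
    event_type: "ai_call" | "compute_job" | "email" | "analytics" | "payment";
    vendor: string;
    service: string;
    sku: string;
    qty_in: number;
    qty_out: number;
    unit: Unit;
    status?: "ok" | "fallback" | "error";
    attempt?: number;
    latency_ms?: number;
  }

  // Pure pricing step: assumes one blended usd_per_unit per sku; vendors that
  // price input and output differently would use two skus (or two rates).
  export function priceEvent(e: UsageEvent, usdPerUnit: number): number {
    return (e.qty_in + e.qty_out) * usdPerUnit;
  }

  export async function logUsage(
    db: {
      priceFor: (vendor: string, sku: string, unit: Unit) => Promise<number>;
      insert: (row: object) => Promise<void>;
    },
    e: UsageEvent,
  ): Promise<void> {
    try {
      // cost_usd is written at insert time so historical rows survive price changes.
      const usdPerUnit = await db.priceFor(e.vendor, e.sku, e.unit);
      await db.insert({ ...e, status: e.status ?? "ok", attempt: e.attempt ?? 1, cost_usd: priceEvent(e, usdPerUnit) });
    } catch (err) {
      // Non-fatal by design: telemetry must never block the actual response.
      console.error("logUsage failed", err);
    }
  }
  ```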
- Wire the highest-cost layers first. Order matters: chat and summarize are 80%+ of variable cost.
  - Layer 6 — Reason (`apps/web/app/api/chat/route.ts`) — capture Anthropic `usage` including `cache_read_input_tokens` and `cache_creation_input_tokens`. The cache columns are non-negotiable here; without them you’ll over-estimate chat cost by 5–10×.
  - Layer 3 — Understand (`summarize-step/index.ts`) — capture Anthropic `usage`, Gemini `usageMetadata`, Groq `usage`. Tag the event with `attempt` so the fallback-rate metric drops out of the same data.
  - Layer 2 — Transcribe (`transcribe-step/index.ts`) — capture audio-seconds (the meter Groq Whisper is billed against), not tokens.
  - Layer 4 — Embed (`embed-step/index.ts`) — capture Voyage `usage.total_tokens`.
  - Layer 5/10 — Compute (`pipeline-tick`, `diarize-step`, `reid-step`, `compute-edges-step`, `link-topics-step`, `backfill-summaries`) — capture invocation count + duration as `event_type='compute_job'`. Modal worker invocations follow the same pattern when audio-transcode runs.
  - Auxiliaries (Resend magic-link sends, future Stripe webhooks) — small; capture as `event_type='email'` / `event_type='payment'` with the literal Resend / Stripe IDs in `sku`.
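  The `attempt` tagging can be sketched as follows — the chain shape and usage fields are assumptions; only the `attempt` / `status` semantics come from this ADR:

  ```typescript
  // Illustrative fallback runner (per ADR-0011 chains, e.g. Gemini → Anthropic → Groq).
  // The returned attempt/status fields are what logUsage() records per event.
  type Attempt = { vendor: string; call: () => Promise<{ qtyIn: number; qtyOut: number }> };

  export async function runWithFallback(chain: Attempt[]) {
    for (let i = 0; i < chain.length; i++) {
      try {
        const usage = await chain[i].call();
        return {
          ...usage,
          vendor: chain[i].vendor,
          attempt: i + 1,                       // which provider actually billed
          status: i === 0 ? "ok" : "fallback",  // fallback-rate metric drops out for free
        };
      } catch {
        // fall through to the next provider; rethrow only when none are left
        if (i === chain.length - 1) throw new Error("all providers failed");
      }
    }
    throw new Error("empty fallback chain");
  }
  ```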
- Nightly cron → `usage_storage_snapshots`. New pgmq job or pg_cron entry, runs at 03:00 UTC. Reads `storage.objects` (recordings bucket) summed by owner, counts memories/topics/edges/embeddings per user, multiplies by current `cost_prices` rates for `supabase_storage` and `supabase_postgres_gb_day`, and writes one row per active user. Runs in ~seconds at V0 scale.
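  A sketch of the nightly write, assuming audio sizes live in `storage.objects.metadata->>'size'` and a per-GB `supabase_storage` price row (both illustrative; only the table shapes come from this ADR):

  ```sql
  insert into usage_storage_snapshots (day, user_id, audio_bytes, storage_cost_usd)
  select
    current_date,
    o.owner,
    sum((o.metadata->>'size')::bigint)                        as audio_bytes,
    sum((o.metadata->>'size')::bigint) / 1e9 * p.usd_per_unit as storage_cost_usd
  from storage.objects o
  join cost_prices p
    on  p.vendor = 'supabase'
    and p.sku = 'supabase_storage'
    and now() between p.effective_at and coalesce(p.expires_at, 'infinity')
  where o.bucket_id = 'recordings'
  group by o.owner, p.usd_per_unit;
  ```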
Two SQL views — the only thing dashboards consume.
user_cost_variable_30d:sum(usage_events.cost_usd)+ 30 × avg dailyusage_storage_snapshots.storage_cost_usd, grouped byuser_id, last 30d. Answers “is this user profitable on the margin?”user_cost_loaded_30d:user_cost_variable_30d+ per-user share ofcost_overhead_monthlyper the chosenallocation_rule. Answers “is the business profitable?”
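  The variable-only view might be sketched as follows (the loaded view adds the per-user `cost_overhead_monthly` share on top):

  ```sql
  create view user_cost_variable_30d as
  with events as (
    select user_id, sum(cost_usd) as events_usd
    from usage_events
    where created_at > now() - interval '30 days'
    group by user_id
  ),
  snap as (
    select user_id, avg(storage_cost_usd) as daily_storage_usd
    from usage_storage_snapshots
    where day > current_date - 30
    group by user_id
  )
  select
    user_id,
    coalesce(e.events_usd, 0) + coalesce(s.daily_storage_usd, 0) * 30 as cost_usd_30d
  from events e
  full outer join snap s using (user_id);
  ```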
- Internal `/admin/economics` page. Sujith-only (gated on the existing admin-flag pattern). Three tables and one chart: top 50 users by loaded cost, cost by layer/vendor over 30 days, fallback rate by service, and a 30-day spend-by-layer line chart. No fancy framework — same Next.js + Supabase pattern the rest of the app uses.
## What’s deferred (re-evaluate after V0.2 ships)
- User-facing `/account/usage` page. Shown in units (minutes, summaries, chats, history horizon, memory count), not dollars — see “On transparency” below for the rationale. Build after the internal dashboard has run for 30 days and the unit definitions stabilize.
- Public `/pricing` page and plan-tier definitions. Pricing math is the output of running the loaded view for 30 days. Setting tiers before the data exists is fiction. The price-page work is its own follow-up; this ADR commits to the measurement infrastructure, not to specific dollar amounts.
- Public `/how-we-price` explainer. The strong version of transparency — explaining the cost shape in user-facing language, no exact numbers — composes well with ADR-0011’s layer model. Defer until pricing tiers are committed.
- Vendor admin-API ingest. Anthropic, Gemini, Modal, Supabase, Resend, Sentry, PostHog, and Stripe all expose admin/usage APIs. Pulling them into `cost_overhead_monthly` automatically beats hand entry. Cheap to add later; not blocking.
- History-horizon enforcement on the storage side. Once the snapshot table shows the cost shape clearly, the cold-tier-or-prune logic for free-tier users past horizon can be designed against real data rather than guessed at. The plan-aware retention job lands here.
- Modal GPU telemetry. The Pipecat / voice-talkback worker is paused per ADR-0010. When voice resumes, the worker emits `event_type='compute_job'` with `unit='gpu_sec'` — same table, same helper. Wire at resumption, not now.
- Plan-aware caps in addition to the existing seconds cap. Today only audio-seconds is capped. Future caps may be: chat tokens/day, embed calls/day, summarize calls/day. Only enable a cap once the loaded view shows a need.
- Per-tenant cost views. When B2B / Spaces become a real pricing axis, roll the same data up by `space_id` instead of `user_id`. Schema already carries `space_id`; the view is a one-liner.
## Consequences

### Easier
- Pricing decisions become data-driven rather than vibes-driven. The smallest answer (“does the average MAU cost more or less than $X”) is in the loaded view within ~30 days of wiring.
- Fallback-rate-by-service drops out of the same data — `count(*) where status='fallback' / count(*)` per service. Per ADR-0011’s commodity-class layers, the fallback rate is the early warning that a primary vendor is degrading.
- Cache-hit-rate on chat (`apps/web/app/api/chat/route.ts`) becomes visible — `sum(cache_read) / sum(cache_read + input)` per user. Currently invisible; likely the highest-leverage cost reduction available.
- Stakeholder updates can quote a real per-user cost number in the README’s stakeholder-facing rollup. Currently the README has no economics column at all.
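The fallback-rate rollup, written against the `usage_events` shape committed above (Postgres aggregate `filter` syntax; the 30-day window is illustrative):

```sql
select
  service,
  (count(*) filter (where status = 'fallback'))::numeric / count(*) as fallback_rate
from usage_events
where event_type = 'ai_call'
  and created_at > now() - interval '30 days'
group by service
order by fallback_rate desc;
```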
### Harder / new responsibilities
- The `cost_prices` table needs to be seeded with current rates per vendor and updated when rates change. Treat it as documentation: a price change requires a row, no exceptions.
- The monthly overhead entry is a calendar item. Five minutes a month, but if it’s missed the loaded view diverges from reality.
- The `unit` column has to be applied honestly — STT measured in `audio_sec`, LLMs in `tokens`, storage in `gb_sec`, etc. Mixing units silently destroys the cost rollup.
- The `/account/usage` page (when built) requires committed unit definitions that don’t change shape later. Picking units conservatively now (minutes, history-horizon days, count of memories) is cheaper than re-stating them after launch.
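Treating `cost_prices` as documentation looks like this in practice — a hypothetical price drop is one expiry plus one new row (sku and rates invented for illustration):

```sql
-- Close the current price row (expires_at set, rate untouched)...
update cost_prices
   set expires_at = '2026-06-01'
 where vendor = 'anthropic' and sku = 'claude-haiku-input' and expires_at is null;

-- ...and add the new one. Events logged before 2026-06-01 keep their old cost_usd.
insert into cost_prices (vendor, sku, unit, usd_per_unit, effective_at)
values ('anthropic', 'claude-haiku-input', 'tokens', 0.00000020, '2026-06-01');
```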
### Impossible / explicitly out of scope
- Logging request/response content into telemetry. Token counts only. The schema has no column for it; the helper has no parameter for it; the ADR-0007 consent posture forbids it.
- Showing users their own raw dollar cost on Free. See “On transparency” below — units, not dollars, on the user-facing page. Internal-only views show dollars.
- Replacing the existing seconds cap. `consume_recording_seconds` is a cap (a gate on ingest); the new tables are a cost view (read-only). Both stay.
### What this enables that wasn’t possible before
- A real “is the business profitable at this scale” answer, ahead of the first paid tier.
- Per-user profitability ranking — the input to free-tier abuse policy, plan-cap design, and feature-gating decisions.
- A defensible answer to “why is your free tier so limited?” — because the loaded view shows the bound, not because of an arbitrary product call.
- A path to the Apple-clean public pricing posture: published units, published limits, no clutter, but with the underlying numbers honest enough to defend the page.
## Notes

### Why not lean harder on PostHog for cost telemetry
PostHog is already in the stack and would be the obvious “send each AI call as an event” path. Two reasons it’s not the right primary:
- PostHog event volume is itself a cost line in the auxiliaries bucket. Dual-firing every AI call to PostHog at high volume creates a feedback loop where the telemetry becomes a non-trivial fraction of the cost it’s measuring.
- PostHog is shaped for product analytics, not financial rollups. SQL aggregation against a Postgres table is the right tool for “sum cost by user by month”; PostHog insights are the right tool for “which features correlate with retention.” Both are needed; they’re different jobs.
PostHog still gets one event per AI call eventually, but as a secondary fan-out, sampled and shaped for retention/cohort analysis — not as the primary cost ledger. Wiring this is part of the deferred user-facing page work.
### Chat is a separate axis and will misbehave
Per ADR-0011 Layer 6, the chat brain is Claude Agent SDK. At any non-trivial scale, chat is likely to be the single largest variable line per active user — orders of magnitude above transcribe/summarize/embed combined. Three implications:
- The Anthropic prompt-caching telemetry must be captured from day one. `cache_read_input_tokens` and `cache_creation_input_tokens` are separate fields on the response, separately priced; treating them as plain `input_tokens` produces a number that’s wrong by 5–10×.
- Chat will likely be the first feature with a hard per-tier cap. Not yet — caps before data is fiction — but the structure (`event_type='ai_call'` + `service='chat'` filter) is ready.
- Cache-hit-rate is the highest-leverage cost lever ARCIVE has. Surfacing it on the admin dashboard is part of the “what ships now” work, not deferred.
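To make the over-estimate concrete, here is a sketch of cache-aware versus naive pricing. The rates are placeholders, not real Anthropic prices (the real ones live in `cost_prices`); the field names are Anthropic’s response `usage` fields:

```typescript
// Anthropic bills cache reads far below fresh input tokens; pricing everything
// at the full input rate wildly over-states the cost of a cache-heavy chat user.
// RATES are illustrative USD-per-1M-token numbers, not real prices.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

const RATES = { input: 1.0, cacheRead: 0.1, cacheWrite: 1.25, output: 5.0 };

export function chatCostUsd(u: AnthropicUsage): number {
  return (
    (u.input_tokens * RATES.input +
      u.cache_read_input_tokens * RATES.cacheRead +
      u.cache_creation_input_tokens * RATES.cacheWrite +
      u.output_tokens * RATES.output) / 1_000_000
  );
}

// The naive meter: every input-side token priced at the full input rate.
export function naiveCostUsd(u: AnthropicUsage): number {
  return (
    ((u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens) * RATES.input +
      u.output_tokens * RATES.output) / 1_000_000
  );
}
```

On a cache-heavy sample (e.g. 90k cache reads against 1k fresh input tokens) the naive number comes out roughly 7× the true cost — squarely inside the 5–10× error the bullet above warns about.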
### Why include `attempt` and `status` in the schema
The ADR-0011 fallback chains (Gemini → Anthropic → Groq for summarize, etc.) are a quality / cost / availability tradeoff. Without `attempt` and `status` columns, the cost rollup hides which provider is actually billing — a 4% Anthropic-fallback rate on summarize can dominate Layer-3 cost while the Gemini “primary” looks free in the dashboard. Encoding the fallback story in the schema is what turns the cost view into an operational signal.
### On transparency — units to users, dollars to ops
The strong version of “transparent” is not showing each user their own raw dollar cost. That cuts both ways: it’s honest, but it makes “I cost you $0.79 last month, why are you charging me $19” a recurring complaint. Most products avoid it. ARCIVE’s posture is closer to:
- `/account/usage` shows units — minutes recorded, summaries generated, chats sent, history horizon used, memories stored. Language users grasp.
- `/pricing` shows the same units honestly, per tier, with no jargon.
- `/how-we-price` (the docs explainer) describes the cost shape in plain language — “we pay vendors per minute of audio and per chat message, plus storage rent” — without exact numbers. Apple-clean.
Internal-only `/admin/economics` shows dollars, layers, vendors, fallback rates. Two audiences, two surfaces, one underlying schema. Showing dollars on Pro+ as a power-user opt-in is a future call, not committed by this ADR.
### How this maps to the ADR-0011 layer model
| Layer | Cost shape | `usage_events.unit` |
|---|---|---|
| 1 — Capture | client-side; storage write only | snapshot via `usage_storage_snapshots.audio_bytes` |
| 2 — Transcribe | per audio-second | `audio_sec` |
| 3 — Understand | per token | `tokens` |
| 4 — Embed | per token | `tokens` |
| 5 — Retrieve | per query (cheap; pgvector) | `invocations` |
| 6 — Reason | per token + cache columns | `tokens` (with cache_read/cache_write split) |
| 7 — Hear (paused) | per audio-second | `audio_sec` |
| 8 — Speak (paused) | per character | `characters` |
| 9 — Voice orchestration (paused) | per GPU-second | `gpu_sec` |
| 10 — Auto-correlate | per compute job | `invocations` |
| 11 — Surfaces | hosting, mostly fixed | snapshot via `cost_overhead_monthly` |
The `layer` column on every event row is what makes this table a useful pivot. It also means that when ADR-0011’s layer model is extended (a new layer is added, an existing one is re-classified), this telemetry doesn’t need a schema change — the new layer just shows up as new rows.
## TL;DR for someone walking in cold
Tokens are ~30% of variable cost; storage rent and compute time are most of the rest. ARCIVE meters all five buckets (AI / compute / storage / auxiliaries / fixed) under one schema, with versioned prices so retroactive views stay honest across vendor price changes. Audio minutes are the primary user-facing meter; storage is expressed as a history horizon, not a GB cap; chat is a separate axis where prompt-caching telemetry is non-negotiable. Two views — variable-only and fully-loaded — feed an internal dashboard first and a user-facing units page later. Users see units; ops sees dollars. The ADR commits the schema; the dashboards and pricing follow once the data has run for 30 days.