Date: 2026-05-06
Format: Working session notes — exploratory, not a committed plan.
Outcome: ADR-0012 — Topics as first-class graph nodes (Proposed)
Companion docs: 2026-05-04_multimodal_expansion.md · 2026-05-05_ai_strategy_architecture.md
Question 1 — What are standard practices for tags in Obsidian and journaling apps? Inline #tag in prose, frontmatter, end-of-note? And what works best for an Obsidian-style graph view when the AI does most of the tagging and the user only occasionally intervenes?
Clarification first: in Obsidian, tags and graph view are mostly separate systems. The graph is driven by [[wikilinks]] between notes, not by #tags (tags appear as nodes but are secondary).
Standard practices, ranked by how systematic users do it:
- YAML frontmatter for structured tags — top of note, `tags:` field. Best for AI-generated tags: clean to parse, easy to edit programmatically, doesn’t clutter prose.
- Inline `#tags` within text — used when the tag is contextual to a paragraph. Good for human flow-of-thought tagging, messy for AI to manage.
- Tags appended at end — older journaling convention; superseded by frontmatter in Obsidian circles.
- `[[Wikilinks]]` inline — for actual concepts, people, projects. These build the graph. Zettelkasten leans heavily on this.
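For concreteness, a sketch of an AI-populated note under the frontmatter-plus-wikilinks convention (the note content is invented, illustrative only):

```ts
// Illustrative only: what an AI-populated note looks like under the
// "frontmatter for tags, inline wikilinks for entities" convention.
const exampleNote = `---
tags: [journal, anxiety, work]
---
Coffee with [[Sarah Kim]] about the [[Q2 Roadmap]]. Felt calmer than last
week; the prep is paying off.`;
```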
For an AI-mostly setup: frontmatter for tags, inline wikilinks for entities. AI populates both on ingest; user adjusts occasionally.
On graph behavior: wikilinks create note-to-note edges (the main structure). Tags appear as their own node type but don’t link to each other — they cluster, they don’t web. The interesting graph everyone screenshots comes almost entirely from wikilinks.
Question 2 — How do Obsidian, Miro, and mind-mapping tools compare, and what strategy does each imply?
Four families, sharply different philosophies:
Link-driven graphs — Obsidian, Logseq, Roam Research
Connections emerge from [[wikilinks]] and #tags written into the text. Graph is a byproduct of writing.
- AI fit: high — easy to inject links and frontmatter into markdown.
Typed-node / schema systems — Tana, Capacities, Notion
Tana’s supertags are the headline idea: a #meeting tag isn’t just a label, it’s a schema that adds fields (attendees, project, date) and turns the node into a queryable database row. Capacities works similarly.
- AI fit: very high — structured fields are exactly what LLMs produce well, and queries beat graph-staring for retrieval.
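To make "tag as schema" concrete, a sketch of what a supertag-shaped record could look like. The field and type names here are invented for illustration, not Tana's actual data model:

```ts
// Hypothetical shapes: illustrates "tag as schema", not Tana's real model.
interface SupertagField {
  name: string;                   // e.g. "attendees", "project", "date"
  type: "text" | "date" | "ref";  // refs point at other nodes
}

interface Supertag {
  name: string;                   // "#meeting"
  fields: SupertagField[];        // applying the tag adds these fields
}

interface Node {
  id: string;
  title: string;
  tags: string[];                                // supertag names applied
  values: Record<string, string | Date | Node>;  // filled per the schemas
}
// Retrieval is then a field query ("meetings where project = X"),
// not graph-staring.
```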
Spatial canvases — Miro, Heptabase, Scrintal, Excalidraw, Obsidian Canvas
No automatic graph — place cards, draw arrows. The “tagging” is spatial: clusters, colors, frames.
- AI fit: low for auto-organization, decent for generating starter cards.
Mind maps — XMind, MindMeister, Coggle, FreeMind
Strict parent-child hierarchy, one root. Tree, not graph. Cross-links are second-class.
- AI fit: medium — easy to generate a tree, but the model is too rigid for evolving knowledge.
Strategy implications for ARCIVE (AI-mostly, ambient, multimodal)
- Want emergent graph aesthetic → Obsidian-style (AI generates wikilinks for entities + frontmatter for categories). Cheapest, prettiest graph, weakest retrieval.
- Want actually queryable knowledge → Tana / Capacities. AI fills typed fields; you ask “all journal entries tagged anxious where I mentioned Sarah” and get a real answer. Graph is less central but more useful.
- Want visual thinking → Heptabase. AI drafts cards, user arranges.
- Mind maps and Miro are poor fits for ongoing journaling — they’re for one-shot ideation.
Honest take: Obsidian’s graph view is mostly eye candy past a few hundred notes. Tana-style typed tags scale better for AI-generated content because the AI’s output becomes structured data rather than prose-with-sprinkled-tags.
Question 3 — How would the Obsidian-style emergent graph be best implemented in Next.js 15 / Vercel + Expo, given multimodal input?
Key architectural insight: don’t store markdown with [[wikilinks]] like Obsidian does — ARCIVE isn’t file-based. Store the captured artifact for display, but extract entities and tags into relational tables so the graph and search are real database queries.
Data model:
```
notes(id, user_id, body_md, frontmatter jsonb, created_at, embedding vector)
entities(id, user_id, canonical_name, type, aliases text[], embedding vector)
tags(id, user_id, name, kind)
note_entities(note_id, entity_id, mention_text, char_offset)
note_tags(note_id, tag_id)
```

The entities table is what gives you the Obsidian graph: nodes = entities, edges = co-occurrence. Embeddings on entities solve the “is this Sarah the same Sarah?” problem — without entity resolution, you’ll have “Sarah”, “Sarah K”, “sarah” as three separate graph nodes within a week.
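Under that model the graph falls out of one query. A sketch, assuming node-postgres and the tables above (not shipped code):

```ts
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Entity graph for one user: nodes are entities, an edge exists when two
// entities are mentioned in the same note, weighted by shared-note count.
export async function entityEdges(userId: string) {
  const { rows } = await db.query(
    `select a.entity_id as source,
            b.entity_id as target,
            count(*)    as weight
       from note_entities a
       join note_entities b
         on a.note_id   = b.note_id
        and a.entity_id < b.entity_id   -- each pair once, no self-edges
       join notes n on n.id = a.note_id
      where n.user_id = $1
      group by 1, 2
      order by weight desc`,
    [userId]
  );
  return rows; // [{ source, target, weight }]: feed straight into a force layout
}
```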
Multimodal extension — every modality eventually becomes (a) some text/caption and (b) one embedding vector. Per-attachment derivation:
| Modality | Derive step | Tool |
|---|---|---|
| Voice/video audio | Transcribe with timestamps | Deepgram, AssemblyAI, Whisper via Groq |
| Image | Caption + entity detection | Vision model (one call gets caption + entities) |
| Handwriting | OCR | Vision model; Google Vision as fallback |
| Screenshot | OCR + UI context | Vision model |
| EXIF | Pull location, time | exifr; reverse-geocode location to a place entity |
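The EXIF row is the cheapest win in the table. A sketch using exifr (a real library); the reverse-geocode step is a stub standing in for whatever service gets picked:

```ts
import exifr from "exifr";

// Stand-in for whatever reverse-geocoding service gets chosen; not a real API.
async function reverseGeocode(lat: number, lon: number): Promise<string> {
  return `place near ${lat.toFixed(3)},${lon.toFixed(3)}`; // stub
}

// Pull capture time + GPS from a photo and turn the location into a
// place-entity candidate.
export async function placeFromPhoto(file: Buffer) {
  const gps = await exifr.gps(file); // { latitude, longitude } | undefined
  const meta = await exifr.parse(file, ["DateTimeOriginal"]);
  if (!gps) return null;
  const place = await reverseGeocode(gps.latitude, gps.longitude);
  return { place, takenAt: meta?.DateTimeOriginal as Date | undefined };
}
```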
Multimodal embedding model > caption-then-embed. Voyage Multimodal-3 or Cohere Embed v4 — image+text in one shared space, one model, one HNSW index. Half-day swap, big recall win.
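A sketch of the one-model call path. The endpoint path and payload shape here are recollection, not verified against Voyage's docs, so treat every field name below as an assumption to check before relying on it:

```ts
// HEDGED SKETCH: endpoint and payload shape are from memory, NOT verified.
export async function embedMultimodal(text: string, imageUrl?: string) {
  const content: Record<string, unknown>[] = [{ type: "text", text }];
  if (imageUrl) content.push({ type: "image_url", image_url: imageUrl });

  const res = await fetch("https://api.voyageai.com/v1/multimodalembeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "voyage-multimodal-3", inputs: [{ content }] }),
  });
  const json = (await res.json()) as { data: { embedding: number[] }[] };
  return json.data[0].embedding; // one vector regardless of modality mix
}
```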
Confidence weighting on edges:
- Text mention of “Sarah” → high
- Photo with face matching prior Sarahs → medium
- Voice memo whose transcript mentions Sarah → high (transcript is text)
- GPS in EXIF matching a place entity → automatic place edge
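If this lands, it can start as a flat weight table on the edge writer. Illustrative numbers only, untuned:

```ts
// Illustrative weights only; the real values would need tuning against data.
const EDGE_CONFIDENCE: Record<string, number> = {
  text_mention: 0.9,       // "Sarah" appears in prose
  transcript_mention: 0.9, // voice memo transcript is just text
  face_match: 0.6,         // vision-matched face against prior Sarahs
  exif_place: 1.0,         // GPS hit on a known place is deterministic
};
```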
Photos and places give you nodes that have thumbnails — way better aesthetic than Obsidian’s dot-and-label.
Cost reality:
- 5-min voice memo: ~$0.03 (transcription + extraction)
- Single photo: ~$0.01 vision
- Heavy user with 10 voice + 20 photos/day: ~$15/month API costs
Implications: debounce extraction (don’t re-extract on every edit), cache derivations, tier the model (Haiku 4.5 routine, escalate to Sonnet for ambiguous cases), batch photos in single vision calls.
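A minimal per-note debounce sketch for the first item; in-memory timers here, though the real version would debounce at the queue layer:

```ts
// Debounce extraction per note: rapid edits collapse into a single
// extraction call once the note has been quiet for `delayMs`.
const timers = new Map<string, ReturnType<typeof setTimeout>>();

export function scheduleExtraction(
  noteId: string,
  extract: (id: string) => Promise<void>,
  delayMs = 30_000,
) {
  const pending = timers.get(noteId);
  if (pending) clearTimeout(pending); // reset the clock on every edit
  timers.set(
    noteId,
    setTimeout(() => {
      timers.delete(noteId);
      void extract(noteId);
    }, delayMs),
  );
}
```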
Question 4 — Are we already extracting these tags? Could the same extraction produce inline-highlighted links in the viewer plus tags as separate metadata?
Yes — already doing it. supabase/functions/summarize-step/index.ts:24-51 extracts tags from transcript text. The prompt explicitly says topics are “GRAPH EDGES, not search facets” and biases toward recurring nodes (people, projects, themes), 2–4 typical, 5 max. Stored on memories.topics.
The chat summary’s “AI extracts tags from text” advice isn’t new — it’s the existing pipeline. The question is what we do downstream of that.
Three gaps in ARCIVE today (not redesign, but small additions):
1. Topics aren’t edges yet — only embeddings are
compute-edges-step/index.ts:46-64 builds memory_edges purely from embedding similarity (match_memories RPC, top-8, ≥0.55 cosine). Two memories that both tag Daniel are only connected if their vectors happen to be close. The cheaper, sharper edge is the topic itself: shared topic = explicit edge, no LLM/vector cost.
2. No topic canonicalization
Daniel / daniel / Dan will become three nodes within a week. Today’s topics text[] column has no canonical identity. The minimum fix is a topics table with normalize-on-insert (lowercase + trigram match against existing labels for that user).
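A minimal sketch of that fix, assuming pg_trgm is enabled and a topics(id, user_id, label) table; the similarity threshold is a guess that would need tuning:

```ts
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Normalize-on-insert: lowercase the label, reuse an existing topic if a
// trigram match clears the threshold, otherwise insert a new row.
// Assumes `create extension pg_trgm` and a topics(id, user_id, label) table.
export async function resolveTopic(userId: string, raw: string): Promise<string> {
  const label = raw.trim().toLowerCase();

  const { rows } = await db.query(
    `select id, similarity(label, $2) as sim
       from topics
      where user_id = $1
        and similarity(label, $2) > 0.4   -- threshold needs tuning
      order by sim desc
      limit 1`,
    [userId, label]
  );
  if (rows[0]) return rows[0].id;

  const inserted = await db.query(
    `insert into topics (user_id, label) values ($1, $2) returning id`,
    [userId, label]
  );
  return inserted.rows[0].id;
}
```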
3. No inline highlighting → no in-text links
The right split: proper-case topics (Daniel, Q2 Roadmap) become inline highlights in the transcript viewer; lowercase topics (anxiety, public speaking) become chips above the transcript. The existing prompt almost gives this for free — case is already used to distinguish entities from themes.
Char offsets from the LLM are fragile. Just string-match topics against transcript at render time, case-insensitive word boundary, longest-match-first. That’s what most viewers do; cheap and robust.
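A sketch of that matcher (regex-escaping helper included; the React wiring to turn spans into highlights is omitted):

```ts
// Render-time topic matching: case-insensitive, word-boundary, longest topic
// first so "Q2 Roadmap" wins over "Q2". Returns {start, end, topic} spans.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

export function findTopicSpans(transcript: string, topics: string[]) {
  const spans: { start: number; end: number; topic: string }[] = [];
  const taken: boolean[] = new Array(transcript.length).fill(false);

  // Longest-match-first: sort topics by length, descending.
  for (const topic of [...topics].sort((a, b) => b.length - a.length)) {
    const re = new RegExp(`\\b${escapeRegex(topic)}\\b`, "gi");
    for (const m of transcript.matchAll(re)) {
      const start = m.index!;
      const end = start + m[0].length;
      // Skip if any character is already claimed by a longer topic.
      if (taken.slice(start, end).some(Boolean)) continue;
      for (let i = start; i < end; i++) taken[i] = true;
      spans.push({ start, end, topic });
    }
  }
  return spans.sort((a, b) => a.start - b.start);
}
```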
Synthesis — the framing that became ADR-0012
The chat’s framing (“Obsidian aesthetic, Tana retrieval”) is useful as vocabulary but ARCIVE is structurally closer to Tana with a force-graph skin: typed entity nodes, AI-populated, queryable, with a graph view as one of several surfaces.
This is on-brand:
- Calm, ambient, AI-does-the-work (2026-05-04_multimodal_expansion.md § “Philosophical reframe”) — user never types `[[wikilinks]]`.
- Auto-correlation, not user-initiated tagging, is ARCIVE’s MOAT layer (ADR-0011 Layer 10).
- Multimodal landings — vision/OCR-extracted topics from photos flow into the same `memory_topics` table, no parallel system.
What lands in ADR-0012
Ships now (V0.3 slice):
- `topics` + `memory_topics` schema with `kind` (person | place | project | theme | event) and an embedding column reusing the Voyage-3-lite 512-d space.
- Extend the `summarize-step` prompt to return `[{label, kind}]`.
- New `link-topics-step` pgmq function with hybrid pg_trgm + pgvector resolution.
- Extend `compute-edges-step` to write topic-shared edges with `kind='topic'`; existing edges become `kind='semantic'` (edge-write sketch after this list).
- Render-time inline highlighting in the transcript viewer + chip row above.
- Universe view nodes are topics; memory↔memory edges become a secondary toggle.
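A sketch of the topic-shared edge write; the memory_edges column names (source_id, target_id, kind) are assumptions here, not confirmed schema:

```ts
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Topic-shared edges: two memories that reference the same topic get an
// explicit kind='topic' edge; no LLM or vector cost involved.
export async function writeTopicEdges(userId: string) {
  await db.query(
    `insert into memory_edges (source_id, target_id, kind)
     select a.memory_id, b.memory_id, 'topic'
       from memory_topics a
       join memory_topics b
         on a.topic_id  = b.topic_id
        and a.memory_id < b.memory_id     -- each pair once
       join topics t on t.id = a.topic_id
      where t.user_id = $1
     on conflict do nothing`,
    [userId]
  );
}
```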
Deferred (future feature work):
- Confidence weighting on memory↔topic links.
- EXIF → place topics, free.
- Diarized speaker → person topic attribution.
- Vision/OCR-extracted topics for images (lands with multimodal ingest).
- MCP retrieval over topics (`memories_by_topic`, `related_topics`).
- Topic merge/split UX.
- Cluster-around-entity zoom view.
Open decisions
- Adopt ADR-0012 — accepted, shipped 2026-05-07 (PR #11).
- When mobile Universe view ships, build against topic nodes from day one (avoid a rebuild). Shipped 2026-05-06 (`feature/mobile-universe`) — Skia + d3-force, edges/nodes already topic-aware via the β.1 schema, no retrofit needed.
- Confirm Voyage Multimodal-3 vs Cohere Embed v4 for the multimodal swap (separate decision tracked in 2026-05-04_multimodal_expansion.md).
Lessons recorded
- Read the prompts before proposing infrastructure. The conversation was heading toward “we should add topic extraction” — a half-day discovery in the existing code revealed we’d been doing it for weeks and just stranding the output. The actual ADR is “use what we already extract,” not “extract more.”
- Obsidian-style wikilinks-in-prose is a file-format artifact, not a strategy. ARCIVE has no editable prose layer; that pattern is irrelevant. The vocabulary stays useful (entities-as-nodes, graph-as-surface) — the markup mechanics don’t.
- The graph ≠ the value. Obsidian power users discover the force-graph stops being useful past ~500 nodes. The actual product win is retrieval (typed entity queries, MCP surfaces) with the graph as one rendering of the same data.