
Graph view + tagging strategy — Working Session Notes

Date: 2026-05-06
Format: Working session notes — exploratory, not a committed plan.
Outcome: ADR-0012 — Topics as first-class graph nodes (Proposed)
Companion docs: 2026-05-04_multimodal_expansion.md · 2026-05-05_ai_strategy_architecture.md


Question 1 — Standard practices in Obsidian and journaling apps for tags. Inline #tag in prose, frontmatter, end-of-note? What works best for an Obsidian-style graph view when the AI does most of the tagging and the user only occasionally intervenes?

Clarification first: in Obsidian, tags and graph view are mostly separate systems. The graph is driven by [[wikilinks]] between notes, not by #tags (tags appear as nodes but are secondary).

Standard practices, ranked by how systematically users apply them:

  1. YAML frontmatter for structured tags — top of note, tags: field. Best for AI-generated tags: clean to parse, easy to edit programmatically, doesn’t clutter prose.
  2. Inline #tags within text — used when the tag is contextual to a paragraph. Good for human flow-of-thought tagging, messy for AI to manage.
  3. Tags appended at end — older journaling convention; superseded by frontmatter in Obsidian circles.
  4. [[Wikilinks]] inline — for actual concepts, people, projects. These build the graph. Zettelkasten leans heavily on this.

For an AI-mostly setup: frontmatter for tags, inline wikilinks for entities. AI populates both on ingest; user adjusts occasionally.
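As a concrete sketch of that recommendation (frontmatter for AI tags, inline wikilinks for entities), here's what an ingest step might emit. `buildNote` and the `Extraction` shape are hypothetical, not ARCIVE code; it assumes entity mentions were already extracted upstream.

```typescript
// Hypothetical sketch: assemble an Obsidian-style note from AI extraction output.
interface Extraction {
  tags: string[];      // lowercase categories -> YAML frontmatter
  entities: string[];  // proper nouns -> inline [[wikilinks]]
}

// Escape regex metacharacters in entity names before building patterns.
const esc = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function buildNote(body: string, ex: Extraction): string {
  const frontmatter = ["---", `tags: [${ex.tags.join(", ")}]`, "---"].join("\n");
  // Link longest names first so "Sarah K" wins over "Sarah"; the lookarounds
  // skip mentions already wrapped in [[...]].
  let linked = body;
  for (const name of [...ex.entities].sort((a, b) => b.length - a.length)) {
    const re = new RegExp(`(?<!\\[)\\b${esc(name)}\\b(?!\\])`, "g");
    linked = linked.replace(re, `[[${name}]]`);
  }
  return `${frontmatter}\n\n${linked}`;
}
```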

On graph behavior: wikilinks create note-to-note edges (the main structure). Tags appear as their own node type but don’t link to each other — they cluster, they don’t web. The interesting graph everyone screenshots comes almost entirely from wikilinks.

Question 2 — Compare Obsidian / Miro / mind maps and assess which strategies fit

Four families, sharply different philosophies:

Emergent-link markdown — Obsidian

Connections emerge from [[wikilinks]] and #tags written into the text. The graph is a byproduct of writing.

  • AI fit: high — easy to inject links and frontmatter into markdown.

Typed-node / schema systems — Tana, Capacities, Notion

Tana’s supertags are the headline idea: a #meeting tag isn’t just a label, it’s a schema that adds fields (attendees, project, date) and turns the node into a queryable database row. Capacities works similarly.

  • AI fit: very high — structured fields are exactly what LLMs produce well, and queries beat graph-staring for retrieval.

Spatial canvases — Miro, Heptabase, Scrintal, Excalidraw, Obsidian Canvas

No automatic graph — place cards, draw arrows. The “tagging” is spatial: clusters, colors, frames.

  • AI fit: low for auto-organization, decent for generating starter cards.

Mind maps — XMind, MindMeister, Coggle, FreeMind

Strict parent-child hierarchy, one root. Tree, not graph. Cross-links are second-class.

  • AI fit: medium — easy to generate a tree, but the model is too rigid for evolving knowledge.

Strategy implications for ARCIVE (AI-mostly, ambient, multimodal)

  • Want emergent graph aesthetic → Obsidian-style (AI generates wikilinks for entities + frontmatter for categories). Cheapest, prettiest graph, weakest retrieval.
  • Want actually queryable knowledge → Tana / Capacities. AI fills typed fields; you ask “all journal entries tagged anxious where I mentioned Sarah” and get a real answer. Graph is less central but more useful.
  • Want visual thinking → Heptabase. AI drafts cards, user arranges.
  • Mind maps and Miro are poor fits for ongoing journaling — they’re for one-shot ideation.

Honest take: Obsidian’s graph view is mostly eye candy past a few hundred notes. Tana-style typed tags scale better for AI-generated content because the AI’s output becomes structured data rather than prose-with-sprinkled-tags.

Question 3 — How would the Obsidian-style emergent graph be best implemented in Next.js 15 / Vercel + Expo, given multimodal input?

Key architectural insight: don’t store markdown with [[wikilinks]] like Obsidian does — ARCIVE isn’t file-based. Store the captured artifact for display, but extract entities and tags into relational tables so the graph and search are real database queries.

Data model:

notes(id, user_id, body_md, frontmatter jsonb, created_at, embedding vector)
entities(id, user_id, canonical_name, type, aliases text[], embedding vector)
tags(id, user_id, name, kind)
note_entities(note_id, entity_id, mention_text, char_offset)
note_tags(note_id, tag_id)

The entities table is what gives you the Obsidian graph: nodes = entities, edges = co-occurrence. Embeddings on entities solve the “is this Sarah the same Sarah?” problem — without entity resolution, you’ll have “Sarah”, “Sarah K”, “sarah” as three separate graph nodes within a week.
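The resolution step can be sketched as a nearest-neighbor check over entity embeddings: attach a new mention to an existing node when similarity clears a threshold, otherwise create a fresh one. The 0.85 threshold and the shapes below are illustrative, not tuned values.

```typescript
// Illustrative sketch: embedding-based entity resolution so "Sarah",
// "Sarah K", and "sarah" collapse onto one graph node.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function resolveEntity(
  mention: { name: string; embedding: number[] },
  existing: { id: string; embedding: number[] }[],
  threshold = 0.85, // placeholder; tune against real alias collisions
): string | null {
  let best: { id: string; score: number } | null = null;
  for (const e of existing) {
    const score = cosine(mention.embedding, e.embedding);
    if (!best || score > best.score) best = { id: e.id, score };
  }
  // null means "no close match": caller creates a new entity row.
  return best && best.score >= threshold ? best.id : null;
}
```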

Multimodal extension — every modality eventually becomes (a) some text/caption and (b) one embedding vector. Per-attachment derivation:

| Modality | Derive step | Tool |
| --- | --- | --- |
| Voice/video audio | Transcribe with timestamps | Deepgram, AssemblyAI, Whisper via Groq |
| Image | Caption + entity detection | Vision model (one call gets caption + entities) |
| Handwriting | OCR | Vision model; Google Vision as fallback |
| Screenshot | OCR + UI context | Vision model |
| EXIF | Pull location, time | exifr; reverse-geocode location to a place entity |

Multimodal embedding model > caption-then-embed. Voyage Multimodal-3 or Cohere Embed v4 — image+text in one shared space, one model, one HNSW index. Half-day swap, big recall win.

Confidence weighting on edges:

  • Text mention of “Sarah” → high
  • Photo with face matching prior Sarahs → medium
  • Voice memo whose transcript mentions Sarah → high (transcript is text)
  • GPS in EXIF matching a place entity → automatic place edge
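These weights could land as a simple lookup at edge-insert time. A minimal sketch; the evidence names and numeric weights below are illustrative placeholders, not tuned values.

```typescript
// Illustrative sketch: evidence source -> edge confidence, per the list above.
type Evidence = "text_mention" | "transcript_mention" | "face_match" | "exif_place";

function edgeConfidence(ev: Evidence): number {
  switch (ev) {
    case "text_mention":
    case "transcript_mention":
      return 0.9; // high: a transcript is just text
    case "face_match":
      return 0.6; // medium: vision matching is fuzzier
    case "exif_place":
      return 1.0; // GPS hit on a known place entity is deterministic
  }
}
```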

Photos and places give you nodes that have thumbnails — way better aesthetic than Obsidian’s dot-and-label.

Cost reality:

  • 5-min voice memo: ~$0.03 (transcription + extraction)
  • Single photo: ~$0.01 vision
  • Heavy user with 10 voice + 20 photos/day: ~$15/month API costs
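The monthly figure follows from the per-item costs above; a throwaway calc, assuming a 30-day month:

```typescript
// Sanity-check the heavy-user estimate from the per-item costs above.
function monthlyApiCost(voicePerDay: number, photosPerDay: number): number {
  const VOICE_COST = 0.03; // ~5-min memo: transcription + extraction
  const PHOTO_COST = 0.01; // single vision call
  return (voicePerDay * VOICE_COST + photosPerDay * PHOTO_COST) * 30;
}
// 10 voice + 20 photos/day -> $0.50/day -> ~$15/month
```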

Implications: debounce extraction (don’t re-extract on every edit), cache derivations, tier the model (Haiku 4.5 routine, escalate to Sonnet for ambiguous cases), batch photos in single vision calls.
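Debouncing can be as simple as coalescing pending note IDs and flushing one deduped batch on a timer. `ExtractionQueue` is a hypothetical stand-in for the real enqueue, not existing ARCIVE code.

```typescript
// Hypothetical sketch: coalesce extraction requests so repeated edits to the
// same note produce one job per flush instead of one per keystroke.
class ExtractionQueue {
  private pending = new Set<string>();

  // Called on every edit; the Set dedupes per note.
  request(noteId: string): void {
    this.pending.add(noteId);
  }

  // Called on a timer (e.g. every 30s): drain and return one batch.
  flush(): string[] {
    const batch = [...this.pending];
    this.pending.clear();
    return batch;
  }
}
```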

Question 4 — Isn't ARCIVE already extracting tags from text?

Yes — already doing it. supabase/functions/summarize-step/index.ts:24-51 extracts tags from transcript text. The prompt explicitly says topics are “GRAPH EDGES, not search facets” and biases toward recurring nodes (people, projects, themes), 2–4 typical, 5 max. Stored on memories.topics.

The chat summary’s “AI extracts tags from text” advice isn’t new — it’s the existing pipeline. The question is what we do downstream of that.

Three gaps in ARCIVE today (not redesign, but small additions):

1. Topics aren’t edges yet — only embeddings are

compute-edges-step/index.ts:46-64 builds memory_edges purely from embedding similarity (match_memories RPC, top-8, ≥0.55 cosine). Two memories that both tag Daniel are only connected if their vectors happen to be close. The cheaper, sharper edge is the topic itself: shared topic = explicit edge, no LLM/vector cost.
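A minimal sketch of that edge derivation, assuming rows shaped like the memory_topics join table. In production this would be a SQL self-join inside compute-edges-step, not an in-memory pass; the shapes here are illustrative.

```typescript
// Illustrative sketch: shared topic => explicit memory<->memory edge,
// no embeddings or LLM calls involved.
type TopicRow = { memoryId: string; topicId: string };
type Edge = { a: string; b: string; kind: "topic"; sharedTopics: string[] };

function topicEdges(rows: TopicRow[]): Edge[] {
  // Invert to topic -> memories.
  const byTopic = new Map<string, string[]>();
  for (const r of rows) {
    const list = byTopic.get(r.topicId) ?? [];
    list.push(r.memoryId);
    byTopic.set(r.topicId, list);
  }
  // Every memory pair under a topic gets one edge, accumulating shared topics.
  const shared = new Map<string, string[]>(); // "a|b" (sorted) -> topic ids
  for (const [topicId, memories] of byTopic) {
    const ids = [...new Set(memories)].sort();
    for (let i = 0; i < ids.length; i++)
      for (let j = i + 1; j < ids.length; j++) {
        const key = `${ids[i]}|${ids[j]}`;
        const t = shared.get(key) ?? [];
        t.push(topicId);
        shared.set(key, t);
      }
  }
  return [...shared].map(([key, sharedTopics]) => {
    const [a, b] = key.split("|");
    return { a, b, kind: "topic" as const, sharedTopics };
  });
}
```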

2. No topic canonicalization

Daniel / daniel / Dan will become three nodes within a week. Today’s topics text[] column has no canonical identity. The minimum fix is a topics table with normalize-on-insert (lowercase + trigram match against existing labels for that user).
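The normalize-on-insert step could look like the following. The similarity function mimics pg_trgm's padded-trigram Jaccard score, and 0.3 mirrors pg_trgm's default threshold; both are assumptions to tune, not measured behavior.

```typescript
// Sketch: canonicalize a new topic label against a user's existing labels
// using a pg_trgm-style trigram similarity.
function trigrams(s: string): Set<string> {
  const padded = `  ${s.toLowerCase()} `; // pg_trgm pads 2 front, 1 back
  const grams = new Set<string>();
  for (let i = 0; i < padded.length - 2; i++) grams.add(padded.slice(i, i + 3));
  return grams;
}

function similarity(a: string, b: string): number {
  const ta = trigrams(a), tb = trigrams(b);
  let inter = 0;
  for (const g of ta) if (tb.has(g)) inter++;
  return inter / (ta.size + tb.size - inter); // Jaccard: |∩| / |∪|
}

function canonicalize(label: string, existing: string[], threshold = 0.3): string {
  let best: { label: string; score: number } | null = null;
  for (const e of existing) {
    const score = similarity(label, e);
    if (!best || score > best.score) best = { label: e, score };
  }
  // Below threshold: keep the new label and create a new topic row.
  return best && best.score >= threshold ? best.label : label;
}
```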

3. Topics aren’t surfaced in the transcript UI

The right split: proper-case topics (Daniel, Q2 Roadmap) become inline highlights in the transcript viewer; lowercase topics (anxiety, public speaking) become chips above the transcript. The existing prompt almost gives this for free — case is already used to distinguish entities from themes.

Char offsets from the LLM are fragile. Just string-match topics against transcript at render time, case-insensitive word boundary, longest-match-first. That’s what most viewers do; cheap and robust.
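A sketch of that render-time matcher: case-insensitive, word-boundary, longest-match-first, with shorter topics skipping regions a longer match already claimed.

```typescript
// Sketch: find topic highlight spans in a transcript at render time,
// instead of trusting LLM-provided char offsets.
type Span = { start: number; end: number; topic: string };

function findTopicSpans(transcript: string, topics: string[]): Span[] {
  const spans: Span[] = [];
  const taken: boolean[] = new Array(transcript.length).fill(false);
  const esc = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Longest-match-first so "Q2 Roadmap" beats a bare "roadmap" topic.
  for (const topic of [...topics].sort((a, b) => b.length - a.length)) {
    const re = new RegExp(`\\b${esc(topic)}\\b`, "gi");
    for (const m of transcript.matchAll(re)) {
      const start = m.index!, end = start + m[0].length;
      let clash = false;
      for (let i = start; i < end; i++) if (taken[i]) { clash = true; break; }
      if (clash) continue; // region already claimed by a longer topic
      for (let i = start; i < end; i++) taken[i] = true;
      spans.push({ start, end, topic });
    }
  }
  return spans.sort((a, b) => a.start - b.start);
}
```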

Synthesis — the framing that became ADR-0012

The chat’s framing (“Obsidian aesthetic, Tana retrieval”) is useful as vocabulary but ARCIVE is structurally closer to Tana with a force-graph skin: typed entity nodes, AI-populated, queryable, with a graph view as one of several surfaces.

This is on-brand:

  • Calm, ambient, AI-does-the-work (2026-05-04_multimodal_expansion.md §“Philosophical reframe”) — user never types [[wikilinks]].
  • Auto-correlation, not user-initiated tagging, is ARCIVE’s MOAT layer (ADR-0011 Layer 10).
  • Multimodal landings — vision/OCR-extracted topics from photos flow into the same memory_topics table, no parallel system.

What lands in ADR-0012

Ships now (V0.3 slice):

  1. topics + memory_topics schema with kind (person | place | project | theme | event) and an embedding column reusing the Voyage-3-lite 512-d space.
  2. Extend summarize-step prompt to return [{label, kind}].
  3. New link-topics-step pgmq function with hybrid pg_trgm + pgvector resolution.
  4. Extend compute-edges-step to write topic-shared edges with kind='topic'; existing edges become kind='semantic'.
  5. Render-time inline highlighting in the transcript viewer + chip row above.
  6. Universe view nodes are topics; memory↔memory edges become a secondary toggle.
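Item 2's [{label, kind}] payload is worth validating before insert, since LLM output can drift. A dependency-free guard might look like this; `parseTopics` is a hypothetical helper, with kinds taken from the item 1 schema.

```typescript
// Hypothetical sketch: validate summarize-step's extended topic output
// before writing to the topics table. Unknown kinds are dropped, not errored.
const KINDS = ["person", "place", "project", "theme", "event"] as const;
type TopicKind = (typeof KINDS)[number];
type TopicOut = { label: string; kind: TopicKind };

function parseTopics(raw: unknown): TopicOut[] {
  if (!Array.isArray(raw)) return [];
  return raw.filter(
    (t): t is TopicOut =>
      typeof t === "object" &&
      t !== null &&
      typeof (t as any).label === "string" &&
      KINDS.includes((t as any).kind),
  );
}
```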

Deferred (future feature work):

  1. Confidence weighting on memory↔topic links.
  2. EXIF → place topics, free.
  3. Diarized speaker → person topic attribution.
  4. Vision/OCR-extracted topics for images (lands with multimodal ingest).
  5. MCP retrieval over topics (memories_by_topic, related_topics).
  6. Topic merge/split UX.
  7. Cluster-around-entity zoom view.

Open decisions

  • Adopt ADR-0012 — accepted, shipped 2026-05-07 (PR #11).
  • When mobile Universe view ships, build against topic-nodes from day one (avoid rebuild). Shipped 2026-05-06 (feature/mobile-universe) — Skia + d3-force, edges/nodes already topic-aware via the β.1 schema, no retrofit needed.
  • Confirm Voyage Multimodal-3 vs Cohere Embed v4 for the multimodal swap (separate decision tracked in 2026-05-04_multimodal_expansion.md).

Lessons recorded

  • Read the prompts before proposing infrastructure. The conversation was heading toward “we should add topic extraction” — a half-day discovery in the existing code revealed we’d been doing it for weeks and just stranding the output. The actual ADR is “use what we already extract,” not “extract more.”
  • Obsidian-style wikilinks-in-prose is a file-format artifact, not a strategy. ARCIVE has no editable prose layer; that pattern is irrelevant. The vocabulary stays useful (entities-as-nodes, graph-as-surface) — the markup mechanics don’t.
  • The graph ≠ the value. Obsidian power users discover the force-graph stops being useful past ~500 nodes. The actual product win is retrieval (typed entity queries, MCP surfaces) with the graph as one rendering of the same data.