Memory Ep. 5: the graph that lost to flat search

Last episode I made a bet: don’t infer a knowledge graph from text like everyone else, just use the one I had already drawn by hand with [[links]]. Treat the links as synapses, let a question spread activation across them, and let the structure do what a similarity score can’t.

This is the episode where I built that, and where it humbled me.

Building it

Turning the vault into a graph was the easy part. Nodes are notes; edges are wikilinks, both forward links and backlinks, each weighted. A few hundred lines and I had a directed graph of my own brain on screen.
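To make that concrete, here is a minimal sketch of the graph-building step, not the code from the project. It assumes the vault is already loaded as a dict of note title to note text. The names build_graph, forward_weight and back_weight, and the 1.0 / 0.5 weights, are placeholders I made up for illustration.

```python
import re
from collections import defaultdict

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def build_graph(notes, forward_weight=1.0, back_weight=0.5):
    """Build a weighted directed graph {source: {target: weight}} from wikilinks.

    notes: dict mapping note title -> note text.
    The two weights are illustrative assumptions, not tuned values.
    """
    graph = defaultdict(dict)
    for title in notes:
        graph[title]  # touch the key so every note is a node, even unlinked ones
    for src, text in notes.items():
        for match in WIKILINK.finditer(text):
            dst = match.group(1).strip()
            if dst not in notes or dst == src:
                continue  # skip links to missing notes and self-links
            # Forward link: src -> dst.
            graph[src][dst] = graph[src].get(dst, 0.0) + forward_weight
            # Backlink: dst -> src, with a weaker weight.
            graph[dst][src] = graph[dst].get(src, 0.0) + back_weight
    return {node: dict(edges) for node, edges in graph.items()}


notes = {
    "A": "See [[B]] and [[C|the C note]], plus [[Missing]].",
    "B": "Back to [[A]].",
    "C": "No links here.",
}
graph = build_graph(notes)
```

In this toy vault, C never links anywhere, but it still gets an outgoing backlink edge to A, which is what lets activation flow in both directions.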

The recall mechanism was the fun part. “Activation spreads from a cue” has a very concrete implementation: Personalized PageRank. Same idea as the algorithm that ranked the early web, but the random walk is biased to start from the notes most relevant to your question. Vector search still picks the entry points; from there the walk spreads along the links, and notes that are tightly connected to those entry points score high even if their words never matched the question.
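The walk itself fits in a few lines. What follows is a sketch of Personalized PageRank by plain power iteration, under some assumptions: the graph is a dict of {source: {target: weight}} with every node present as a key, and seeds maps entry-point notes to a relevance score that would come from vector search. The function name, the damping value of 0.85 and the iteration count are illustrative choices, not the settings I used.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Score every node by a random walk that keeps restarting at the seeds.

    graph: {source: {target: weight}}, every node present as a key.
    seeds: {node: relevance}; relevance must be positive for at least one node.
    """
    nodes = list(graph)
    total = sum(score for node, score in seeds.items() if node in graph)
    if total <= 0:
        raise ValueError("need at least one seed with positive weight in the graph")
    # Restart distribution: where the walk jumps back to, proportional to relevance.
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for src in nodes:
            out = graph[src]
            out_total = sum(out.values())
            if out_total == 0:
                # Dangling note with no outgoing edges: send its mass back to the seeds.
                for n in nodes:
                    nxt[n] += damping * scores[src] * restart[n]
                continue
            for dst, weight in out.items():
                # Spread activation along each edge in proportion to its weight.
                nxt[dst] += damping * scores[src] * weight / out_total
        scores = nxt
    return scores


# Toy chain A -> B -> C, plus an isolated note D.
chain = {"A": {"B": 1.0}, "B": {"C": 1.0}, "C": {}, "D": {}}
scores = personalized_pagerank(chain, seeds={"A": 1.0})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In the toy chain, C picks up a score even though only A was seeded, purely because it is two links away. D, which nothing links to, stays at zero.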

And the demos were magic. Ask about a topic and notes would surface that shared almost no vocabulary with the query, pulled in purely because my past self had linked them. Multi-hop association, for free, out of structure I had authored months earlier. I was sure I had something.

The part I’d been postponing

Demos lie, though. A cherry-picked query that lights up beautifully tells you nothing about the average case. I needed to know if the graph was actually better than the flat vector search from Episode 3, not just cooler to watch.

So I did the boring thing I had been avoiding: I built an eval set. A list of real questions, each tagged with the notes that genuinely answer it (the ground truth), and a metric: did the right notes land in the top results? Then I ran both methods over the same questions: graph with PageRank versus plain vector similarity.
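The harness for this kind of comparison is tiny. Below is a sketch assuming the metric is recall at k, averaged over questions; "did the right notes land in the top results" could equally be scored as a hit rate or MRR. The names recall_at_k and evaluate, the cutoff k=5, and the toy retriever are all hypothetical.

```python
def recall_at_k(ranked, relevant, k=5):
    """Fraction of the ground-truth notes that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(retriever, eval_set, k=5):
    """Mean recall@k of a retriever over (question, relevant_notes) pairs.

    retriever: callable taking a question and returning note titles, best first.
    """
    scores = [recall_at_k(retriever(question), relevant, k)
              for question, relevant in eval_set]
    return sum(scores) / len(scores)


# Toy eval set: each question is tagged with the notes that answer it.
eval_set = [
    ("how did I fix the cache bug?", {"cache-bug", "redis-notes"}),
    ("what did I read about sleep?", {"sleep-book"}),
]


def toy_retriever(question):
    # Stand-in for either method; always returns the same ranking.
    return ["cache-bug", "sleep-book", "daily-log"]


mean_recall = evaluate(toy_retriever, eval_set, k=2)
```

Running evaluate once per method over the same eval_set gives two numbers that can be compared directly, which is the whole point.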

For the first time I had a number instead of a feeling.

The result

The graph lost.

On my own vault, flat vector similarity matched or beat the graph across the eval set. The elegant, biological, hand-authored-structure idea I had fallen in love with did not earn its place. Plain “compare the meaning of the question to the meaning of the notes” was as good or better, and far simpler.

That stung more than I want to admit.

Why it lost

Once I stopped sulking and looked at the failures, the reasons were not mysterious:

Not every note is a clean neuron. A focused note links cleanly. But a daily log is a bag: one day’s entry might cover a bug at work, a phone call, and something I read at lunch, all in the same note, all linked outward. Seed the walk anywhere near it and activation pours through that bag into completely unrelated clusters. The graph propagated noise as eagerly as signal.

My structure was uneven. Some corners of the vault are densely linked, others barely at all. PageRank rewards connectivity, so well-connected hubs floated to the top of every query whether or not they had anything to do with it.

Flat search is a strong baseline. Modern embeddings are genuinely good at “what does this mean.” That is a hard bar, and the graph was adding cost, complexity and a second failure mode to clear it, only to come up level at best.

What I actually took from it

The lesson was not “graphs are bad.” Graphs are great when the structure is clean and uniform; mine wasn’t.

The real lesson was about me. I had fallen for an idea because it was elegant and had a great story, and I was one weekend away from building the whole product on top of it, on vibes. The thing that saved me was the eval set, the least glamorous part of the whole effort. Building the measurement turned out to be more valuable than building the graph, because it let me kill my favourite idea with a clear conscience instead of defending it for months.

I now treat that as the actual skill: not having clever ideas, but being willing to put a number on them and believe the number.

Next

I wasn’t quite ready to bury it, though. “Pure graph versus pure vectors” is a false choice, and losing one round is not the same as being useless. What if the two were never meant to compete, but to cooperate: vectors to find where to look, the graph to decide what comes along?

That hybrid is the next episode.
