Memory Ep. 3: a search box for the brain

In the first episode I gave Claude and me a shared brain: an Obsidian vault we both read and write. In the second one I made the brain feed itself, so it grows whether I remember to tend it or not.

Growing the brain is only half of it though. The other half is using it. And for a while, the using part was embarrassingly crude.

The caveman phase

At first Claude read the vault the obvious way. Grep for a word, open the notes that matched, dump them whole into the context window. It worked when the vault was small and I could pretend the whole thing fit in my head.

It stopped working as it grew. Two problems, both annoying.

The first is noise. A question about one decision would drag three unrelated notes into context because they happened to share a word. The window filled up with paragraphs nobody needed, and I was spending attention budget on them.

The second is worse, and it is the one that really bothered me: grep only finds the words you typed. If I wrote “car” in one note and “vehicle” in another, a search for “vehicle” silently misses the first. My notes are full of that, the same idea written ten different ways across months. The brain knew things it could not find.
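A tiny illustration of that blind spot, using two made-up notes and a hypothetical `keyword_search` helper that stands in for grep:

```python
# Two invented notes that describe the same kind of thing with different words.
notes = {
    "2023-03-trip.md": "Rented a car for the weekend.",
    "2023-09-insurance.md": "Renewed the vehicle insurance.",
}

def keyword_search(term, notes):
    """Return the names of notes whose text contains the literal term."""
    return [name for name, text in notes.items() if term.lower() in text.lower()]

# Only the note that literally says "vehicle" matches; the "car" note is missed.
matches = keyword_search("vehicle", notes)
print(matches)
```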

Retrieval, not reading

What I actually wanted was to ask the vault a question and get back the handful of passages that answer it, by meaning, not by exact spelling. That is retrieval, and it is a well-trodden path, but I wanted to build it myself to understand it rather than import a black box.

The pieces:

Chunk by structure. Split each note at its ## headings, so a chunk is one coherent idea instead of an arbitrary slice. The markdown structure, and the [[links]], survive the cut.
Embed locally. Run each chunk through a local multilingual embedding model. No API call, because this is my personal vault and it is not leaving the machine.
Store the vectors. pgvector, on a local Postgres. A question gets embedded the same way, and the nearest chunks come back ranked by similarity.
Only re-embed what changed. A hash per file, so editing one note does not re-index the whole vault.
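
The first and last pieces can be sketched in a few lines of plain Python. This is a minimal sketch, not the actual indexer: the function names, the choice of SHA-256, and the in-memory dictionaries are all assumptions made for illustration.

```python
import hashlib

def chunk_by_headings(note_text):
    """Split a markdown note at its '## ' headings.

    Returns (heading, body) pairs. Text before the first heading gets an
    empty heading. Wiki-style [[links]] are left untouched inside the body.
    """
    chunks = []
    heading, lines = "", []

    def flush():
        # Keep a chunk if it has a heading or any non-blank body text.
        if heading or any(line.strip() for line in lines):
            chunks.append((heading, "\n".join(lines).strip()))

    for line in note_text.splitlines():
        if line.startswith("## "):
            flush()
            heading, lines = line[3:].strip(), []
        else:
            lines.append(line)
    flush()
    return chunks

def file_hash(note_text):
    """Content hash for one note (SHA-256 is an assumed choice)."""
    return hashlib.sha256(note_text.encode("utf-8")).hexdigest()

def notes_to_reembed(vault, known_hashes):
    """vault: {path: text}; known_hashes: {path: hash from the last run}.

    Returns the paths whose content is new or has changed.
    """
    return [path for path, text in vault.items()
            if known_hashes.get(path) != file_hash(text)]

note = "Intro line\n\n## Decision\nWe chose [[Postgres]].\n\n## Follow-up\nAsk Ana."
chunks = chunk_by_headings(note)

vault = {"a.md": "unchanged", "b.md": "edited today"}
known = {"a.md": file_hash("unchanged"), "b.md": file_hash("older text")}
stale = notes_to_reembed(vault, known)
```

Here only `b.md` would be re-chunked and re-embedded, which is the whole point of keeping a hash per file.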

Now a search returns the five chunks closest in meaning to the question. Whole files never enter the picture, and “car” finally matches “vehicle.”
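
What "closest in meaning" boils down to is a ranking by vector similarity. In my setup the database does this work, but the idea fits in plain Python. The sketch below assumes cosine similarity as the measure and uses made-up three-dimensional vectors and chunk ids; a real embedding model produces far longer vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=5):
    """chunks: list of (chunk_id, vector). Returns k (chunk_id, score) pairs, best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors, invented for illustration only.
chunks = [
    ("cars.md#buying", [0.9, 0.1, 0.0]),
    ("garden.md#tomatoes", [0.0, 0.2, 0.9]),
    ("commute.md#vehicle", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]
results = top_k(query, chunks, k=2)
```

The two chunks about cars and vehicles outrank the gardening one because their vectors point in nearly the same direction as the query, regardless of which words produced them.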

Plugging it into Claude Code

A retrieval system I have to call by hand is barely better than grep. I wanted Claude to reach for it on its own, mid-conversation, the moment it realises it needs something from the brain.

So I wrapped the whole thing as an MCP server with a single tool: search_vault(query, k). Local, over stdio, talking to the same Claude Code I use all day.

The change in feel is bigger than it sounds. Before, I was the one pasting context in. Now Claude asks the brain a question, gets back the relevant chunks, each tagged with the id of the note it came from, and can tell me where an answer originated. It even separates “what the vault says” from “what is in the code in front of us,” which turns out to matter a lot once you trust it with real work.
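
To make the shape of that tool concrete, here is a rough sketch of a `search_vault(query, k)` handler as a plain function. Registering it with an MCP SDK is left out. Everything except the tool name and its two parameters is hypothetical: the `Hit` fields, the `make_search_vault` factory, the toy embedder, and the dot-product scoring are stand-ins for the real model and database.

```python
from dataclasses import dataclass, asdict

@dataclass
class Hit:
    note_id: str   # which note the chunk came from, so the answer can cite it
    heading: str
    text: str
    score: float

def make_search_vault(embed, index):
    """embed: callable text -> vector; index: list of (note_id, heading, text, vector)."""
    def search_vault(query, k=5):
        q = embed(query)
        hits = [
            # Plain dot product for brevity; a real index would use its own metric.
            Hit(note_id, heading, text, sum(a * b for a, b in zip(q, vec)))
            for note_id, heading, text, vec in index
        ]
        hits.sort(key=lambda hit: hit.score, reverse=True)
        return [asdict(hit) for hit in hits[:k]]
    return search_vault

# Toy embedder: maps a few words onto two hand-picked "concept" axes.
TOY_CONCEPTS = {"car": 0, "vehicle": 0, "tomato": 1, "garden": 1}

def toy_embed(text):
    vec = [0.0, 0.0]
    for word in text.lower().split():
        axis = TOY_CONCEPTS.get(word.strip(".,?!"))
        if axis is not None:
            vec[axis] += 1.0
    return vec

notes = [
    ("2023-cars", "Buying", "We bought a car"),
    ("garden", "Tomatoes", "Tomato beds in the garden"),
]
index = [(note_id, heading, text, toy_embed(text)) for note_id, heading, text in notes]
search_vault = make_search_vault(toy_embed, index)
hits = search_vault("which vehicle did we buy", k=1)
```

Returning the note id alongside each chunk is what lets the caller say where a passage came from instead of presenting it as its own knowledge.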

And the context cost collapsed. Serving a few relevant chunks instead of loading documentation wholesale cut the tokens for these lookups by around 90%. The brain got bigger and cheaper to consult at the same time.

The thread I couldn’t let go

This worked. I used it every day. But something kept nagging.

Semantic search is smarter than grep, but it is still flat. It ranks chunks by how close they sit to the question in vector space, and then it stops. It completely ignores the thing that makes my vault mine: the links. Every note points to others. A question about a person quietly touches the projects they work on, the decisions they made, the meetings where they showed up. None of that surfaces in a similarity score unless the words happen to line up.

I kept catching myself thinking: my own memory does not work by cosine similarity. When something reminds me of a thing, a whole web lights up around it, one association pulling the next. The links in my vault are exactly that web, sitting right there, and my fancy new retrieval was throwing them away.

So I started reading about how the part of the brain that does this actually works. That is where the next episode starts.