Why I still approve my memory by hand

Every conversation I have with an LLM gets written down. All of it, raw, into a daily log. That part is trivial. The hard part is what happens next: deciding which scraps of that stream are worth promoting into a real, durable note, and which are just noise the system should forget.

In my setup a cron job reads the log, an LLM proposes what looks worth keeping, and then I approve or reject each candidate by hand. It is rudimentary, and it works. But the manual step always nagged at me. It does not scale, and “I do it by hand” is the kind of thing you say right before someone asks why you haven’t automated it.

The gap I couldn’t close

Then I read Jeremy Daly’s piece on moving from RAG to memory systems. It maps almost one-to-one onto what I had built: typed memory, a raw trace you don’t retrieve from directly, and a promotion gate that decides what becomes durable.

It also names a state I had been circling without a word for it: provisional. A record that is written but kept out of retrieval until something confirms it. That quarantine is what stops a noisy pipeline from poisoning what the system can recall. Lovely idea. But the piece defines what enters as provisional and never says what promotes it to active. So I asked him.
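To make the provisional state concrete, here is a minimal sketch of how I picture it. The names (`MemoryStore`, `Status`, `promote`) are mine, made up for illustration, and are not taken from Daly's piece or from any library; the substring match stands in for whatever retrieval a real system would use.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROVISIONAL = "provisional"  # written, but invisible to retrieval
    ACTIVE = "active"            # confirmed, eligible for recall


@dataclass
class MemoryRecord:
    text: str
    status: Status = Status.PROVISIONAL  # every record enters quarantined


class MemoryStore:
    """Hypothetical store: writes are cheap, recall is gated by status."""

    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def write(self, text: str) -> MemoryRecord:
        record = MemoryRecord(text)
        self._records.append(record)
        return record

    def promote(self, record: MemoryRecord) -> None:
        # Who or what is allowed to call this is the open question.
        record.status = Status.ACTIVE

    def retrieve(self, query: str) -> list[str]:
        # Naive substring match; only ACTIVE records are ever returned.
        q = query.lower()
        return [
            r.text
            for r in self._records
            if r.status is Status.ACTIVE and q in r.text.lower()
        ]
```

A provisional record is stored but cannot surface in `retrieve` until something calls `promote` on it, which is the whole point of the quarantine.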

Three ways to automate the decision

He answered, and the short version is: it depends on risk, privacy, and confidence.

Sensitive data (anything that could compromise a business process, or personal information): manual approval is still the best call.
Lightweight facts (a user preference, say): a simple consensus rule is enough. Promote once two or more independent observers implicitly confirm the fact.
A hybrid: an LLM-as-judge fact-checks the candidate, looks for consensus or third-party confirmation, and escalates to a human only when needed.
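Read together, the three patterns look like a routing policy. The sketch below is my own way of folding them into one function, not something Daly specified: the `Candidate` fields, the `Decision` names, and the choice to send everything unresolved to the judge are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"              # consensus rule satisfied
    ASK_HUMAN = "ask_human"          # manual approval
    SEND_TO_JUDGE = "send_to_judge"  # LLM-as-judge, may still escalate


@dataclass
class Candidate:
    text: str
    sensitive: bool = False
    # Ids of observers that implicitly confirmed the fact (hypothetical field).
    confirmed_by: set[str] = field(default_factory=set)


def route(candidate: Candidate, min_observers: int = 2) -> Decision:
    """Pick a promotion path based on risk first, then consensus."""
    if candidate.sensitive:
        # Risk and privacy outrank any amount of agreement.
        return Decision.ASK_HUMAN
    if len(candidate.confirmed_by) >= min_observers:
        return Decision.PROMOTE
    # Not sensitive, not yet confirmed: let the judge look at it.
    return Decision.SEND_TO_JUDGE
```

Note that the consensus branch is only as good as the observer ids it counts: the rule assumes each id really is an independent source.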

And one idea I keep turning over: ask for confirmation in conversation. When the summarizer extracts a candidate fact relevant to what you’re talking about, the system just surfaces it and asks, “Looks like X is true, is that right?” There is no separate UI; confirmation falls out of the dialogue.
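A rough sketch of that idea, under heavy assumptions: relevance here is plain word overlap between the pending fact and the current message, where a real system would more likely use embeddings or the summarizer itself. The function names and the `min_overlap` threshold are hypothetical.

```python
from typing import Optional


def words(text: str) -> set[str]:
    """Lowercased words longer than three characters, punctuation trimmed."""
    cleaned = (w.strip(".,?!\"'").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 3}


def confirmation_prompt(
    candidates: list[str],
    current_message: str,
    min_overlap: int = 2,
) -> Optional[str]:
    """Return a question about the first pending fact that matches the topic."""
    topic = words(current_message)
    for fact in candidates:
        if len(words(fact) & topic) >= min_overlap:
            return f'Looks like "{fact}" is true, is that right?'
    # Nothing relevant: stay quiet rather than interrupt the conversation.
    return None


pending = [
    "The backup job runs nightly at 02:00",
    "Prefers tea over coffee",
]
print(confirmation_prompt(pending, "Can you check the nightly backup for me?"))
```

The design choice that matters is the `None` branch: the system only asks when the context is already on the table, so an unrelated candidate waits for a better moment.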

Why most of that doesn’t fit a system of one

Here is where it got interesting for my case, because two of those patterns quietly assume something I don’t have: independent observers.

In a single-user system, my “independent observers” are the same LLM across different sessions. That is not independence; it is an echo: correlated bias dressed up as confirmation, which is the exact failure his own piece warns about with self-reported confidence. And the judge that checks a candidate against third-party sources has no anchor when I am the source of truth. The only thing it can compare my notes against is the rest of my notes. It ends up fact-checking the vault against itself, which is circular.
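The echo problem is easy to show in a few lines. This is a sketch under my own assumptions: each confirmation carries a `source` field naming who actually produced it, and independence is counted per source instead of per session. The model name is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    session_id: str
    source: str  # who produced the judgement, e.g. a model name or "human"


def independent_confirmations(observations: list[Observation]) -> int:
    """Count distinct sources, not distinct sessions."""
    return len({o.source for o in observations})


# Three sessions, one underlying model: three confirmations on paper.
echo = [
    Observation("monday", "assistant-llm"),
    Observation("tuesday", "assistant-llm"),
    Observation("friday", "assistant-llm"),
]
print(len(echo), independent_confirmations(echo))
```

Counted by session, a "two or more observers" rule passes. Counted by source, the same evidence collapses to one observer, and the only way to reach two in a single-user system is to add the human.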

So the human stays, on purpose

Which flips the conclusion. The manual gate I had been apologising for is not the naive version I hadn’t gotten around to automating. For a personal corpus it is the honest answer. The confirmer has to be the one genuinely independent source of truth in the system, and that is me. Daly said it plainly for the sensitive case: manual is still best.

The conversational-confirmation idea is what makes that gate stop being friction. Instead of ticking checkboxes in a review pass, the system asks me in the flow, when the context is already on the table. That is exactly where a conversational front end belongs, and it is the one I want to build on top of this.

The lesson I keep coming back to: thinking hard about someone else’s design, one built for a far bigger problem than mine, was the fastest way to understand the shape of my own. I didn’t catch his architecture out. I found the one spot where my tiny single-user version parts ways from it, and why that’s the right call for something this small. And asking a sharp, specific question of someone who knows more than you is still the most underrated move there is.

There is a bigger thread I’ll save for another day, though. This was about one corner, the gate. Daly’s article is about how an entire memory system should work, and reading it made one thing plain: what I’ve built so far is the retrieval half of that map, not the whole thing. Where hipocampo should grow from here is its own post.