Project memory from unstructured audio
About a year ago, I joined the ASR (automatic speech recognition) bandwagon and started building a local audio-to-notes pipeline for my own devlogs.
Voice capture, transcription, diarization, and summarization are all effectively solved problems at this point. There are literally dozens of tools that do one or more of these quite well. And yet…
The deeper I got into the space, the more I realized that I was rarely dissatisfied with transcription or even with the summaries.
I was dissatisfied with the actual usefulness of the resulting notes.
The product of transcription is a saturated solution of unstructured thought. People are messy by nature, and thoughts are free-flowing, so it’s unsurprising that reading a raw transcript is brutal.
The obvious next step is to hand the transcript to an LLM. I did this too, but there’s a core problem with the handoff. Every recording starts from zero. Every summary starts from zero.
Most systems are good at answering “What did I say today?” Few are able to answer “What barriers have been most impactful over the last few weeks or months?” This is getting better with Karpathy-style post-processing solutions but is weak or absent in the ASR space.
The core issue, at least for me, was a lack of coherence across the lifetime of a project.
I became interested in three ideas: persistent customization, project-specific context, and multiple layers of time abstraction. Audio becomes a transcript. Transcript becomes a summary. Summaries become weekly reviews. Weeklies become monthly rollups.
Instead of trying to extract everything from a single recording, the system gets multiple opportunities to identify recurring ideas, goals, blockers, decisions, and priorities.
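As a rough illustration of that layering (a minimal sketch, not Nucleate's actual code), here is how daily summaries could be rolled up into weekly and then monthly views in Python. The `Rollup` type, the helper names, and the rule that a week belongs to the month containing its Monday are all assumptions made for the example. A real pipeline would likely put an LLM summarization step at each layer where this sketch only counts recurring blockers.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Rollup:
    period: str        # "2024-03-04" (day), "2024-W10" (ISO week) or "2024-03" (month)
    blockers: Counter  # blocker -> number of daily summaries that mention it


def day_to_week(day: str) -> str:
    """Map an ISO date string to its ISO week label."""
    year, week, _ = date.fromisoformat(day).isocalendar()
    return f"{year}-W{week:02d}"


def week_to_month(week_label: str) -> str:
    """Map an ISO week label to a month label."""
    # Assumption: a week belongs to the month that contains its Monday.
    year, week = week_label.split("-W")
    return date.fromisocalendar(int(year), int(week), 1).strftime("%Y-%m")


def roll_up(notes: list[Rollup], coarser: Callable[[str], str]) -> list[Rollup]:
    """Merge notes that fall into the same coarser period, summing blocker counts."""
    merged: dict[str, Counter] = defaultdict(Counter)
    for note in notes:
        merged[coarser(note.period)].update(note.blockers)
    return [Rollup(period, counts) for period, counts in sorted(merged.items())]


# Hypothetical daily summaries, already reduced to a list of blockers each.
daily = [
    Rollup("2024-03-04", Counter(["flaky tests", "gpu memory"])),
    Rollup("2024-03-06", Counter(["flaky tests"])),
    Rollup("2024-03-12", Counter(["flaky tests", "docs"])),
]

weekly = roll_up(daily, day_to_week)       # summaries become weekly reviews
monthly = roll_up(weekly, week_to_month)   # weeklies become monthly rollups

print(monthly[0].blockers.most_common(1))  # → [('flaky tests', 3)]
```

No single recording flags "flaky tests" as the dominant blocker. It only stands out once the layers are stacked, which is the point of giving the system several passes over the same material.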
What’s interesting is that this approach doesn’t require frontier models. A lot of the value comes from structure and repetition, not raw intelligence.
Nucleate is my attempt at a local-first project memory system that’s built around that idea.
It’s private by default, autonomous rather than chat-based, and designed around long-term project tracking rather than individual recordings (though I built escape hatches for manual one-offs).
What was supposed to be a small side project ballooned into a roughly 800-hour journey. The biggest thing I learned was that ASR isn’t the hard part. Project memory is.
If you’d like to see what I ended up building, it’s here: Nucleate.
I’d genuinely appreciate feedback, criticism, or discussion from people who have experimented with ASR, local AI, PKM systems or project tracking. I’m particularly curious if others have run into the same issue of recordings feeling disconnected from one another over time.
Thanks for reading.