@jacobcole / Vision Convo: Jacob & David / wiki/practices/voice-first-thought-capture.md
---
confidence: high
related:
- wiki/concepts/sparks-of-motivation.md
- wiki/projects/ideaflow.md
- wiki/themes/jacob-origin-story.md
sources:
- raw/transcript.md
title: Voice-First Thought Capture
type: practice
visibility: public
---

# Voice-First Thought Capture

The user-interface modality Jacob designed his life around. The seed of the [[IdeaFlow]] product family.

## Origin

From Jacob's [[Jacob's Origin Story|RSI injury]]:

> "[I started] designing voice recognition systems. And fortunately, I got into MIT and a few other schools I was excited about, and I decided to go to MIT, and, yeah, I worked on voice recognition interfaces for a couple years. And it was all about, how can you build a hands-free interface with the lowest friction possible to capture your thoughts, because I had extra friction. But it turns out that I built UI paradigms that are just better for everybody, not just better for [the injured]."

The accessibility origin → general-utility insight. Hands-free turned out to be **better for everyone**, because the keyboard isn't optimal for thought-capture even for the able-bodied. Typing imposes **conceptual structuring overhead** at the moment of capture, when expansive raw flow is what you want.

## The current tools

Jacob mentions:

- **Whisper**: uses extensively. "I think that is, other than maybe the frontier labs, probably one of the top competitors for [voice transcription]."
- **Willow**: alternative. "Willow's a lot faster, but yeah, I guess Whisper's more alternative. I don't know. I think it's fine. I think their product is pretty replaceable."

His view on the current state of voice tools:

> "It's pretty good. I'd say it's not that much better than any of the alternatives, though."

So: tools are usable but not yet differentiated. Jacob sees room for the next generation.

## David's pushback

David flagged the trade-off:

> "The counter-argument to that is, actually articulating your thoughts into words requires a certain level [of structure]."

That is: speaking forces you to formulate complete sentences, which is itself a useful clarifying constraint. Pure raw thought capture might lose that.

## Jacob's response: both axes

> "I think it's a mixed bag. There's some level of pressure that is nice to apply to congeal it, but it's also nice to be as expansive as possible. So if I could dream into a box, I think that would probably have some value."

The synthesis: there are **two valuable modes**, and the best system would let you do both:

- **Discriminating** (forced articulation, words, even etching-on-stone-tablets pressure) → refines, congeals
- **Expansive** (raw thought, ambient capture, dream-flow) → preserves variety and surprise

Different parts of the creative process want different modes. Capturing in voice gives you something between typing (high discrimination) and pure thought-stream (full expansion).

## The "etch on stone" extension

> "Also nice if I have to etch it on a stone tablet, and like, it's not merely that I have to say it or [put it] into words, but it's like, I gotta really decide what I have to say. It's also valuable, but both sides are valuable, yeah, both the discriminating and the expansive."

The stone-tablet pressure is a **third** mode: extreme discrimination. Useful for crystallizing. So really three modes:

1. Stone tablet (max discrimination, expensive)
2. Speech (medium discrimination, low cost)
3. Pure thought / dream-cap (max expansion, currently impossible)

## The dream-cap speculation

David asked:

> "Do you ever envision a world where, instead of voice-first, it's as soon as you think, it gets recorded and stored?"

Jacob: "Could be very interesting."

Then:

> "Maybe there's something that we are not consciously aware of that is, like, intrinsic value of just the raw thought without any processing, where if you collect all those with a powerful enough LLM or some kind of model, if you have a dream cap, and just like dream all the archetypes go into pure form."

David named the device: a **dream cap**, meaning non-invasive thought recording.

Jacob: "That would require Neuralink. Maybe not. They have this new cap that can read your thoughts, not invasively." And: "Yeah, I'd definitely try one."

`confidence: speculative` for the dream-cap. The voice-first mode is real and current; the dream-cap mode is speculative tech.

## What "intrinsic value of just the raw thought" might mean

The interesting hypothesis Jacob and David circle around: **pre-articulated thought may carry information that articulation destroys**. The verbal layer adds structure but also subtracts texture. A sufficiently rich capture (with a sufficiently good LLM downstream) might preserve patterns that are invisible at the verbal layer.

This is a real claim. Whether it holds depends on whether the lossy compression of verbalization removes signal or just removes redundancy.

## Connection to sparks

The whole [[Sparks of Motivation]] framework hinges on capture. If a spark goes uncaptured, it dissipates. If it's captured awkwardly (in a way that requires too much formulation effort), the act of capture changes the spark before it's recorded.

**Voice-first capture is a deliberate engineering choice to minimize the perturbation of capture.**

## Related

- [[Sparks of Motivation]]: what's being captured
- [[IdeaFlow]]: the product
- [[Jacob's Origin Story]]: why this matters to Jacob personally
- [[Dream-Cap Thought Recording]] *(speculative)*: the next frontier