Voice-Driven Development

I don’t type most of my code anymore. I speak it.

Not dictation in the traditional sense—not “open parenthesis, import react, close parenthesis.” I’m having a conversation with Claude Code while my hands stay on the keyboard for the parts that actually need them.

Here’s the setup that makes this work.

The Stack

Superwhisper handles voice-to-text. Local processing, near-instant transcription, continuous background operation. One hotkey activates it, I talk, and it transcribes directly into whatever app has focus.

Claude Code in the terminal receives those transcribed commands. “Refactor this to use dependency injection.” “Add error handling for the network timeout case.” “What’s that function doing on line 47?”

Obsidian captures the artifacts. Notes, plans, documentation—all flowing from the same voice-first workflow.

The magic is that these tools don’t know about each other. There’s no integration to configure. Superwhisper outputs text. Text goes wherever my cursor is. Simple.

Why Voice Works for Code

Typing is optimized for precision. Voice is optimized for intent.

When I’m debugging, I don’t want to carefully type out a question. I want to think out loud: “This is returning null but it shouldn’t be—the API call succeeded, I can see the response in the logs, so something’s happening between the fetch and the state update.”

That stream of consciousness gives Claude more context than a carefully edited query would. The messiness is a feature.

When I’m designing, speaking forces linear thinking. I can’t easily jump around and reorganize. This constraint helps me work through problems sequentially instead of getting lost in parallel considerations.

When I’m tired, voice keeps me productive. My brain works fine at 11pm—it’s my hands that don’t want to type. Voice bridges that gap.

The Workflow in Practice

I’m looking at code. Something needs to change.

I hit the hotkey. Superwhisper starts listening.

“Add input validation to this form. Email should be a valid format, password needs at least eight characters, and show inline errors below each field.”

I stop talking. Superwhisper transcribes. Claude Code receives the instruction and starts working.
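That spoken request maps onto fairly simple validation logic. Here's a sketch in Python of the shape of what comes back; the field names, regex, and error messages are illustrative, not what Claude actually generated:

```python
import re

# Loose shape check only; production code often delegates email
# validation to a library or a confirmation-email flow.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_form(email: str, password: str) -> dict[str, str]:
    """Return field -> inline error message; an empty dict means valid."""
    errors: dict[str, str] = {}
    if not EMAIL_RE.match(email):
        errors["email"] = "Enter a valid email address."
    if len(password) < 8:
        errors["password"] = "Password must be at least 8 characters."
    return errors
```

The dict-of-errors return maps directly onto "show inline errors below each field": the UI renders each message under its matching input.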

While Claude generates the code, I can talk through the next thing: “After this, we should add a loading state for the submit button.”

It’s not faster than typing for simple commands. But for anything that requires explanation or context, voice wins decisively. And the mental overhead is lower—I’m thinking about the problem, not about how to express the problem.

Obsidian Integration

Voice works even better for notes than for code.

Obsidian has focus. I start talking. “Meeting notes from the design review. Three main decisions: we’re going with the card-based layout, the timeline feature is pushed to v2, and Sarah is owning the responsive breakpoints. Action items—I need to update the Figma with the new card dimensions, and we need to schedule the v2 planning session.”

That’s a complete meeting note captured in under thirty seconds. The alternative is twenty minutes of typing after the meeting while the context slowly fades from memory.

For daily journaling, weekly reviews, braindumps—anything where the goal is to capture thinking—voice is unmatched.

Superwhisper Specifically

There are other transcription tools. I’ve tried most of them. Superwhisper wins for a few reasons:

Local processing. Nothing goes to a server. For someone who talks through sensitive code all day, this matters.

Speed. The transcription appears as I’m finishing my sentence. There’s no workflow-breaking delay.

Always running. It’s not an app I launch. It’s infrastructure that’s just there when I need it.

Accuracy. Technical vocabulary, library names, variable names—it handles them. Not perfectly, but well enough that I rarely need to correct transcription errors.

What Doesn’t Work

Very precise edits. "Change line 47 from const to let" is faster to type than to say.

Heavy refactoring. When I need to think carefully about every character, voice adds overhead instead of removing it.

Loud environments. Coffee shops require typing.

The Meta-Point

This post was written by voice, into Claude Code, which formatted it for my blog.

I talked through the structure. I spoke each section. Claude cleaned up the filler words and organized my spoken thoughts into written paragraphs.

The tools don’t integrate in the traditional sense. They compose. Each does one thing well, and together they create a workflow that’s faster and more natural than any of them alone.

That’s the pattern I keep finding: don’t wait for tools to have official integrations. Find tools that work with text, and let text be the integration layer.
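And when you do eventually want automation, text-as-integration means the glue is tiny. A hypothetical sketch, assuming a transcriber that can pipe its output to stdout (the CLI shape and paths are invented for illustration):

```python
import datetime
import pathlib

def append_to_daily_note(vault: pathlib.Path, text: str) -> pathlib.Path:
    """Append a transcribed chunk to today's note, creating it if needed."""
    vault.mkdir(parents=True, exist_ok=True)
    note = vault / f"{datetime.date.today().isoformat()}.md"
    with note.open("a", encoding="utf-8") as f:
        f.write(text.rstrip() + "\n")
    return note

# Hypothetical usage: some-transcriber --stdout | python capture.py ~/vault
```

Ten lines, no SDK, no plugin API. That's what "text is the integration layer" buys you.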


Related: My Claude Code Setup, Self-Improvement Loops