GPU-ACCELERATED & LOCAL-FIRST

Code faster than
you can type.

Privacy-first voice-to-text built for developers. Runs locally on your GPU. Recognizes code syntax and technical vocabulary.

Open source

* Supports Linux (Wayland/X11) & macOS (M-Series/Intel)


The Old Way

  • Cloud-based transcription sends your private code to third-party servers.
  • High latency (>1.5s) breaks your flow state.
  • Dictating "kubectl get pods" results in "cube cuddle get pads".

The Mojo Voice Way

  • 100% Local. No data ever leaves your machine. Offline capable.
  • Lightning-fast transcription via CUDA/Metal GPU acceleration.
  • Optimized for technical terminology. It knows your stack.

Built for the modern stack.

Generic dictation tools fall apart when you switch to coding. Mojo Voice was engineered from the ground up for technical workflows.

Blazing Fast

Built with Rust and Mojo, leveraging CUDA on Linux and Metal on macOS for sub-second inference.

Privacy First

Your voice data is processed entirely on your GPU. Zero telemetry. Zero cloud dependencies. Works completely offline.

Tech-Aware

Trained on documentation and codebases. Accurately transcribes camelCase, snake_case, and complex CLI commands.

31 Whisper Models

Choose from 31 pre-configured Whisper models with visual quality meters. Desktop app with transcription history. Available as AppImage, .deb, and .dmg.

Cross-Platform

Native support for Linux (Wayland & X11) and macOS. Windows support coming in v1.0. Unified experience everywhere.

Desktop Integrated

Seamlessly integrates with your environment. Includes a Waybar module to show real-time status.
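As a sketch of what that integration can look like, a Waybar custom module for a status indicator might be configured as follows. The module name, `exec` command, and JSON output contract are illustrative assumptions, not the shipped configuration; check the project docs for the real module.

```jsonc
// ~/.config/waybar/config (JSONC) — module name and CLI flags are hypothetical
"custom/mojo-voice": {
    "exec": "mojo-voice status --waybar",  // assumed to emit {"text": ..., "alt": ...}
    "return-type": "json",
    "interval": 1,
    "format": "{}",
    "on-click": "mojo-voice toggle"        // assumed toggle subcommand
}
```

With `return-type` set to `json`, Waybar reads the command's stdout as a JSON object and renders its `text` field, so the bar updates as the recorder changes state.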

Open Source

MIT Licensed. Audit the code, contribute features, and customize for your specific needs.

Full Control

Customize hotkeys, select audio devices, and configure models dynamically. Tailor every setting to match your workflow and hardware.
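To make "full control" concrete, a settings file along these lines is one plausible shape. Every path, key, and value below is hypothetical and shown only to illustrate the kinds of options described above; the actual schema may differ.

```toml
# Hypothetical settings file — key names are illustrative, not the real schema

[hotkeys]
toggle_recording = "Super+D"

[audio]
input_device = "default"     # or a specific ALSA / CoreAudio device name
sample_rate = 16000

[model]
name = "whisper-small.en"    # one of the pre-configured Whisper models
device = "cuda"              # "cuda", "metal", or "cpu"
```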

Resource Efficient

Minimal memory footprint. Run alongside your IDE, terminal, and browser without impacting system performance.

KEY DIFFERENTIATOR

Powered by Mojo Audio

At the core of Mojo Voice is our custom-built mojo-audio engine: a high-performance mel-spectrogram library written in Mojo that outperforms traditional implementations.

Built for speed and accuracy, mojo-audio delivers the foundation for lightning-fast speech recognition without compromising quality.
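To make the term concrete, here is a minimal log-mel front end of the kind Whisper consumes, sketched in NumPy. This is an illustrative reimplementation, not mojo-audio's actual code; the parameters follow Whisper's usual 16 kHz, 400-sample FFT, 160-sample hop, 80-mel setup.

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy mel scale, common in Whisper-style front ends
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    # Frame the signal, window, FFT, then project onto the mel filterbank
    window = np.hanning(n_fft)
    frames = [audio[i:i + n_fft] * window
              for i in range(0, len(audio) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log10(np.maximum(mel, 1e-10))

# One second of a 440 Hz tone -> (frames, 80) log-mel features
audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = log_mel_spectrogram(audio)
print(feats.shape)  # (98, 80)
```

This matrix of log-mel features is what the Whisper encoder ingests; doing the framing, FFT, and filterbank projection in a compiled language is where an engine like mojo-audio earns its speedup.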

Performance: 10x faster
Accuracy: 99.9%

Custom-built for Whisper inference

"Now instead of typing expletives in all caps to my coding agent when I get frustrated, I can just shout into my mic. Mojo Voice doesn't skip a beat."

TC
Ted Cruz, TX Lisp Programming Expert

Latency Matters.

When coding, a delay of more than 1 second breaks your train of thought.

By running quantized models directly on your GPU, Mojo Voice achieves 5-10x speedups compared to CPU execution or cloud round-trips.

* Performance varies by hardware. Benchmarks shown are representative comparisons.

<1s
GPU Latency
Relative performance comparison (lower is faster)

From voice to code in milliseconds

Trigger

Super + D

Capture

Audio buffer

Infer

Whisper (GPU)

Output

Text at cursor
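The four stages above can be sketched as a simple hotkey-driven flow. Every function body here is a stand-in for illustration; the real capture, inference, and text-injection paths are native code, not this Python.

```python
import numpy as np

def capture(duration_s=2.0, sr=16000):
    # Stand-in for pulling samples from the microphone ring buffer
    return np.zeros(int(duration_s * sr), dtype=np.float32)

def infer(audio):
    # Stand-in for running the quantized Whisper model on the GPU
    return "kubectl get pods"

def emit(text):
    # Stand-in for injecting text at the cursor (virtual keyboard / IME)
    print(text)

def on_hotkey():
    # Bound to Super+D by the desktop environment
    emit(infer(capture()))

on_hotkey()  # prints "kubectl get pods"
```

The point of the design is that all four stages stay on-device: the audio buffer goes straight from capture to local inference, and only the final text touches the rest of the desktop.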

Supported Platforms

Linux

Wayland & X11

CUDA / ROCm

macOS

Apple Silicon & Intel

Metal

Windows

Coming Soon

DirectML

Roadmap

v0.2.0

The Foundation

Candle engine, GPU acceleration, Cross-platform support.

v0.3.0

Project Rename

Rebranded to Mojo Voice with production polish.

v0.4.0

Desktop App

Glassmorphic Tauri UI, Settings panel, Dev Tools, Real-time dashboard.

v0.5.5

Model Management

Visual model manager with 5-dot quality meters, Transcription history, Desktop app builds, Audio device selection.

v0.6.0

Platform Maturity

Model setup wizard, Linux automated testing, Cross-platform indicator, macOS installer.

v0.7.0

Platform Expansion

AMD ROCm support, Windows support, Polybar integration.

v0.8.0

Intelligence

Context-aware vocabulary, Speculative decoding, Dynamic prompt biasing, Performance optimization.

v1.0.0

Production Ready

IDE plugins (VSCode, Neovim), Voice commands, Project vocabulary learning.

Quick Answers

Get answers to the most common questions about Mojo Voice.

Does Mojo Voice work offline?

Yes, Mojo Voice works completely offline. All speech recognition processing happens locally on your GPU with zero cloud dependencies. Your voice data never leaves your machine.

Which platforms does Mojo Voice support?

Mojo Voice supports Linux (Wayland and X11) and macOS (Apple Silicon and Intel). Windows support is planned for v1.0. GPU acceleration works with CUDA on Linux and Metal on macOS.

Is Mojo Voice free and open source?

Yes, Mojo Voice is 100% free and open source under the MIT license. You can audit the code, contribute features, and customize it for your needs on GitHub.

Do I need a GPU to use Mojo Voice?

While a GPU provides the best performance (sub-second transcription), Mojo Voice also works on CPU-only systems. NVIDIA GPUs with CUDA and Apple Silicon Macs with Metal provide optimal speed.

Is my voice data sent to any servers?

No, absolutely not. Mojo Voice processes all audio 100% locally on your GPU. There is zero telemetry, zero cloud connectivity, and zero data collection. Your privacy is guaranteed by design.

Ready to code at the speed of thought?

Open source, free for personal use, and privacy-first.

Requires NVIDIA GPU (Linux) or Apple Silicon (Mac) for best performance.