The Old Way
- Cloud-based transcription sends your private code to third-party servers.
- High latency (>1.5s) breaks your flow state.
- Dictating "kubectl get pods" results in "cube cuddle get pads".
The Mojo Voice Way
- 100% Local. No data ever leaves your machine. Offline capable.
- Lightning-fast transcription via CUDA/Metal GPU acceleration.
- Optimized for technical terminology. It knows your stack.
Built for the modern stack.
Generic dictation tools fall apart when you switch to coding. Mojo Voice was engineered from the ground up for technical workflows.
Blazing Fast
Built with Rust and Mojo, leveraging CUDA on Linux and Metal on macOS for sub-second inference.
Privacy First
Your voice data is processed entirely on your GPU. Zero telemetry. Zero cloud dependencies. Works completely offline.
Tech-Aware
Trained on documentation and codebases. Accurately transcribes camelCase, snake_case, and complex CLI commands.
31 Whisper Models
Choose from 31 pre-configured Whisper models with visual quality meters. Desktop app with transcription history. Available as AppImage, .deb, and .dmg.
Cross-Platform
Native support for Linux (Wayland & X11) and macOS. Windows support coming in v1.0. Unified experience everywhere.
Desktop Integrated
Seamlessly integrates with your environment. Includes a Waybar module to show real-time status.
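As an illustration of what such an integration looks like, here is a minimal Waybar custom-module entry. The module name `custom/mojo-voice` and the `mojo-voice --status` command are hypothetical placeholders, not the documented Mojo Voice interface; only the Waybar keys (`exec`, `interval`, `format`, `tooltip`) are standard.

```jsonc
// Hypothetical Waybar config fragment (~/.config/waybar/config).
// "mojo-voice --status" is an assumed status command for illustration.
"custom/mojo-voice": {
    "exec": "mojo-voice --status",  // print current state (idle/listening)
    "interval": 1,                  // poll once per second
    "format": "🎤 {}",              // show the state next to a mic icon
    "tooltip": false
}
```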
Open Source
MIT Licensed. Audit the code, contribute features, and customize for your specific needs.
Full Control
Customize hotkeys, select audio devices, and configure models dynamically. Tailor every setting to match your workflow and hardware.
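A settings file for this kind of tool might look like the sketch below. Every key name here is illustrative only, assumed for the example rather than taken from the shipped Mojo Voice config schema.

```toml
# Hypothetical Mojo Voice settings sketch; key names are assumptions.
[hotkeys]
dictate = "Super+D"          # push-to-talk trigger

[audio]
input_device = "default"     # or a specific capture device name

[model]
name = "whisper-small"       # one of the pre-configured Whisper models
quantized = true             # smaller footprint, faster inference
device = "cuda"              # "metal" on macOS, "cpu" as fallback
```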
Resource Efficient
Minimal memory footprint. Run alongside your IDE, terminal, and browser without impacting system performance.
Powered by Mojo Audio
At the core of Mojo Voice is our custom-built mojo-audio engine—a high-performance mel-spectrogram library written in Mojo that outperforms traditional implementations.
Built for speed and accuracy, mojo-audio delivers the foundation for lightning-fast speech recognition without compromising quality.
Custom-built for Whisper inference
"Now instead of typing expletives in all caps to my coding agent when I get frustrated, I can just shout into my mic. Mojo Voice doesn't skip a beat."
Latency Matters.
When coding, a delay of more than 1 second breaks your train of thought.
By running quantized models directly on your GPU, Mojo Voice achieves 5-10x speedups compared to CPU execution or cloud round-trips.
* Performance varies by hardware. Benchmarks shown are representative comparisons.
From voice to code in milliseconds
Trigger: Super + D
Capture: Audio buffer
Infer: Whisper (GPU)
Output: Text at cursor
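The four stages above can be sketched as a tiny Rust pipeline. This is a stubbed illustration of the flow, assuming simplified stand-ins for each stage; the names and placeholder values are ours, not Mojo Voice internals.

```rust
// Sketch of the trigger → capture → infer → output flow described above.
// All four stages are stubs; the real app wires these to a global hotkey
// listener, the microphone, and GPU-backed Whisper inference.

/// A captured chunk of microphone audio (mono, 16 kHz PCM samples).
struct AudioBuffer {
    samples: Vec<f32>,
}

/// Stage 1: hotkey trigger (Super + D) starts a capture session.
fn on_hotkey() -> bool {
    true // stub: always triggered in this sketch
}

/// Stage 2: capture audio into a buffer until the hotkey is released.
fn capture() -> AudioBuffer {
    AudioBuffer { samples: vec![0.0; 16_000] } // stub: 1 s of silence
}

/// Stage 3: run Whisper inference on the buffered audio (stubbed).
fn infer(audio: &AudioBuffer) -> String {
    let _ = &audio.samples; // real code would feed this to the model
    String::from("kubectl get pods") // placeholder transcript
}

/// Stage 4: emit the transcript as text at the cursor (stubbed as stdout).
fn output(text: &str) {
    println!("{text}");
}

fn main() {
    if on_hotkey() {
        let audio = capture();
        let text = infer(&audio);
        output(&text);
    }
}
```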
Supported Platforms
Linux
Wayland & X11
macOS
Apple Silicon & Intel
Windows
Coming Soon
Roadmap
The Foundation
Candle engine, GPU acceleration, Cross-platform support.
Project Rename
Rebranded to Mojo Voice with production polish.
Desktop App
Glassmorphic Tauri UI, Settings panel, Dev Tools, Real-time dashboard.
Model Management
Visual model manager with 5-dot quality meters, Transcription history, Desktop app builds, Audio device selection.
Platform Maturity
Model setup wizard, Linux automated testing, Cross-platform indicator, macOS installer.
Platform Expansion
AMD ROCm support, Windows support, Polybar integration.
Intelligence
Context-aware vocabulary, Speculative decoding, Dynamic prompt biasing, Performance optimization.
Production Ready
IDE plugins (VSCode, Neovim), Voice commands, Project vocabulary learning.
Quick Answers
Get answers to the most common questions about Mojo Voice.
Does Mojo Voice work offline?
Yes, Mojo Voice works completely offline. All speech recognition happens locally on your GPU with zero cloud dependencies. Your voice data never leaves your machine.
Which platforms does Mojo Voice support?
Mojo Voice supports Linux (Wayland and X11) and macOS (Apple Silicon and Intel). Windows support is planned for v1.0. GPU acceleration works with CUDA on Linux and Metal on macOS.
Is Mojo Voice free and open source?
Yes, Mojo Voice is 100% free and open source under the MIT license. You can audit the code, contribute features, and customize it for your needs on GitHub.
Do I need a GPU to use Mojo Voice?
While a GPU provides the best performance (sub-second transcription), Mojo Voice also works on CPU-only systems. NVIDIA GPUs with CUDA and Apple Silicon Macs with Metal provide optimal speed.
Is my voice data sent to any servers?
No, absolutely not. Mojo Voice processes all audio 100% locally on your machine. There is zero telemetry, zero cloud connectivity, and zero data collection. Your privacy is guaranteed by design.