Question 1

What is MojoVoice?

Accepted Answer

MojoVoice is a privacy-first voice dictation tool that runs 100% locally on your machine. It uses GPU acceleration for sub-second transcription and works completely offline with zero telemetry or cloud dependencies.

Question 2

How do I install MojoVoice?

Accepted Answer

Download MojoVoice from the GitHub releases page. On Linux with CUDA, extract the archive and run the binary. On macOS, use the .dmg installer. The desktop app is available as an AppImage or .deb package on Linux and as a .dmg on macOS. See the documentation for detailed installation steps.

Question 3

Does MojoVoice work offline?

Accepted Answer

Yes, MojoVoice works completely offline. All speech recognition runs locally on your own hardware (on the GPU, or on the CPU if no supported GPU is available), with no cloud dependencies. Your voice data never leaves your machine.

Question 4

Is MojoVoice free and open source?

Accepted Answer

Yes, MojoVoice is 100% free and open source under the MIT license. You can audit the code, contribute features, and customize it for your needs on GitHub.

Question 5

Which platforms does MojoVoice support?

Accepted Answer

MojoVoice supports Linux (Wayland and X11) and macOS (Apple Silicon and Intel). Windows support is planned for v1.0. GPU acceleration uses CUDA on Linux and Metal on macOS.

Question 6

Do I need a GPU to use MojoVoice?

Accepted Answer

No. A GPU gives the best performance (sub-second transcription), but MojoVoice also runs on CPU-only systems. NVIDIA GPUs with CUDA and Apple Silicon Macs with Metal give the fastest results.

Question 7

What are the system requirements?

Accepted Answer

MojoVoice requires Linux (Wayland or X11) or macOS. GPU acceleration is optional and needs an NVIDIA GPU with CUDA on Linux, or an Apple Silicon or Intel Mac with Metal. Plan for at least 4 GB of RAM with smaller models; 8 GB or more is recommended for larger models.

Question 8

Can I use MojoVoice with my IDE or terminal?

Accepted Answer

Yes, MojoVoice works with any application that accepts text input. Press your configured hotkey, speak, and the transcribed text appears at your cursor in your IDE, terminal, browser, or any other text field.

Question 9

How accurate is MojoVoice for technical terms?

Accepted Answer

MojoVoice uses OpenAI Whisper models, which are trained on a large and varied corpus of transcribed audio. It handles technical terms such as Kubernetes and GraphQL, identifier styles such as camelCase and snake_case, and complex CLI commands that other tools often miss.

Question 10

Which Whisper models can I use?

Accepted Answer

MojoVoice includes 31 pre-configured Whisper models ranging from tiny (75 MB, fastest) to large-v3-turbo (1.5 GB, most accurate). You can switch models at any time to balance speed, accuracy, and VRAM usage for your hardware.
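
The speed versus memory trade-off can be sketched in a few lines of Python. This is only an illustration, not MojoVoice's own selection logic: the ModelOption class, the pick_model function, and the headroom factor of 2.0 are hypothetical, the catalog lists only the two sizes quoted above, and download size is used as a rough stand-in for memory use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    size_mb: int  # approximate download size, used as a rough memory proxy


# Only the two sizes quoted in the answer above; the real list has 31 entries.
CATALOG = [
    ModelOption("tiny", 75),
    ModelOption("large-v3-turbo", 1500),
]


def pick_model(free_memory_mb: int, headroom: float = 2.0) -> Optional[ModelOption]:
    """Return the largest model whose size times `headroom` fits in free memory.

    `headroom` is an assumed safety factor for activations and buffers,
    not a figure published by the project.
    """
    fitting = [m for m in CATALOG if m.size_mb * headroom <= free_memory_mb]
    return max(fitting, key=lambda m: m.size_mb) if fitting else None
```

With these assumptions, pick_model(8000) selects large-v3-turbo, pick_model(1000) falls back to tiny, and pick_model(100) returns None because nothing fits.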

Question 11

Does MojoVoice support multiple languages?

Accepted Answer

Yes, Whisper models support 99+ languages. MojoVoice inherits this multilingual capability and can transcribe in any language supported by the Whisper model you've selected.

Question 12

Does MojoVoice have a desktop app?

Accepted Answer

Yes, MojoVoice includes a native desktop app with a glassmorphic UI for model management, transcription history, audio device selection, and real-time status monitoring. It is available as an AppImage, a .deb package, and a .dmg.

Question 13

Can I customize hotkeys and settings?

Accepted Answer

Yes, MojoVoice offers full customization of hotkeys, audio device selection, model preferences, and performance settings. Tailor every aspect to match your workflow and hardware capabilities.

Question 14

How fast is voice transcription?

Accepted Answer

With GPU acceleration, MojoVoice achieves sub-second transcription latency. On an NVIDIA GPU with CUDA or Apple Silicon with Metal, a typical 3 to 5 second voice clip transcribes in under 1 second, so dictation does not interrupt your flow.
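
One common way to express these figures is the real-time factor (RTF): processing time divided by audio duration, where values below 1 mean faster than real time. The short Python sketch below works through the numbers quoted above, using the 1 second upper bound as the processing time. The function name is illustrative and the result is arithmetic on the stated figures, not a measured benchmark.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Upper bound from the answer above: at most 1 second for a 3 to 5 second clip.
for clip_seconds in (3.0, 5.0):
    rtf = real_time_factor(1.0, clip_seconds)
    print(f"{clip_seconds:.0f} s clip, 1 s processing: RTF = {rtf:.2f}")
```

This prints an RTF of 0.33 for the 3 second clip and 0.20 for the 5 second clip, so the quoted latency corresponds to running at least three to five times faster than real time.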

Question 15

How does the mojo-audio engine improve performance?

Accepted Answer

MojoVoice uses a custom mojo-audio engine, written in Mojo, for mel-spectrogram processing, the step that converts raw audio into the features Whisper consumes. The project reports this stage as 10x faster than traditional implementations while maintaining 99.9% accuracy for Whisper inference.
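
To give a sense of how much data the mel-spectrogram step produces, here is a small Python sketch that computes the feature shape for a clip. The constants are those of OpenAI's reference Whisper preprocessing (16 kHz audio, a hop of 160 samples, 80 mel bins for most models and 128 for large-v3); whether mojo-audio uses exactly the same parameters is an assumption, and mel_shape is a hypothetical helper name.

```python
from typing import Tuple

# Reference Whisper preprocessing constants (assumed to apply to mojo-audio too).
SAMPLE_RATE = 16_000  # samples per second
HOP_LENGTH = 160      # samples between consecutive frames (10 ms)
N_MELS = 80           # mel frequency bins (128 for large-v3)


def mel_shape(audio_seconds: float) -> Tuple[int, int]:
    """Return (mel bins, frames) for a clip of the given duration."""
    samples = int(audio_seconds * SAMPLE_RATE)
    return N_MELS, samples // HOP_LENGTH
```

Under these assumptions, a 5 second clip yields an 80 by 500 feature matrix, and Whisper's full 30 second window yields 80 by 3000, which is the workload a faster spectrogram engine speeds up.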

Question 16

Is my voice data sent to any servers?

Accepted Answer

No. MojoVoice processes all audio locally on your machine. There is no telemetry, no cloud connectivity, and no data collection. Your privacy is guaranteed by design.

Question 17

How does MojoVoice compare to cloud services like Google Speech?

Accepted Answer

Unlike cloud services, MojoVoice processes everything locally. That means no network round-trip latency, no data leaving your machine, and full offline operation. It is also optimized for developer workflows and technical vocabulary.

Question 18

What's the difference between MojoVoice and Talon Voice?

Accepted Answer

MojoVoice focuses on GPU-accelerated transcription with 31 Whisper models and a visual desktop app, while Talon Voice is primarily a voice command system. MojoVoice is MIT-licensed open source with zero cloud dependencies.

Question 19

Can I use MojoVoice for RSI or accessibility?

Accepted Answer

Yes, MojoVoice is excellent for repetitive strain injury (RSI) prevention and accessibility. Voice dictation reduces keyboard usage and enables hands-free development workflows for developers with mobility challenges.

Question 20

Where can I get support or report issues?

Accepted Answer

Join GitHub Discussions for community support, report bugs via GitHub Issues, or check the documentation at mojovoice.ai. The project is actively maintained with regular updates and responsive issue handling.
