Choosing the Right AI Model for HyperVoice
HyperVoice ships with 11 AI speech recognition models ranging from 75 MB to 3.1 GB. Choosing the right one depends on your hardware, how much accuracy you need, and whether you’re dictating quick notes or important documents.
This guide breaks down every model and helps you pick the best one for your setup.
The Models at a Glance
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Tiny | 75 MB | Fastest | Basic | Quick notes, low-end hardware |
| Tiny English | 75 MB | Fastest | Basic+ | English-only, slightly more accurate than Tiny |
| Base | 142 MB | Very Fast | Good | Casual dictation, older hardware |
| Base English | 142 MB | Very Fast | Good+ | English-only everyday use |
| Small | 466 MB | Fast | Great | General-purpose dictation |
| Small English | 466 MB | Fast | Great+ | English-only, solid all-rounder |
| Medium | 1.5 GB | Moderate | Excellent | Professional use, good hardware |
| Medium English | 1.5 GB | Moderate | Excellent+ | English-only professional use |
| Large-v3 Turbo | 1.6 GB | Moderate | Excellent | Best speed-to-accuracy ratio |
| Large-v2 | 3.1 GB | Slow | Maximum | Legacy, maximum compatibility |
| Large-v3 | 3.1 GB | Slow | Maximum | Highest accuracy available |
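If you'd rather work with the table programmatically, here is a minimal Python sketch that encodes it as data and lists the models that fit a given download budget. The `Model` tuple and the `models_within` helper are hypothetical names used for illustration and are not part of HyperVoice. Sizes are copied from the table, with GB values converted on the assumption that 1 GB = 1000 MB.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    size_mb: int        # download size from the table (assumes 1 GB = 1000 MB)
    english_only: bool


# Illustrative catalog mirroring the table above; not read from HyperVoice itself.
MODELS = [
    Model("Tiny", 75, False),
    Model("Tiny English", 75, True),
    Model("Base", 142, False),
    Model("Base English", 142, True),
    Model("Small", 466, False),
    Model("Small English", 466, True),
    Model("Medium", 1500, False),
    Model("Medium English", 1500, True),
    Model("Large-v3 Turbo", 1600, False),
    Model("Large-v2", 3100, False),
    Model("Large-v3", 3100, False),
]


def models_within(budget_mb: int) -> List[str]:
    """Return the names of all models whose download size fits the budget."""
    return [m.name for m in MODELS if m.size_mb <= budget_mb]


if __name__ == "__main__":
    print(models_within(500))
```

With a 500 MB budget, this lists the six Tiny, Base, and Small variants.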
Understanding the Tradeoffs
Speed vs Accuracy
Smaller models transcribe faster but make more mistakes. Larger models are more accurate but take longer and need more resources. The relationship isn’t linear — the jump from Tiny to Small is dramatic in accuracy, while the jump from Medium to Large is more subtle.
For most users, the sweet spot is somewhere between Small and Large-v3 Turbo.
Multilingual vs English-Only
Models without “English” in the name support dozens of languages. The English-only variants (Tiny English, Base English, Small English, Medium English) are tuned specifically for English and can be slightly more accurate and faster for English dictation.
If you only dictate in English, the English-only variant of your preferred model size is worth trying. If you dictate in multiple languages, use the multilingual version.
Note: Large-v2, Large-v3, and Large-v3 Turbo are multilingual only — there are no English-only variants at the Large size.
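The variant rule above can be written as a small function. This is only a sketch: `pick_variant` is a hypothetical helper, not a HyperVoice API, and the model names are the display names used in this guide.

```python
# Sizes that have an English-only variant, per this guide.
# The Large models are multilingual only.
SIZES_WITH_ENGLISH_VARIANT = {"Tiny", "Base", "Small", "Medium"}


def pick_variant(size: str, english_only_dictation: bool) -> str:
    """Return the model name to try for a given size and language need."""
    if english_only_dictation and size in SIZES_WITH_ENGLISH_VARIANT:
        return f"{size} English"
    # Multilingual dictation, or a size with no English-only variant.
    return size


if __name__ == "__main__":
    print(pick_variant("Small", True))     # Small English
    print(pick_variant("Large-v3", True))  # Large-v3
```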
GPU vs CPU
With GPU acceleration (Vulkan), even the Large models transcribe in a few seconds. On CPU only, larger models can take noticeably longer. If you’re running without a GPU, stick to Small or below for a responsive experience.
VRAM Requirements
HyperVoice uses Vulkan for GPU acceleration, which works with NVIDIA, AMD, and Intel GPUs. Here’s roughly how much VRAM each model tier needs:
| Model Tier | VRAM Needed | Works With |
|---|---|---|
| Tiny / Base | ~200–400 MB | Any GPU, integrated graphics |
| Small | ~500–800 MB | Any dedicated GPU |
| Medium / Large-v3 Turbo | ~1.5–2 GB | Most dedicated GPUs (2 GB+) |
| Large-v2 / Large-v3 | ~3–4 GB | Mid-range GPUs (4 GB+) |
If your GPU doesn’t have enough VRAM, HyperVoice automatically falls back to CPU transcription. You’ll see this reflected in the GPU status indicator in the app.
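To estimate in advance whether a model is likely to run on your GPU, you can compare your card's VRAM against the table. The sketch below is an assumption-laden illustration, not HyperVoice's actual fallback logic: it takes the upper end of each VRAM range as the threshold, treats 1 GB as 1000 MB, and uses a hypothetical `expected_backend` helper.

```python
from typing import Optional

# Upper end of each VRAM range from the table above, in MB (1 GB = 1000 MB assumed).
VRAM_UPPER_MB = {
    "Tiny": 400,
    "Base": 400,
    "Small": 800,
    "Medium": 2000,
    "Large-v3 Turbo": 2000,
    "Large-v2": 4000,
    "Large-v3": 4000,
}


def expected_backend(model: str, gpu_vram_mb: Optional[int]) -> str:
    """Guess whether a model would run on 'gpu' or fall back to 'cpu'.

    Pass gpu_vram_mb=None for a machine with no usable GPU.
    """
    if gpu_vram_mb is None:
        return "cpu"
    return "gpu" if gpu_vram_mb >= VRAM_UPPER_MB[model] else "cpu"


if __name__ == "__main__":
    print(expected_backend("Large-v3 Turbo", 2000))  # gpu
    print(expected_backend("Large-v3", 2000))        # cpu
```

The app's own GPU status indicator remains the authoritative answer; this only shows how to read the table.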
Recommendations by Use Case
Casual Notes and Quick Messages
Recommended: Small or Small English
If you’re dictating Slack messages, quick notes, or short emails, the Small model is a well-balanced choice: transcription is fast, accuracy is solid, and it runs comfortably on almost any hardware.
Professional Documents and Long Dictation
Recommended: Large-v3 Turbo
For anything where accuracy matters — reports, documentation, client emails — Large-v3 Turbo is the best choice. It delivers near-Large-v3 accuracy at roughly Medium speed. At 1.6 GB, it fits in the VRAM of most dedicated GPUs.
This is the model we recommend for most users with a dedicated GPU.
Maximum Accuracy, No Compromises
Recommended: Large-v3
If you have a GPU with 4+ GB of VRAM and want the absolute best transcription quality, Large-v3 is it. The accuracy difference over Large-v3 Turbo is small but measurable, especially with accented speech, technical jargon, or noisy environments.
Older or Low-End Hardware
Recommended: Base or Base English
If you’re on an older laptop or a machine without a dedicated GPU, Base is a safe choice. It’s fast on CPU, small enough to fit anywhere, and accurate enough for everyday dictation. Tiny works too but may struggle with complex sentences.
Multilingual Dictation
Recommended: Medium or Large-v3 Turbo
For dictating in languages other than English, larger models perform significantly better. Tiny and Base can be unreliable with non-English languages. Medium is the minimum we’d recommend for multilingual use, with Large-v3 Turbo being the ideal choice.
How to Switch Models
- Open HyperVoice and go to Settings > Processing
- You’ll see the model manager with all available models
- Download the model you want (this only needs to happen once)
- Select it as your active model
You can have multiple models downloaded and switch between them anytime. Models are stored locally and work fully offline after the initial download.
Our Recommendation
If you have a dedicated GPU, start with Large-v3 Turbo. It’s the best all-around model for most setups. If transcription feels slow, or you’re running on CPU only, step down to Small. If you need maximum accuracy and have 4+ GB of VRAM, step up to Large-v3.
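As a rough sketch, that advice boils down to a three-way decision. The `next_model` function below is hypothetical, and two details are assumptions rather than statements from HyperVoice: "have the VRAM" is read as the 4 GB figure mentioned for Large-v3 earlier, and a slow experience takes priority over a wish for more accuracy.

```python
def next_model(feels_slow: bool, wants_max_accuracy: bool, vram_gb: float) -> str:
    """Suggest which model to try, starting from Large-v3 Turbo."""
    if feels_slow:
        # Responsiveness first: step down to a lighter model.
        return "Small"
    if wants_max_accuracy and vram_gb >= 4:
        # Enough VRAM for the largest model (4 GB threshold is an assumption).
        return "Large-v3"
    # Default starting point.
    return "Large-v3 Turbo"


if __name__ == "__main__":
    print(next_model(feels_slow=False, wants_max_accuracy=True, vram_gb=8))
```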
The beauty of local models is that switching is free — no API costs, no usage limits, and complete privacy. Experiment until you find the right fit for your workflow.
For a complete setup walkthrough, check out our getting started guide.