Choosing the Right AI Model for HyperVoice

HyperVoice Team

HyperVoice ships with 11 AI speech recognition models ranging from 75 MB to 3.1 GB. Choosing the right one depends on your hardware, how much accuracy you need, and whether you’re dictating quick notes or important documents.

This guide breaks down every model and helps you pick the best one for your setup.

The Models at a Glance

Model | Size | Speed | Accuracy | Best For
Tiny | 75 MB | Fastest | Basic | Quick notes, low-end hardware
Tiny English | 75 MB | Fastest | Basic+ | English-only, slightly more accurate than Tiny
Base | 142 MB | Very Fast | Good | Casual dictation, older hardware
Base English | 142 MB | Very Fast | Good+ | English-only everyday use
Small | 466 MB | Fast | Great | General-purpose dictation
Small English | 466 MB | Fast | Great+ | English-only, solid all-rounder
Medium | 1.5 GB | Moderate | Excellent | Professional use, good hardware
Medium English | 1.5 GB | Moderate | Excellent+ | English-only professional use
Large-v3 Turbo | 1.6 GB | Moderate | Excellent | Best speed-to-accuracy ratio
Large-v2 | 3.1 GB | Slow | Maximum | Legacy, maximum compatibility
Large-v3 | 3.1 GB | Slow | Maximum | Highest accuracy available

Understanding the Tradeoffs

Speed vs Accuracy

Smaller models transcribe faster but make more mistakes. Larger models are more accurate but take longer and need more resources. The relationship isn’t linear — the jump from Tiny to Small is dramatic in accuracy, while the jump from Medium to Large is more subtle.

For most users, the sweet spot is somewhere between Small and Large-v3 Turbo.

Multilingual vs English-Only

Models without “English” in the name support dozens of languages. The English-only variants (Tiny English, Base English, Small English, Medium English) are tuned specifically for English and can be slightly more accurate and faster for English dictation.

If you only dictate in English, the English-only variant of your preferred model size is worth trying. If you dictate in multiple languages, use the multilingual version.

Note: Large-v2, Large-v3, and Large-v3 Turbo are multilingual only — there are no English-only variants at the Large size.
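The variant rule above can be written down as a small lookup. This is only an illustrative sketch: `pick_variant` and the language-code convention (`"en"` for English) are hypothetical and not part of HyperVoice; the model names mirror the table in this post.

```python
# Hypothetical helper, not a HyperVoice API. Model names follow the table above.
# Only these four sizes ship with an English-only variant.
ENGLISH_VARIANT_SIZES = {"Tiny", "Base", "Small", "Medium"}


def pick_variant(size: str, languages: set) -> str:
    """Return the model name to use for a given size and set of language codes."""
    english_only = languages == {"en"}
    if english_only and size in ENGLISH_VARIANT_SIZES:
        return f"{size} English"
    # Large-v2, Large-v3, and Large-v3 Turbo are multilingual only, and any
    # non-English language requires the multilingual model anyway.
    return size


print(pick_variant("Small", {"en"}))           # Small English
print(pick_variant("Small", {"en", "de"}))     # Small
print(pick_variant("Large-v3 Turbo", {"en"}))  # Large-v3 Turbo
```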

GPU vs CPU

With GPU acceleration (Vulkan), even the Large models transcribe in a few seconds. On CPU only, larger models can take noticeably longer. If you’re running without a GPU, stick to Small or below for a responsive experience.

VRAM Requirements

HyperVoice uses Vulkan for GPU acceleration, which works with NVIDIA, AMD, and Intel GPUs. Here’s roughly how much VRAM each model tier needs:

Model Tier | VRAM Needed | Works With
Tiny / Base | ~200–400 MB | Any GPU, integrated graphics
Small | ~500–800 MB | Any dedicated GPU
Medium / Large-v3 Turbo | ~1.5–2 GB | Most dedicated GPUs (2 GB+)
Large-v2 / Large-v3 | ~3–4 GB | Mid-range GPUs (4 GB+)

If your GPU doesn’t have enough VRAM, HyperVoice automatically falls back to CPU transcription. You’ll see this reflected in the GPU status indicator in the app.
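To make the fallback behavior concrete, here is a rough sketch of the kind of check involved. It is not HyperVoice's actual logic, which we haven't shown here: `choose_backend` is a hypothetical function, and using the upper end of each VRAM range as the requirement (with 1 GB taken as 1000 MB) is a conservative assumption.

```python
from typing import Optional

# Upper end of each VRAM range from the table above, in MB.
# Assumption: the upper bound is treated as the requirement; 1 GB = 1000 MB.
VRAM_NEEDED_MB = {
    "Tiny": 400,
    "Base": 400,
    "Small": 800,
    "Medium": 2000,
    "Large-v3 Turbo": 2000,
    "Large-v2": 4000,
    "Large-v3": 4000,
}


def choose_backend(model: str, free_vram_mb: Optional[int]) -> str:
    """Return "gpu" if the model should fit in VRAM, otherwise "cpu".

    Pass free_vram_mb=None when no usable GPU is present.
    """
    # English-only variants are the same size as their multilingual siblings.
    size = model.removesuffix(" English")
    needed = VRAM_NEEDED_MB[size]
    if free_vram_mb is not None and free_vram_mb >= needed:
        return "gpu"
    return "cpu"


print(choose_backend("Large-v3", 8000))  # gpu
print(choose_backend("Large-v3", 2000))  # cpu
```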

Recommendations by Use Case

Casual Notes and Quick Messages

Recommended: Small or Small English

If you’re dictating Slack messages, quick notes, or short emails, the Small model gives you a great balance. Transcription is fast, accuracy is solid, and it runs comfortably on almost any hardware.

Professional Documents and Long Dictation

Recommended: Large-v3 Turbo

For anything where accuracy matters — reports, documentation, client emails — Large-v3 Turbo is the best choice. It delivers near-Large-v3 accuracy at roughly Medium speed. At 1.6 GB, it fits in the VRAM of most dedicated GPUs.

This is the model we recommend for most users with a dedicated GPU.

Maximum Accuracy, No Compromises

Recommended: Large-v3

If you have a GPU with 4+ GB of VRAM and want the absolute best transcription quality, Large-v3 is it. The accuracy difference over Large-v3 Turbo is small but measurable, especially with accented speech, technical jargon, or noisy environments.

Older or Low-End Hardware

Recommended: Base or Base English

If you’re on an older laptop or a machine without a dedicated GPU, Base is a safe choice. It’s fast on CPU, small enough to fit anywhere, and accurate enough for everyday dictation. Tiny works too but may struggle with complex sentences.

Multilingual Dictation

Recommended: Medium or Large-v3 Turbo

For dictating in languages other than English, larger models perform significantly better. Tiny and Base can be unreliable with non-English languages. Medium is the minimum we’d recommend for multilingual use, with Large-v3 Turbo being the ideal choice.
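If you like to see the recommendations in one place, they reduce to a simple lookup. The sketch below is illustrative only: the use-case keys and the `recommended_models` function are hypothetical labels, while the model names are the ones recommended in the sections above.

```python
# Use-case keys are hypothetical labels; model names come from this section.
RECOMMENDED = {
    "casual": ("Small", "Small English"),
    "professional": ("Large-v3 Turbo",),
    "max_accuracy": ("Large-v3",),
    "low_end": ("Base", "Base English"),
    "multilingual": ("Medium", "Large-v3 Turbo"),
}


def recommended_models(use_case: str) -> tuple:
    """Return the recommended model names for a use case."""
    try:
        return RECOMMENDED[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


for case in RECOMMENDED:
    print(case, "->", " or ".join(recommended_models(case)))
```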

How to Switch Models

  1. Open HyperVoice and go to Settings > Processing
  2. You’ll see the model manager with all available models
  3. Download the model you want (this only needs to happen once)
  4. Select it as your active model

You can have multiple models downloaded and switch between them anytime. Models are stored locally and work fully offline after the initial download.

Our Recommendation

If you have a dedicated GPU, start with Large-v3 Turbo. It's the best all-around model for most hardware. If transcription feels slow, step down to Small. If you need maximum accuracy and have 4+ GB of VRAM, step up to Large-v3. Without a GPU, start with Small or Base instead.
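The step-up, step-down advice can be sketched as a tiny decision function. This is an illustration, not something built into HyperVoice: `adjust_from_default` and its parameters are hypothetical, the 4 GB threshold comes from the Large-v3 recommendation earlier in this post, and letting "feels slow" win over "need maximum accuracy" is our own assumption (a larger model would only be slower).

```python
DEFAULT_MODEL = "Large-v3 Turbo"
LARGE_V3_MIN_VRAM_GB = 4.0  # from the "Maximum Accuracy" recommendation


def adjust_from_default(feels_slow: bool, need_max_accuracy: bool, vram_gb: float) -> str:
    """Suggest a model, starting from Large-v3 Turbo and stepping up or down."""
    if feels_slow:
        # Assumption: slowness takes priority, since stepping up would be slower still.
        return "Small"
    if need_max_accuracy and vram_gb >= LARGE_V3_MIN_VRAM_GB:
        return "Large-v3"
    return DEFAULT_MODEL


print(adjust_from_default(feels_slow=False, need_max_accuracy=False, vram_gb=2.0))  # Large-v3 Turbo
print(adjust_from_default(feels_slow=True, need_max_accuracy=False, vram_gb=2.0))   # Small
print(adjust_from_default(feels_slow=False, need_max_accuracy=True, vram_gb=8.0))   # Large-v3
```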

The beauty of local models is that switching is free — no API costs, no usage limits, and complete privacy. Experiment until you find the right fit for your workflow.

For a complete setup walkthrough, check out our getting started guide.
