Choosing the Right AI Model for HyperVoice

HyperVoice ships with 11 AI speech recognition models ranging from 75 MB to 3.1 GB. Choosing the right one depends on your hardware, how much accuracy you need, and whether you’re dictating quick notes or important documents.

This guide breaks down every model and helps you pick the best one for your setup.

The Models at a Glance

Model	Size	Speed	Accuracy	Best For
Tiny	75 MB	Fastest	Basic	Quick notes, low-end hardware
Tiny English	75 MB	Fastest	Basic+	English-only, slightly more accurate than Tiny
Base	142 MB	Very Fast	Good	Casual dictation, older hardware
Base English	142 MB	Very Fast	Good+	English-only everyday use
Small	466 MB	Fast	Great	General-purpose dictation
Small English	466 MB	Fast	Great+	English-only, solid all-rounder
Medium	1.5 GB	Moderate	Excellent	Professional use, good hardware
Medium English	1.5 GB	Moderate	Excellent+	English-only professional use
Large-v3 Turbo	1.6 GB	Moderate	Excellent	Best speed-to-accuracy ratio
Large-v2	3.1 GB	Slow	Maximum	Legacy, maximum compatibility
Large-v3	3.1 GB	Slow	Maximum	Highest accuracy available

Understanding the Tradeoffs

Speed vs Accuracy

Smaller models transcribe faster but make more mistakes. Larger models are more accurate but take longer and need more resources. The relationship isn’t linear: accuracy improves dramatically from Tiny to Small, while the improvement from Medium to Large is more subtle.

For most users, the sweet spot is somewhere between Small and Large-v3 Turbo.

Multilingual vs English-Only

Models without “English” in the name support dozens of languages. The English-only variants (Tiny English, Base English, Small English, Medium English) are tuned specifically for English and can be slightly more accurate and faster for English dictation.

If you only dictate in English, the English-only variant of your preferred model size is worth trying. If you dictate in multiple languages, use the multilingual version.

Note: Large-v2, Large-v3, and Large-v3 Turbo are multilingual only — there are no English-only variants at the Large size.
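The naming rule above can be sketched as a small lookup. This is an illustrative Python sketch, not HyperVoice's own code: the function name `pick_variant` and the use of language codes such as "en" are assumptions made for the example. Only the model names come from the table in this guide.

```python
# Sizes that have an English-only variant, per the table in this guide.
# The Large models are multilingual only.
SIZES_WITH_ENGLISH_VARIANT = {"Tiny", "Base", "Small", "Medium"}


def pick_variant(size: str, languages: set[str]) -> str:
    """Return the model name to try for a size and a set of dictation languages.

    Hypothetical helper: language codes such as "en" are an assumption
    of this sketch, not a HyperVoice setting.
    """
    only_english = languages == {"en"}
    if only_english and size in SIZES_WITH_ENGLISH_VARIANT:
        return f"{size} English"
    # Multilingual dictation, or a size with no English-only variant.
    return size


print(pick_variant("Small", {"en"}))        # Small English
print(pick_variant("Small", {"en", "de"}))  # Small
print(pick_variant("Large-v3", {"en"}))     # Large-v3
```

The fallthrough to the plain size name covers both cases the note describes: dictating in more than one language, and picking a Large model that has no English-only counterpart.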

GPU vs CPU

With GPU acceleration (Vulkan), even the Large models transcribe in a few seconds. On CPU only, larger models can take noticeably longer. If you’re running without a GPU, stick to Small or below for a responsive experience.

VRAM Requirements

HyperVoice uses Vulkan for GPU acceleration, which works with NVIDIA, AMD, and Intel GPUs. Here’s roughly how much VRAM each model tier needs:

Model Tier	VRAM Needed	Works With
Tiny / Base	~200–400 MB	Any GPU, integrated graphics
Small	~500–800 MB	Any dedicated GPU
Medium / Large-v3 Turbo	~1.5–2 GB	Most dedicated GPUs (2 GB+)
Large-v2 / Large-v3	~3–4 GB	Mid-range GPUs (4 GB+)

If your GPU doesn’t have enough VRAM, HyperVoice automatically falls back to CPU transcription. You’ll see this reflected in the GPU status indicator in the app.
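To estimate ahead of time whether a model is likely to run on your GPU, you can compare your card's VRAM against the table above. The Python sketch below is a rough illustration only: `expected_backend` is a hypothetical helper, the thresholds are the upper ends of the approximate ranges in the table, and it assumes English-only variants need the same VRAM as their multilingual siblings. HyperVoice's actual fallback logic may differ.

```python
# Upper end of each approximate VRAM range from the table above, in MB
# (1 GB taken as 1024 MB). These are rough figures, not exact requirements.
VRAM_NEEDED_MB = {
    "Tiny": 400,
    "Base": 400,
    "Small": 800,
    "Medium": 2048,
    "Large-v3 Turbo": 2048,
    "Large-v2": 4096,
    "Large-v3": 4096,
}


def expected_backend(model: str, gpu_vram_mb: int) -> str:
    """Estimate whether a model would run on the GPU or fall back to CPU."""
    # Assumption: "Small English" and "Small" sit in the same VRAM tier.
    tier = model.removesuffix(" English")
    return "gpu" if gpu_vram_mb >= VRAM_NEEDED_MB[tier] else "cpu"


print(expected_backend("Small English", 2048))  # gpu
print(expected_backend("Large-v3", 2048))       # cpu
print(expected_backend("Large-v3", 8192))       # gpu
```

Using the upper end of each range is a deliberately cautious choice: a model that passes this check should have some headroom, while a borderline card may still work in practice.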

Recommendations by Use Case

Casual Notes and Quick Messages

Recommended: Small or Small English

If you’re dictating Slack messages, quick notes, or short emails, the Small model gives you a great balance. Transcription is fast, accuracy is solid, and it runs comfortably on almost any hardware.

Professional Documents and Long Dictation

Recommended: Large-v3 Turbo

For anything where accuracy matters, such as reports, documentation, and client emails, Large-v3 Turbo is the best choice. It delivers near-Large-v3 accuracy at roughly Medium speed. The download is 1.6 GB and it needs roughly 1.5–2 GB of VRAM, so it fits on most dedicated GPUs with 2 GB or more.

This is the model we recommend for most users with a dedicated GPU.

Maximum Accuracy, No Compromises

Recommended: Large-v3

If you have a GPU with 4+ GB of VRAM and want the absolute best transcription quality, Large-v3 is it. The accuracy difference over Large-v3 Turbo is small but measurable, especially with accented speech, technical jargon, or noisy environments.

Older or Low-End Hardware

Recommended: Base or Base English

If you’re on an older laptop or a machine without a dedicated GPU, Base is a safe choice. It’s fast on CPU, small enough to fit anywhere, and accurate enough for everyday dictation. Tiny works too but may struggle with complex sentences.

Multilingual Dictation

Recommended: Medium or Large-v3 Turbo

For dictating in languages other than English, larger models perform significantly better. Tiny and Base can be unreliable with non-English languages. Medium is the minimum we’d recommend for multilingual use, with Large-v3 Turbo being the ideal choice.

How to Switch Models

1. Open HyperVoice and go to Settings > Processing.
2. You’ll see the model manager with all available models.
3. Download the model you want (this only needs to happen once).
4. Select it as your active model.

You can have multiple models downloaded and switch between them anytime. Models are stored locally and work fully offline after the initial download.
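If you plan to keep several models downloaded, it can help to add up the disk space first. The sketch below sums the download sizes from the table at the top of this guide. The helper name `disk_footprint_mb` is made up for the example, and sizes given in GB are converted at 1 GB = 1000 MB, so treat the result as an approximation.

```python
# Download sizes from "The Models at a Glance", in MB (approximate).
MODEL_SIZE_MB = {
    "Tiny": 75, "Tiny English": 75,
    "Base": 142, "Base English": 142,
    "Small": 466, "Small English": 466,
    "Medium": 1500, "Medium English": 1500,
    "Large-v3 Turbo": 1600,
    "Large-v2": 3100, "Large-v3": 3100,
}


def disk_footprint_mb(models: list[str]) -> int:
    """Total approximate disk space for a set of downloaded models."""
    # set() ensures a model listed twice is only counted once.
    return sum(MODEL_SIZE_MB[name] for name in set(models))


chosen = ["Small English", "Large-v3 Turbo", "Large-v3"]
total = disk_footprint_mb(chosen)
print(f"{total} MB (about {total / 1000:.1f} GB)")  # 5166 MB (about 5.2 GB)
```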

Our Recommendation

Start with Large-v3 Turbo. It’s the best all-around model for most hardware with a dedicated GPU. If transcription feels slow, or you’re running on CPU only, step down to Small. If you need maximum accuracy and have 4 GB or more of VRAM, step up to Large-v3.
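The recommendation above, combined with the CPU guidance from the GPU vs CPU section, can be written as a short decision procedure. This is a sketch of the advice in this guide, not a feature of the app: `recommend` and its parameters are hypothetical, and the 4 GB threshold is taken from the VRAM table.

```python
def recommend(has_gpu: bool, vram_gb: float,
              feels_slow: bool, needs_max_accuracy: bool) -> str:
    """Pick a starting model following the advice in this guide."""
    # Without a GPU, or if transcription feels slow, step down to Small.
    if not has_gpu or feels_slow:
        return "Small"
    # Step up only when maximum accuracy matters and the VRAM is there.
    if needs_max_accuracy and vram_gb >= 4:
        return "Large-v3"
    # Default starting point.
    return "Large-v3 Turbo"


print(recommend(True, 8, False, False))   # Large-v3 Turbo
print(recommend(True, 8, False, True))    # Large-v3
print(recommend(True, 2, False, True))    # Large-v3 Turbo
print(recommend(False, 0, False, False))  # Small
```

Checking speed before accuracy reflects the order of the advice: a model that feels slow is worth stepping down from even if you would otherwise want the extra accuracy.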

The beauty of local models is that switching is free — no API costs, no usage limits, and complete privacy. Experiment until you find the right fit for your workflow.

For a complete setup walkthrough, check out our getting started guide.