Getting Started with HyperVoice: Voice to Text in Any Windows App

HyperVoice turns your voice into text inside any Windows application — VS Code, Slack, Chrome, Notion, you name it. Press a hotkey, speak, and the transcribed text appears at your cursor. No browser tab, no copy-paste, no context switch.

This guide walks you through installation, your first dictation, recording and output modes, GPU acceleration, and AI-powered post-processing.

Installation

Download the installer from hypervoice.app. The download is a standard .exe installer, around 10 MB.
Run the installer. Windows SmartScreen may show a “Windows protected your PC” warning because HyperVoice is a new application. Click More info, then Run anyway.
Launch HyperVoice. The app starts minimized in your system tray (bottom-right of the taskbar).

On first launch you’ll be asked to enter your license key. Once activated, HyperVoice works fully offline — no internet connection required for local transcription.

Your First Dictation

Click the HyperVoice tray icon to open the window, or use the default hotkey Ctrl + Shift + Space.
When you see the recording indicator, start speaking naturally.
Press the hotkey again (or click the stop button) to stop recording.
HyperVoice transcribes your speech locally using AI and pastes the text at your cursor in whichever app was focused before you started recording.

The entire flow — hotkey, speak, stop, paste — takes just a second or two of overhead. Most of the time is your actual speech.

Recording and Output Modes

Recording Mode

HyperVoice supports two recording modes, configurable in Settings > General:

Toggle (default) — Press the hotkey once to start recording, press again to stop and transcribe.
Push to Talk — Hold the hotkey to record, release to transcribe. Great for quick bursts of dictation.

Output Mode

You can also choose how transcribed text is delivered:

Auto-paste at cursor (default) — HyperVoice automatically pastes the text into whichever app was focused when you started recording.
Copy to clipboard only — Text is copied to your clipboard for you to paste manually with Ctrl+V. Useful if you want to review before pasting.

Choosing an AI Model

HyperVoice includes 11 AI model sizes. On first launch you’ll be prompted to download one. You can manage models in Settings > Processing. For a detailed comparison, see our guide to choosing the right AI model.

Model	Size	Best For
Tiny / Base	75–142 MB	Quick notes, low-end hardware
Small	466 MB	Everyday dictation, good speed-accuracy balance
Medium	1.5 GB	High accuracy, works well on most GPUs
Large-v3 Turbo	1.6 GB	Best balance — near Large-v3 accuracy at Medium speed
Large-v2 / Large-v3	3.1 GB	Maximum accuracy, dedicated GPU recommended

Most models also have an English-only variant (Tiny English, Base English, etc.) that can be slightly faster and more accurate if you only dictate in English.

GPU Acceleration

HyperVoice uses Vulkan for GPU-accelerated transcription, which works with NVIDIA, AMD, and Intel GPUs. This is enabled by default — if a compatible GPU is detected, you’ll see a green “GPU” badge in the app.

GPU acceleration can cut transcription time by 3–5x compared to CPU-only mode. If you have a dedicated GPU with at least 2 GB of VRAM, you’ll get near-instant results even with larger models. All processing happens on your device — learn more about our privacy-first architecture.

To check or toggle GPU mode, go to Settings > Audio > GPU Acceleration.

AI Post-Processing

Raw transcriptions are good, but sometimes you want polished output — proper punctuation, cleaned-up grammar, or a specific format. That’s where processing modes come in.

Built-in Modes

HyperVoice includes seven built-in processing modes:

Clean Up — Fixes grammar, removes filler words (“um”, “uh”), adds punctuation.
Professional Email — Transforms your speech into a polished, professional email.
Chat Message — Converts to a casual but professional message for Teams or Slack.
Meeting Notes — Structures your dictation into organized notes with key points and action items.
Bullet Points — Distills your speech into concise bullet points.
Status Update — Formats your speech as a project or work status update.
Ticket / Issue — Turns your description into a structured bug report or task with summary, description, and steps.

Select a mode from the cards on the Record tab. When no mode is selected, your raw transcription is used as-is.

Setting Up a Provider

Processing modes require an AI provider. You have three options:

Bring Your Own Key (BYOK) — Use your own API key from your preferred provider. Available to all license tiers. Set it up in Settings > Processing.
HyperVoice Cloud — Our hosted processing service, available with a Pro subscription. No API key needed.
None — Skip post-processing entirely. Your raw transcription is pasted as-is.

Custom Modes

Want something specific? Create your own mode in Settings > Processing. Write a system prompt that describes how the AI should transform your text, and it shows up as a card on the Record tab alongside the built-in modes. For example:

“Rewrite as a Git commit message in conventional commit format”
“Convert to a Jira ticket with summary, description, and acceptance criteria”
“Translate to formal Japanese”
“Rewrite as a tweet under 280 characters”

Dictionary Replacements

HyperVoice also supports dictionary replacements — custom term substitutions that are applied to every transcription. This is useful for correcting words that the AI consistently gets wrong, expanding abbreviations, or enforcing specific spellings of names and jargon.

Configuring the Hotkey

The default hotkey is Ctrl + Shift + Space. You can change it in Settings > General to any key combination that works for your workflow. At least one modifier key (Ctrl, Alt, Shift) is required.

What’s Next

You’re all set to start dictating. Here are some tips to get the most out of HyperVoice:

Speak naturally. HyperVoice handles conversational speech well — you don’t need to speak slowly or robotically.
Start with Large-v3 Turbo. It offers the best balance of speed and accuracy for most hardware.
Try different processing modes. Each mode transforms the same speech into different outputs — experiment to find what saves you the most time.
Use Push to Talk for quick inputs. If you’re firing off short messages or commands, Push to Talk feels snappier than Toggle mode.
Keep the app in the tray. HyperVoice uses minimal resources when idle and is always ready when you need it.

Have questions or feedback? Visit your dashboard or reach out at support@hypervoice.app.