A Windows-native MCP (Model Context Protocol) server that lets Claude Desktop transcribe audio and video files locally using whisper.cpp — with GPU acceleration, multilingual support, and batch processing. No internet connection required. No audio ever leaves your machine.
Why does this exist? The popular whisper-mcp package was built for macOS and assumes a Unix environment; it does not work on Windows. This package was written specifically for Windows users who want local AI transcription integrated with Claude Desktop.
Once installed, you can say things like this directly in Claude Desktop:
Download whisper-vulkan-win-x64.zip from the releases page.
This is a custom-compiled build with Vulkan GPU acceleration enabled. Works with AMD, NVIDIA, and Intel GPUs — no vendor-specific SDK required.
Extract to C:\whisper\Release\. You should end up with:
```
C:\whisper\Release\whisper-cli.exe
C:\whisper\Release\ggml-vulkan.dll
C:\whisper\Release\ggml.dll
C:\whisper\Release\ggml-base.dll
C:\whisper\Release\ggml-cpu.dll
C:\whisper\Release\whisper.dll
```
GPU acceleration is automatic — no additional configuration needed.
Requires: Git, CMake, Visual Studio Build Tools 2022+ with "Desktop development with C++", Vulkan SDK from lunarg.com.
```shell
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release --target whisper-cli
```
Copy the binaries from build\bin\Release\ to C:\whisper\Release\.
Note: The official whisper.cpp Windows releases on GitHub do not include a Vulkan build. You must use the pre-built release above or compile from source with `-DGGML_VULKAN=ON`.
| Model | Size | Speed | Accuracy | Best for |
|---|---|---|---|---|
| `ggml-tiny.en.bin` | 75 MB | Very fast | Basic | Quick tests |
| `ggml-base.en.bin` | 142 MB | Fast | Good | Everyday English |
| `ggml-small.en.bin` | 466 MB | Moderate | Better | Important recordings |
| `ggml-medium.en.bin` | 1.5 GB | Fast on GPU | Very good | Best quality English |
| `ggml-large-v3-turbo.bin` | 1.6 GB | Fast on GPU | Excellent | Recommended for English GPU batch work — ~6x faster than large-v3 with minimal accuracy loss |
| `ggml-large-v3.bin` | 2.9 GB | Fast on GPU | Excellent | Multilingual, maximum accuracy |
| `ggml-medium.en-q5_0.bin` | 514 MB | Fast | Very good | Best CPU-only English option — high accuracy at low memory |
| `ggml-large-v3-turbo-q5_0.bin` | 547 MB | Fast | Excellent | Best CPU-only multilingual option |
| `ggml-large-v3-q5_0.bin` | 1.1 GB | Moderate on CPU | Excellent | Multilingual, CPU-friendly |
Use `download_model` in Claude Desktop to install any of these directly. For English-only use, `large-v3-turbo` (GPU) or `medium.en-q5_0` (CPU) are the best starting points. For multilingual use, `large-v3-turbo` (GPU) or `large-v3-turbo-q5_0` (CPU). English-only models (`*.en.bin`) output `[FOREIGN]` on non-English audio and cannot be used for other languages.
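If you prefer to fetch a model manually rather than through the `download_model` tool, the ggml model files are hosted in the `ggerganov/whisper.cpp` repository on Hugging Face. A sketch of the download (the destination path is illustrative; adjust `MODEL` to any filename from the table above):

```shell
# Pick a model filename from the table above
MODEL=ggml-large-v3-turbo.bin

# Models live under ggerganov/whisper.cpp on Hugging Face
URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main/${MODEL}"
echo "$URL"

# Uncomment to download into your models directory (path is an example):
# curl -L -o "C:/whisper/models/${MODEL}" "$URL"
```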
FFmpeg is required for video files and non-native audio formats.
Install via winget:
winget install ffmpeg
Or download from ffmpeg.org and add to your PATH.
Verify:
ffmpeg -version
npm install -g whisper-windows-mcp
Open Claude Desktop → Settings → Developer → Edit Config.
Add the whisper entry:
```json
{
  "mcpServers": {
    "whisper": {
      "command": "npx",
      "args": ["-y", "whisper-windows-mcp"],
      "env": {
        "WHISPER_CLI_PATH": "C:\\whisper\\Release\\whisper-cli.exe",
        "WHISPER_MODEL": "C:\\whisper\\models\\ggml-medium.en.bin"
      }
    }
  }
}
```
Config file location: C:\Users\YourName\AppData\Roaming\Claude\claude_desktop_config.json
Use double backslashes in all paths.
Save and fully restart Claude Desktop. You should see whisper listed with a green running badge in Settings → Developer.
In Claude Desktop, ask:
"Check your whisper config"
Then:
"Check your system hardware"
This confirms your GPU is detected and Vulkan acceleration is active.
### transcribe_audio

Transcribe a single file. Supports blocking (default) or background mode for long files.
| Parameter | Description |
|---|---|
| `file_path` | Absolute path to the file (required) |
| `language` | Language code (`en`, `ja`, `es`, etc.) or `auto` to detect. Default: `en` |
| `output_format` | `text` (default), `timestamps`, `json`, or `srt` |
| `save_to_file` | Save transcript as `.txt` next to the source file |
| `background` | Run as detached job — returns a job ID immediately. Use `check_progress` to monitor. Recommended for files over 10 minutes. |
| `threads` | CPU thread override |
| `temperature` | Sampling temperature 0.0–1.0. Default 0.0 (deterministic). Higher values reduce hallucination on noisy audio. |
| `prompt` | Prior context string — improves accuracy for domain-specific vocabulary or speaker names. Example: "Names: Keemstar, DramaAlert." |
| `condition_on_prev_text` | Re-enable context conditioning between segments. Default false. |
| `beam_size` | Beam search width. Higher = more accurate, slower. Default 5. |
| `best_of` | Candidate sequences evaluated. Default 5. |
| `gpu_device` | GPU device index for multi-GPU systems. Default 0. |
| `processors` | Parallel processor count. Default 1. |
| `word_timestamps` | One word per timestamped segment. Useful for clip alignment. |
| `max_segment_length` | Max segment length in characters. |
| `diarize` | Stereo speaker diarization — requires stereo audio with speakers on separate channels. |
| `vad_model` | Path to Silero VAD model `.bin`. Strips silence before transcription — reduces hallucinations on noisy files. |
| `offset_t` | Start offset in milliseconds. |
| `duration` | Process duration in milliseconds from offset. |
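For reference, a background transcription request might carry arguments like these (a sketch; the file path is illustrative, not a real example from this package):

```json
{
  "file_path": "C:\\recordings\\interview.mp3",
  "language": "en",
  "output_format": "timestamps",
  "save_to_file": true,
  "background": true
}
```

Claude Desktop builds this call for you from natural language, so in practice you would just say "transcribe C:\recordings\interview.mp3 in the background with timestamps".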
### check_progress

Monitor a background transcription job started with `transcribe_audio` (`background=true`).
Returns elapsed time, last processed timestamp, percentage, and the full transcript when complete.
| Parameter | Description |
|---|---|
| `job_id` | Job ID returned by `transcribe_audio` |
### start_batch

Automated sequential batch transcription of all untranscribed files in a folder. Sorts by duration (shortest first), processes one at a time as background jobs, and validates each output.
| Parameter | Description |
|---|---|
| `folder_path` | Path to folder (required) |
| `language` | Language code. Default: `en` |
| `threads` | CPU thread override |
### check_batch_progress

Monitor a running batch. Automatically advances to the next file when the current one finishes. Returns overall progress, current file with timestamp, ETA, and any failed files.
| Parameter | Description |
|---|---|
| `batch_id` | Batch ID returned by `start_batch` |
### transcribe_batch (interactive)

Process files one at a time with a preview and confirmation before each. Useful when you want to review as you go.
| Parameter | Description |
|---|---|
| `folder_path` | Path to folder (required) |
| `file_index` | Which file to process (1-based). Omit to list files first. |
| `language` | Language code. Default: `en` |
| `recursive` | Include subfolders |
### generate_subtitles

Generate SRT subtitle files. Supports automatic language detection and English translation output.
| Parameter | Description |
|---|---|
| `file_path` | Path to file (required) |
| `language` | Language code or `auto` to detect. Default: `en` |
| `translate_to_english` | Also generate an English translation `.en.srt`. Only applies when source is not English. |
| `threads` | CPU thread override |
When both native and translation are requested, two files are saved next to the source:

- `filename.ja.srt` — original language
- `filename.en.srt` — English translation

Whisper's built-in translation only translates to English. For other target languages, translate the `.srt` file contents separately.
### analyze_media

Analyze files before committing to transcription. Returns duration, size, codec, and estimated transcription time on CPU and GPU. For folders, shows all files in a sortable table with transcription status.
| Parameter | Description |
|---|---|
| `path` | Path to a single file or folder (required) |
| `sort_by` | For folders: `duration` (default), `name`, or `size` |
### check_config

Verify `whisper-cli.exe`, the model file, and FFmpeg are all accessible. Run this first if anything is failing.
### list_models

List all Whisper model files installed in your models directory. Shows filename, size, whether it is currently active, quantization status, and recommended use case. No network calls — reads the local filesystem only.
### download_model

Download a Whisper model directly from Hugging Face into your models directory. Accepts a model name (e.g. `large-v3-turbo`, `medium.en-q5_0`) and handles the download automatically. Only downloads from trusted Hugging Face namespaces. After downloading, use `switch_model` to activate it.
| Parameter | Description |
|---|---|
| `model_name` | Model name to download, e.g. `large-v3-turbo`, `large-v3-turbo-q5_0`, `medium.en-q5_0` |
### switch_model

Switch the active Whisper model for the current session without restarting Claude Desktop. The change is session-scoped — it does not persist after restart. To make it permanent, update `WHISPER_MODEL` in your config.
| Parameter | Description |
|---|---|
| `model_name` | Model filename (e.g. `ggml-large-v3-turbo.bin`) or full path. Must be a `.bin` file in the configured models directory. |
### check_system

Detect GPU hardware and verify Vulkan acceleration is available. Reports GPU name, VRAM, whether `ggml-vulkan.dll` is present, and recommends the best model size for your hardware.
| Type | Formats |
|---|---|
| Native (no conversion) | mp3, wav |
| Video (auto-converted via FFmpeg) | mp4, mkv, avi, mov, webm, flv, wmv, m4v, ts, 3gp |
| Audio (auto-converted via FFmpeg) | m4a, ogg, flac |
The pre-built Vulkan release enables GPU acceleration automatically. Tested on AMD Radeon RX Vega 56 (GCN 5th gen). Any GPU with Vulkan 1.0+ support should work, including NVIDIA and Intel Arc.
Performance comparison (medium.en model, ~5 minute audio file):
| Hardware | Time |
|---|---|
| CPU only (Ryzen 7 2700x, 8 threads) | 8–12 minutes |
| GPU (Vega 56 via Vulkan) | 20–40 seconds |
GPU utilization during transcription is typically 15–20%, dropping back to idle between files. CPU stays around 15%.
Whisper can auto-detect the spoken language and transcribe in that language. The built-in translation model translates to English only.
For best multilingual accuracy, use the large-v3 model. English-specific models (*.en.bin) cannot detect or transcribe other languages.
Example — foreign language video with subtitles: set `language=auto` and `translate_to_english=true`.

This tool is built to minimize Claude API interactions. The entire transcription workflow — scan, analyze, queue, run, validate — is designed to require as few Claude interactions as possible. The heavy lifting is done locally on your machine.
| Variable | Description |
|---|---|
| `WHISPER_CLI_PATH` | Path to `whisper-cli.exe` (required) |
| `WHISPER_MODEL` | Path to model `.bin` file (required) |
| `WHISPER_THREADS` | CPU thread count override |
| `FFMPEG_PATH` | Path to `ffmpeg` if not in system PATH |
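A config `env` block using all four variables might look like this (the paths and thread count are illustrative; only the first two variables are required):

```json
"env": {
  "WHISPER_CLI_PATH": "C:\\whisper\\Release\\whisper-cli.exe",
  "WHISPER_MODEL": "C:\\whisper\\models\\ggml-large-v3-turbo.bin",
  "WHISPER_THREADS": "8",
  "FFMPEG_PATH": "C:\\ffmpeg\\bin\\ffmpeg.exe"
}
```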
See TROUBLESHOOTING.md for detailed solutions.
Quick checklist:

- Double backslashes are used in all config paths (`C:\\whisper\\...`)
- `whisper-cli.exe` exists at the configured path
- The model `.bin` file exists at the configured path
- FFmpeg is installed and on PATH (`ffmpeg -version` works)

whisper-windows-mcp is designed with security as a core principle.
All processing is local. No audio, transcripts, or file paths ever leave your machine. No telemetry. No cloud APIs required for core functionality.
Input validation. All file paths are validated before use — UNC paths (\\server\share) and directory traversal sequences (..) are rejected. Files over 10 GB are rejected to prevent resource exhaustion.
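A minimal sketch of checks like those described above, assuming Node.js (the function name and exact rules are illustrative, not the package's actual code):

```typescript
import * as path from "path";

// 10 GB cap, matching the resource-exhaustion limit described above
const MAX_BYTES = 10 * 1024 ** 3;

// Accept only plain absolute Windows paths within the size limit:
// no UNC prefixes, no ".." traversal segments.
function isPathAllowed(p: string, sizeBytes: number): boolean {
  if (p.startsWith("\\\\")) return false;            // UNC path (\\server\share)
  if (p.split(/[\\/]/).includes("..")) return false; // directory traversal
  if (sizeBytes > MAX_BYTES) return false;           // oversized file
  return path.win32.isAbsolute(p);                   // must be absolute
}
```

Checking the raw path segments before any filesystem access means a hostile path is rejected without ever being opened or resolved.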
Transcript injection awareness. Audio files can contain spoken content that, when transcribed, resembles instructions. Claude's built-in defenses handle this, but it is worth knowing that transcript content is treated as data — never as instructions — by the MCP server itself.
Model downloads are restricted. The download_model tool only downloads from two trusted Hugging Face namespaces (ggerganov/whisper.cpp and ggml-org). Arbitrary URLs are rejected. Redirects are validated against an allowlist before following.
Model switching is sandboxed. switch_model only accepts .bin files within the configured models directory. Paths outside that directory are rejected.
No new network dependencies. Model downloads use Node.js built-in https — no external HTTP libraries are added to the package.
MIT
Pull requests welcome. See ROADMAP.md for planned features.
If you've tested GPU acceleration on hardware not listed above, please open an issue with your results — GPU model, VRAM, model size, and observed throughput.
Add this to claude_desktop_config.json and restart Claude Desktop.

```json
{
  "mcpServers": {
    "whisper-windows-mcp": {
      "command": "npx",
      "args": ["-y", "whisper-windows-mcp"],
      "env": {
        "WHISPER_CLI_PATH": "C:\\whisper\\Release\\whisper-cli.exe",
        "WHISPER_MODEL": "C:\\whisper\\models\\ggml-medium.en.bin"
      }
    }
  }
}
```