An MCP server that exposes speech-to-text and text-to-speech capabilities using a local speaches instance, allowing AI assistants to transcribe audio and generate speech.
An MCP (Model Context Protocol) server that exposes speaches as transcribe_audio and text_to_speech tools.
Speaches is a local, OpenAI API-compatible server for speech-to-text (via faster-whisper) and text-to-speech (via Kokoro/Piper). This MCP server lets AI assistants like Claude use it directly.
`transcribe_audio`: Transcribe an audio file using your speaches instance.

| Parameter | Required | Description |
|---|---|---|
| `file_path` | ✅ | Absolute path to the audio file |
| `language` | ❌ | ISO-639-1 language code (e.g. `en`). Omit for auto-detect. |
| `model` | ❌ | Whisper model ID. Defaults to `SPEACHES_STT_MODEL` env var. |
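As a rough illustration of how a client might invoke this tool, the sketch below builds the JSON-RPC `tools/call` message that MCP uses for tool invocations. The helper name `build_transcribe_call` and the example path `/tmp/meeting.wav` are hypothetical and not part of this project; in practice an MCP client library would normally construct and send this message for you.

```python
import json
import os


def build_transcribe_call(file_path, language=None, model=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` message for the transcribe_audio tool.

    Hypothetical helper for illustration only.
    """
    # The tool documents file_path as an absolute path, so reject anything else.
    if not os.path.isabs(file_path):
        raise ValueError("file_path must be an absolute path")

    arguments = {"file_path": file_path}
    # Optional parameters are omitted so the server can apply its own defaults.
    if language is not None:
        arguments["language"] = language
    if model is not None:
        arguments["model"] = model

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "transcribe_audio", "arguments": arguments},
    }


if __name__ == "__main__":
    message = build_transcribe_call("/tmp/meeting.wav", language="en")
    print(json.dumps(message, indent=2))
```

Leaving `language` and `model` out of the arguments mirrors the table above: auto-detection and the server-side default model apply when they are omitted.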
`text_to_speech`: Convert text to speech and save to a file.

| Parameter | Required | Description |
|---|---|---|
| `text` | ✅ | Text to convert |
| `output_path` | ✅ | Absolute path for the output audio file (e.g. `/tmp/output.mp3`) |
| `voice` | ❌ | Voice ID. Defaults to `TTS_VOICE` env var. |
| `model` | ❌ | TTS model ID. Defaults to `TTS_MODEL` env var. |
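Because speaches is OpenAI API-compatible, a `text_to_speech` call presumably maps onto a request to the OpenAI-style `/v1/audio/speech` endpoint. The sketch below shows what such a request could look like using only the Python standard library. It is an assumption about the underlying HTTP call, not this server's actual code, and the function names are hypothetical.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice="af_heart",
                         model="speaches-ai/Kokoro-82M-v1.0-ONNX"):
    """Build (but do not send) a POST request to an OpenAI-style speech endpoint.

    Assumption: speaches accepts the same JSON fields as OpenAI's
    /v1/audio/speech endpoint (model, voice, input).
    """
    body = json.dumps({"model": model, "voice": voice, "input": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url, text, output_path, **kwargs):
    """Send the request and write the returned audio bytes to output_path."""
    request = build_speech_request(base_url, text, **kwargs)
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    # Requires a reachable speaches instance; the host name is a placeholder.
    synthesize("http://your-speaches-host:8000", "Hello from speaches.", "/tmp/output.mp3")
```

The defaults for `voice` and `model` are the documented default values of `TTS_VOICE` and `TTS_MODEL`.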
Running with Docker Compose exposes the MCP server over SSE on port 8010, which is suitable for remote clients.
docker compose up --build
Then connect your MCP client to http://localhost:8010/sse.
Build the image:
docker build -t speaches-mcp .
Run it:
docker run --rm -i \
-e SPEACHES_URL=http://your-speaches-host:8000 \
speaches-mcp
For Claude Desktop, add to your config:
{
  "mcpServers": {
    "speaches": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "SPEACHES_URL=http://your-speaches-host:8000",
        "speaches-mcp"
      ]
    }
  }
}
| Variable | Default | Description |
|---|---|---|
| `SPEACHES_URL` | `http://speaches:8000` | Base URL of your speaches instance |
| `STT_MODEL` | `Systran/faster-whisper-large-v3` | Default speech-to-text model |
| `TTS_MODEL` | `speaches-ai/Kokoro-82M-v1.0-ONNX` | Default text-to-speech model |
| `TTS_VOICE` | `af_heart` | Default TTS voice |
| `OPENAI_API_KEY` | `dummy` | Required by the OpenAI SDK but not used by speaches |
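To make the fallback behaviour concrete, here is a minimal sketch of how these variables could be resolved, with each one falling back to its documented default when unset. The `Settings` class and `load_settings` function are hypothetical names used for illustration; the server's real configuration code may be organised differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Resolved configuration values (hypothetical container)."""
    speaches_url: str
    stt_model: str
    tts_model: str
    tts_voice: str
    openai_api_key: str


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        speaches_url=env.get("SPEACHES_URL", "http://speaches:8000"),
        stt_model=env.get("STT_MODEL", "Systran/faster-whisper-large-v3"),
        tts_model=env.get("TTS_MODEL", "speaches-ai/Kokoro-82M-v1.0-ONNX"),
        tts_voice=env.get("TTS_VOICE", "af_heart"),
        # Placeholder value: the OpenAI SDK requires a key, speaches ignores it.
        openai_api_key=env.get("OPENAI_API_KEY", "dummy"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes it easy to check overrides, for example `load_settings({"SPEACHES_URL": "http://your-speaches-host:8000"})`.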
Before transcribing, make sure you've downloaded models into speaches:
# Speech-to-text
curl http://your-speaches-host:8000/v1/models/Systran/faster-whisper-large-v3 -X POST
# Text-to-speech
curl http://your-speaches-host:8000/v1/models/speaches-ai/Kokoro-82M-v1.0-ONNX -X POST
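The same two downloads can be scripted. The sketch below mirrors the `curl` commands above with the Python standard library, issuing an empty-bodied `POST` to `/v1/models/<model id>` for each model. The function names are hypothetical, and the host name is the same placeholder used in the commands above.

```python
import urllib.request

# Default model IDs taken from the environment variable table.
DEFAULT_MODELS = [
    "Systran/faster-whisper-large-v3",   # speech-to-text
    "speaches-ai/Kokoro-82M-v1.0-ONNX",  # text-to-speech
]


def build_model_download_request(base_url, model_id):
    """Build (but do not send) the POST request that asks speaches to download a model."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_id}"
    return urllib.request.Request(url, method="POST")


def download_models(base_url, model_ids=DEFAULT_MODELS):
    """Request each model in turn and print the HTTP status returned by the server."""
    for model_id in model_ids:
        request = build_model_download_request(base_url, model_id)
        with urllib.request.urlopen(request) as response:
            print(model_id, response.status)


if __name__ == "__main__":
    # Requires a reachable speaches instance.
    download_models("http://your-speaches-host:8000")
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` on a 4xx or 5xx response, so a failed download stops the loop rather than passing silently.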
License: MIT