Enables voice cloning and speech synthesis through Alibaba Cloud's Bailian and DashScope platforms. It provides tools to create, manage, and synthesize audio using custom cloned voice profiles.
A stdio MCP server that can be deployed to Alibaba Cloud Function AI.
Push this directory to GitHub or Alibaba Cloud Codeup:
Project files: server.py, requirements.txt, .env.example, README.md. In Function AI, create an MCP service (MCP 服务) with protocol SSE and runtime Python. Recommended values:
Install command: pip install -t . -r requirements.txt. Start command: python server.py. Resource recommendations:
Express mode (极速模式). In Function AI's variable management, add:
- DASHSCOPE_API_KEY
- DASHSCOPE_REGION=cn-beijing
- BAILIAN_TTS_MODEL=cosyvoice-v3.5-plus
- INLINE_AUDIO_BASE64_LIMIT=300000

After a successful deployment, Function AI gives you a public SSE address, usually of the form:
https://xxxx.cn-beijing.fcapp.run/sse
First test in the Function AI console that the tools work.
In the client, click "+ 创建 MCP 服务" (Create MCP Service), choose "使用脚本部署" (deploy via script), and select the http type. Example configuration:
```json
{
  "mcpServers": {
    "voice-clone-mcp": {
      "url": "https://xxxx.cn-beijing.fcapp.run/sse"
    }
  }
}
```
Call create_voice_clone, then wait_for_voice_ready; once the status is OK, call synthesize_with_cloned_voice. Example create_voice_clone input:

```json
{
  "audio_url": "https://your-public-audio-url/sample.wav",
  "prefix": "myvoice01",
  "language_hint": "zh",
  "target_model": "cosyvoice-v3.5-plus",
  "region": "cn-beijing"
}
```
Example synthesize_with_cloned_voice input:

```json
{
  "text": "你好,这是一段使用复刻音色生成的演示语音。",
  "voice_id": "cosyvoice-v3.5-plus-myvoice01-xxxxxxxx",
  "target_model": "cosyvoice-v3.5-plus",
  "region": "cn-beijing",
  "inline_base64": true
}
```
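The create → wait → synthesize order and the prefix rule below come from this README; everything else in this sketch (function names, the status-callback shape) is a hypothetical local helper, not the project's actual code.

```python
import re
import time

# Prefix rule from this README: lowercase letters, digits, underscores, at most 10 chars.
PREFIX_RE = re.compile(r"[a-z0-9_]{1,10}")

def validate_prefix(prefix: str) -> bool:
    """Return True if a voice prefix satisfies the README's recommendation."""
    return PREFIX_RE.fullmatch(prefix) is not None

def wait_for_voice_ready_local(query_status, voice_id, timeout_s=120.0, interval_s=2.0):
    """Poll a status callback until it reports "OK" or the timeout expires.

    `query_status` stands in for a call to the MCP's query_voice tool.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = query_status(voice_id)
        if status == "OK":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"voice {voice_id} not ready within {timeout_s}s")
        time.sleep(interval_s)
```

Validating the prefix before enrollment avoids a failed create call; the polling loop mirrors what wait_for_voice_ready does server-side.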
Notes:

- target_model must match between enrollment and synthesis, otherwise synthesis fails.
- audio_url must be publicly accessible.
- prefix should use only lowercase letters, digits, and underscores, at most 10 characters long.
- synthesize_with_cloned_voice writes audio to a temporary directory by default; to persist output in the cloud, the suggested next step is to wire up OSS.

The MCP also supports additional tools for local recordings and video sources:

- create_qwen_voice_clone_from_audio_base64
- create_qwen_voice_clone_from_local_file
- create_qwen_voice_clone_from_video_url_segment
- create_qwen_voice_clone_from_local_video_segment

How to choose:
If your client is remote, use create_qwen_voice_clone_from_audio_base64. This is the remote-friendly path because you can pass audio as base64 or a full Data URL. If the server runs locally over stdio, use create_qwen_voice_clone_from_local_file.

Important: the CosyVoice clone tools still require a public audio_url. The base64 and local-file paths apply only to Qwen3 TTS VC, because the official Qwen voice enrollment API supports audio.data while the CosyVoice clone API is documented around public URL input.

Example for remote base64 mode:
```json
{
  "audio_base64_or_data_url": "data:audio/wav;base64,AAA...",
  "preferred_name": "demo_voice_01",
  "audio_mime_type": "audio/wav",
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}
```
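A Data URL like the one above can be built from raw audio bytes with the standard library. The helper name is ours, not part of this MCP; it simply produces a value suitable for the audio_base64_or_data_url field.

```python
import base64

def to_audio_data_url(audio_bytes: bytes, mime_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as a data: URL (RFC 2397 style)."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Typical use: read a local recording and pass the result to the tool input.
# with open("sample.wav", "rb") as fh:
#     payload = to_audio_data_url(fh.read())
```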
Example for local file mode:
```json
{
  "local_file_path": "C:\\Users\\29932\\Desktop\\sample.wav",
  "preferred_name": "demo_voice_01",
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}
```
Example for video URL mode:
```json
{
  "video_url": "https://your-public-video-url/demo.mp4",
  "preferred_name": "demo_voice_01",
  "start_time": "00:01:15",
  "end_time": "00:01:42",
  "speech_enhancement": false,
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}
```
Example for local video mode:
```json
{
  "local_video_path": "C:\\Users\\29932\\Desktop\\demo.mp4",
  "preferred_name": "demo_voice_01",
  "start_time": "75",
  "end_time": "102",
  "speech_enhancement": false,
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}
```
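The two examples above use both accepted time formats ("75" and "00:01:15"). Normalizing them to seconds is straightforward; this parser is an illustrative sketch, not the project's own implementation.

```python
def parse_timecode(value: str) -> float:
    """Parse "75", "75.5", or "HH:MM:SS[.ms]" style timecodes into seconds."""
    parts = value.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"bad timecode: {value!r}")
    seconds = 0.0
    for part in parts:
        # Each colon shifts the accumulated value up by one base-60 place.
        seconds = seconds * 60 + float(part)
    return seconds
```

With this, "00:01:15" and "75" resolve to the same instant, so clients can accept either form.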
Video notes:
- start_time and end_time support plain seconds or HH:MM:SS[.ms].
- Extracted audio is 24000 Hz mono WAV by default, to preserve more timbre detail for cloning.
- speech_enhancement=false is now the safer default when similarity matters most.
- Turn speech_enhancement=true on only when the source clip is noisy enough that intelligibility matters more than timbre fidelity.
- Pick a 10-20 s segment where the target speaker is clear and continuous, and background music is as weak as possible.

Use different follow-up steps for the two API families in this MCP:
create_voice_clone:
This is the CosyVoice voice-enrollment flow. It is asynchronous.
After creation, call wait_for_voice_ready or query_voice, then call
synthesize_with_cloned_voice.
create_qwen_voice_clone_from_audio_base64
create_qwen_voice_clone_from_local_file
create_qwen_voice_clone_from_video_url_segment
create_qwen_voice_clone_from_local_video_segment
These are Qwen voice clone flows. They are ready for synthesis immediately
after the create call returns success.
Do not call query_voice, wait_for_voice_ready, list_voices, or
delete_voice with a Qwen voice id such as qwen-tts-vc-....
Call synthesize_with_cloned_voice directly with the returned voice_id
and target_model.
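The branching rule above can be expressed as a one-line check on the voice id. The id prefixes come from this README's examples; the helper name is an assumption for illustration.

```python
def needs_ready_wait(voice_id: str) -> bool:
    """True if the id belongs to the asynchronous CosyVoice family.

    Per this README: Qwen ids (qwen-tts-vc-...) are ready immediately after
    create; anything else goes through wait_for_voice_ready/query_voice first.
    """
    return not voice_id.startswith("qwen-tts-vc-")
```

A client can use this to decide whether to poll before calling synthesize_with_cloned_voice.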
Qwen follow-up example:
```json
{
  "text": "时光如白驹过隙,转瞬即逝。",
  "voice_id": "qwen-tts-vc-demo_voice_01-voice-20260323xxxx",
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing",
  "inline_base64": true
}
```
LobeHub expects Streamable HTTP, not SSE.
This project now supports both transports:
- MCP_TRANSPORT=stdio, for local stdio use or Function AI MCP proxy mode.
- MCP_TRANSPORT=streamable-http, for direct LobeHub integration.

Recommended environment variables for direct LobeHub deployment:
```
MCP_TRANSPORT=streamable-http
MCP_HOST=0.0.0.0
MCP_PORT=8080
```
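A server can resolve these variables with sensible fallbacks. This is a sketch whose defaults mirror the README's suggested values; server.py's actual parsing may differ.

```python
import os

def resolve_transport(env=None):
    """Read MCP transport settings from an environment mapping.

    Falls back to stdio transport and the host/port suggested in this README.
    """
    env = os.environ if env is None else env
    transport = env.get("MCP_TRANSPORT", "stdio")
    host = env.get("MCP_HOST", "0.0.0.0")
    port = int(env.get("MCP_PORT", "8080"))
    return transport, host, port
```

Passing a dict instead of reading os.environ directly keeps the function easy to test.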
Startup command for HTTP mode:
```
python server.py
```
LobeHub example config:
```json
{
  "mcpServers": {
    "voice-clone-mcp": {
      "url": "https://your-domain.example.com/mcp",
      "type": "streamable-http",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN"
      }
    }
  }
}
```
Important:

- Use the /mcp HTTP URL of the deployed service, not the old /sse URL.
- The HTTP server listens on port 8080 by default.

Add the following to claude_desktop_config.json and restart Claude Desktop.
```json
{
  "mcpServers": {
    "bailian-voice-clone-mcp": {
      "command": "npx",
      "args": []
    }
  }
}
```