Unified Gemini media generation: Nano Banana (images, editing, multi-reference composition), Veo 3.1 (video, image-to-video, extend), TTS, and Lyria 3 (music with vocals). Single Go binary, 12 tools, supports Gemini API key and Vertex AI.
Unified Go MCP server for AI media generation via Google Gemini API and Vertex AI.
# Install
go install github.com/mordor-forge/gemini-media-mcp/cmd/gemini-media-mcp@latest
# Configure (Gemini API; either variable name works)
export GEMINI_API_KEY="your-api-key"
# export GOOGLE_API_KEY="your-api-key"
# Or configure (Vertex AI)
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"
# Run directly (stdio transport)
gemini-media-mcp
Then add it to your MCP client -- see MCP Client Configuration below.
| Variable | Required | Default | Description |
|---|---|---|---|
| GOOGLE_API_KEY | Yes* | -- | Gemini API key. GEMINI_API_KEY is also accepted |
| GOOGLE_CLOUD_PROJECT | Yes* | -- | GCP project ID for Vertex AI backend |
| GOOGLE_CLOUD_LOCATION | No | us-central1 | GCP region for Vertex AI |
| MEDIA_OUTPUT_DIR | No | ~/generated_media | Directory for saved media files |
*One of GOOGLE_API_KEY or GOOGLE_CLOUD_PROJECT must be set. If both are set, API key takes precedence (avoids conflicts when GOOGLE_CLOUD_PROJECT is set in the shell for other tools).
If you're unsure which backend is active, call get_config from your MCP client to confirm the selected backend and output directory.
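The precedence rule above can be sketched in Go as follows. This is a minimal illustration, not the server's actual code: the function name `selectBackend` and the backend labels `gemini-api` and `vertex-ai` are assumptions made for the example.

```go
package main

import "fmt"

// selectBackend mirrors the documented precedence: an API key wins over a
// Vertex AI project. The name and return values are hypothetical.
func selectBackend(getenv func(string) string) (string, error) {
	if getenv("GOOGLE_API_KEY") != "" || getenv("GEMINI_API_KEY") != "" {
		return "gemini-api", nil
	}
	if getenv("GOOGLE_CLOUD_PROJECT") != "" {
		return "vertex-ai", nil
	}
	return "", fmt.Errorf("set GOOGLE_API_KEY (or GEMINI_API_KEY) or GOOGLE_CLOUD_PROJECT")
}

func main() {
	// Both variables set: the API key takes precedence.
	env := map[string]string{
		"GOOGLE_API_KEY":       "your-api-key",
		"GOOGLE_CLOUD_PROJECT": "your-project-id",
	}
	backend, err := selectBackend(func(k string) string { return env[k] })
	fmt.Println(backend, err) // prints "gemini-api <nil>"
}
```

Passing the environment lookup as a function keeps the rule easy to test; in a real program `os.Getenv` would be passed instead of the map lookup.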
| Tool | Description | Type |
|---|---|---|
| generate_image | Generate image from text prompt | Sync |
| edit_image | Edit existing image with text prompt | Sync |
| compose_images | Multi-reference image composition (up to 3) | Sync |
| generate_video | Generate video from text prompt (returns operation ID) | Async |
| animate_image | Animate image into video (first frame) | Async |
| extend_video | Chain video clips for longer content | Async |
| video_status | Check video generation progress | Sync |
| download_video | Download completed video | Sync |
| generate_audio | Generate spoken audio from text (TTS) | Sync |
| generate_music | Generate AI music from text description (Lyria) | Sync |
| list_models | Show available models with capabilities and pricing | Sync |
| get_config | Show current backend and configuration | Sync |
Async tools return an operation ID immediately. Use video_status to poll for completion, then download_video to retrieve the file.
| Tier | Model | Best For | Cost |
|---|---|---|---|
| nb2 (default) | gemini-3.1-flash-image-preview | Quick iterations, most tasks | ~$0.067/img |
| pro | gemini-3-pro-image-preview | Final renders, complex scenes | ~$0.134/img |
Both tiers support resolutions 1K, 2K, 4K and aspect ratios 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9.
| Tier | Model | Best For | Cost |
|---|---|---|---|
| lite (default) | veo-3.1-lite-generate-preview | High-volume, drafts | $0.05/sec (720p), $0.08/sec (1080p) |
| fast | veo-3.1-fast-generate-preview | Good quality iterations | $0.15/sec (720p/1080p), $0.35/sec (4K) |
| standard | veo-3.1-generate-preview | Final renders, 4K | $0.40/sec (720p/1080p), $0.60/sec (4K) |
Supported aspect ratios are 16:9 and 9:16. Supported durations are 4, 6, and 8 seconds. Lite supports 720p and 1080p. Fast and Standard support 720p, 1080p, and 4K. Video extension (extend_video) is only available on Fast and Standard tiers, and the extension tier must match the original generation.
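The constraints and per-second prices listed above can be checked before submitting a request. The following Go sketch encodes them in a lookup table; the type and function names are illustrative assumptions, and the prices are copied from the table in this README rather than fetched from any API.

```go
package main

import "fmt"

// videoTier holds the per-second price (USD) for each supported resolution
// and whether extend_video is available. Values come from the README table.
type videoTier struct {
	pricePerSec map[string]float64
	canExtend   bool
}

var videoTiers = map[string]videoTier{
	"lite":     {map[string]float64{"720p": 0.05, "1080p": 0.08}, false},
	"fast":     {map[string]float64{"720p": 0.15, "1080p": 0.15, "4K": 0.35}, true},
	"standard": {map[string]float64{"720p": 0.40, "1080p": 0.40, "4K": 0.60}, true},
}

// estimateVideoCost validates a request against the documented limits and
// returns the estimated cost in USD. Hypothetical helper, not a server API.
func estimateVideoCost(tier, resolution, aspect string, seconds int) (float64, error) {
	t, ok := videoTiers[tier]
	if !ok {
		return 0, fmt.Errorf("unknown tier %q", tier)
	}
	rate, ok := t.pricePerSec[resolution]
	if !ok {
		return 0, fmt.Errorf("tier %q does not support %s", tier, resolution)
	}
	if aspect != "16:9" && aspect != "9:16" {
		return 0, fmt.Errorf("unsupported aspect ratio %q", aspect)
	}
	if seconds != 4 && seconds != 6 && seconds != 8 {
		return 0, fmt.Errorf("unsupported duration %ds", seconds)
	}
	return rate * float64(seconds), nil
}

func main() {
	cost, err := estimateVideoCost("lite", "1080p", "16:9", 8)
	fmt.Printf("%.2f %v\n", cost, err) // prints "0.64 <nil>"

	_, err = estimateVideoCost("lite", "4K", "16:9", 8)
	fmt.Println(err) // lite has no 4K entry, so this is an error
}
```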
| Tier | Model | Best For | Cost |
|---|---|---|---|
| tts | gemini-2.5-flash-preview-tts | Text-to-speech with natural voices | Standard Gemini token pricing |
The generate_audio tool converts text to spoken audio. It supports:
- Voices: Aoede, Kore, Puck, and more. Default: Aoede
- Languages (e.g. en-US, it-IT, cs-CZ, de-DE). Default: en-US

Output is saved as raw PCM audio (audio/L16, 24kHz sample rate). The file can be played with tools like ffplay or converted to other formats:
# Play directly
ffplay -f s16le -ar 24000 -ac 1 ~/generated_media/audio-2026-04-02T12-20-12-0603.pcm
# Convert to WAV
ffmpeg -f s16le -ar 24000 -ac 1 -i audio.pcm audio.wav
# Convert to MP3
ffmpeg -f s16le -ar 24000 -ac 1 -i audio.pcm audio.mp3
| Tier | Model | Output | Best For | Cost |
|---|---|---|---|---|
| clip (default) | lyria-3-clip-preview | 30-second clips | Quick iterations, sound design | ~$0.08/song |
| full | lyria-3-pro-preview | Up to ~3 minutes | Full songs with vocals, verses, choruses | Token-based |
The generate_music tool creates AI-generated music from text descriptions. Capabilities include:
- Structure tags: [Verse], [Chorus], [Bridge], [Intro], [Outro]
- Timestamps: [0:00 - 0:10] Intro: gentle piano... for precise section timing

All generated music is watermarked with SynthID.
Example prompts:
# Instrumental
"A gentle acoustic guitar melody in C major, 90 BPM, calm and peaceful indie folk"
# With structure
"[Intro] Ambient synth pad, ethereal
[Verse] Lo-fi hip-hop beat, mellow piano chords, vinyl crackle
[Chorus] Uplifting, add strings and gentle drums
[Outro] Fade out with reverb"
# With lyrics
"Upbeat pop song, 120 BPM, major key
[Chorus] We're dancing in the light / Everything feels right / Under stars so bright tonight"
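Structured prompts like the ones above can be assembled programmatically. The Go sketch below joins a style line with tagged sections, one per line; the `section` type and `buildMusicPrompt` function are hypothetical helpers, and the layout simply follows the examples in this README.

```go
package main

import (
	"fmt"
	"strings"
)

// section pairs a structure tag (e.g. "Verse") with its description or lyrics.
type section struct {
	Tag  string
	Text string
}

// buildMusicPrompt writes the style line first, then one "[Tag] text" line
// per section, matching the layout of the example prompts.
func buildMusicPrompt(style string, sections []section) string {
	var b strings.Builder
	b.WriteString(style)
	for _, s := range sections {
		fmt.Fprintf(&b, "\n[%s] %s", s.Tag, s.Text)
	}
	return b.String()
}

func main() {
	prompt := buildMusicPrompt("Upbeat pop song, 120 BPM, major key", []section{
		{"Intro", "Ambient synth pad, ethereal"},
		{"Chorus", "Uplifting, add strings and gentle drums"},
	})
	fmt.Println(prompt)
}
```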
You can pass the tier name (lite, fast, standard, nb2, pro, tts, clip, full) or a raw model ID directly.
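One way to picture the tier-or-model-ID behavior is a lookup that falls back to the input when it is not a known tier name. The Go sketch below uses the tier names and model IDs from the tables in this README; the function `resolveModel` is an illustrative assumption and may not match how the server resolves names internally.

```go
package main

import "fmt"

// tierModels maps the documented tier names to their model IDs.
var tierModels = map[string]string{
	"nb2":      "gemini-3.1-flash-image-preview",
	"pro":      "gemini-3-pro-image-preview",
	"lite":     "veo-3.1-lite-generate-preview",
	"fast":     "veo-3.1-fast-generate-preview",
	"standard": "veo-3.1-generate-preview",
	"tts":      "gemini-2.5-flash-preview-tts",
	"clip":     "lyria-3-clip-preview",
	"full":     "lyria-3-pro-preview",
}

// resolveModel returns the model ID for a tier name, or the input unchanged
// when it is not a known tier (treated as a raw model ID). Hypothetical helper.
func resolveModel(nameOrID string) string {
	if id, ok := tierModels[nameOrID]; ok {
		return id
	}
	return nameOrID
}

func main() {
	fmt.Println(resolveModel("fast"))                     // prints "veo-3.1-fast-generate-preview"
	fmt.Println(resolveModel("veo-3.1-generate-preview")) // prints "veo-3.1-generate-preview"
}
```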
Add to your Claude Code MCP settings (~/.claude/settings.json or project .mcp.json):
{
  "mcpServers": {
    "gemini-media": {
      "command": "gemini-media-mcp",
      "env": {
        "GOOGLE_API_KEY": "your-api-key",
        "MEDIA_OUTPUT_DIR": "/path/to/output"
      }
    }
  }
}
Use either GOOGLE_API_KEY or GEMINI_API_KEY in the env block above; both are accepted.
Or if building from source:
{
  "mcpServers": {
    "gemini-media": {
      "command": "/path/to/gemini-media-mcp",
      "env": {
        "GOOGLE_API_KEY": "your-api-key"
      }
    }
  }
}
The skills/ directory contains Claude Code skills that provide interactive workflows on top of the MCP tools. Each skill guides Claude through prompt engineering, model selection, and iterative refinement for a specific media type.
| Skill | Directory | Description |
|---|---|---|
| gemini-image-gen | skills/gemini-image-gen/ | Image generation, editing, and multi-reference composition |
| video-gen | skills/video-gen/ | Video generation with async polling, image-to-video, extension |
| music-gen | skills/music-gen/ | Music generation with structure tags, lyrics, genre control |
| tts-gen | skills/tts-gen/ | Text-to-speech with voice and language selection |
To install a skill, copy its directory to ~/.claude/skills/:
cp -r skills/video-gen ~/.claude/skills/
cp -r skills/music-gen ~/.claude/skills/
cp -r skills/tts-gen ~/.claude/skills/
cp -r skills/gemini-image-gen ~/.claude/skills/
Skills are optional; the MCP tools work without them. The skills add prompt engineering guidance, model tier recommendations, and interactive review workflows that significantly improve output quality.
git clone https://github.com/mordor-forge/gemini-media-mcp.git
cd gemini-media-mcp
go build ./cmd/gemini-media-mcp/
The binary will be created at ./gemini-media-mcp.
To run tests:
go test ./...
- Create a feature branch (git checkout -b feature/your-feature)
- Run go test ./... and go vet ./...
- Open a pull request against main