AI image and video generation, editing, and region repair via Gemini, OpenAI, and Grok
MCP server for AI image & video generation, editing, and transplant-grade region repair
Powered by Gemini 3.1 Flash Image, OpenAI GPT Image 2, Grok Imagine, and Veo 3
An MCP server that gives Claude (or any MCP client) the ability to generate images, edit them, fix garbled text, and create videos — all through natural language.
pixel-surgeon-mcp is a multi-provider image generation server. You can use any combination of providers and switch between them per-request:
Google's image generation pipeline uses a two-stage approach: Gemini 3.1 Pro reasons about your prompt, then Gemini 3.1 Flash Image renders the pixels. Supports 9 aspect ratios at 512/1K/2K/4K resolution. Best price/performance ratio, with a free tier available.
GPT Image 2 is OpenAI's latest image model, with dramatically improved text rendering and visual fidelity. It supports flexible resolutions: pixel-surgeon maps your chosen size and aspect ratio to the optimal pixel dimensions automatically. Quality levels: medium (fast) and high (print-ready). Excellent for infographics, diagrams, and text-heavy images where other models struggle. Slower and more expensive than the other providers.
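A rough sketch of what mapping a size label and aspect ratio to pixel dimensions could look like. Everything here is an assumption for illustration: the function name `toPixelDimensions`, the long-edge value for each size label, and the rounding to multiples of 16 are not taken from the pixel-surgeon-mcp source or from OpenAI's documentation.

```typescript
// Hypothetical sketch. The names, long-edge values, and multiple-of-16
// rounding are assumptions, not the server's actual mapping.
type SizeLabel = "1K" | "2K" | "4K";

const LONG_EDGE: Record<SizeLabel, number> = { "1K": 1024, "2K": 2048, "4K": 4096 };

interface Dimensions {
  width: number;
  height: number;
}

function toPixelDimensions(size: SizeLabel, aspectRatio: string, multiple = 16): Dimensions {
  const parts = aspectRatio.split(":").map(Number);
  if (parts.length !== 2 || !parts.every((n) => isFinite(n) && n > 0)) {
    throw new Error(`Invalid aspect ratio: ${aspectRatio}`);
  }
  const [w, h] = parts;
  const longEdge = LONG_EDGE[size];
  // Round the short edge to the nearest multiple, never below one multiple.
  const snap = (v: number): number => Math.max(multiple, Math.round(v / multiple) * multiple);
  return w >= h
    ? { width: longEdge, height: snap((longEdge * h) / w) }
    : { width: snap((longEdge * w) / h), height: longEdge };
}
```

The long edge is pinned to the size label and the short edge follows from the ratio, so a landscape and a portrait request at the same size cost roughly the same number of pixels.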
Grok Imagine is xAI's Aurora-powered image model. Fastest generation speed and lowest cost. Supports 7 aspect ratios at fixed resolutions (~1K). Good for rapid prototyping and iteration.
For video, the server calls Veo 3 with async polling — generating both video and ambient audio. Supports 16:9 and 9:16 at 5s or 8s duration.
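The async polling pattern can be sketched generically. This is not the Veo API or the server's code: `Operation`, `pollUntilDone`, the default interval, and the timeout are hypothetical, and the status check is injected so the loop stays independent of any SDK.

```typescript
// Hypothetical sketch of a long-running-operation poll loop.
// `Operation` and `pollUntilDone` are illustrative names only.
interface Operation<T> {
  done: boolean;
  result?: T;
  error?: string;
}

async function pollUntilDone<T>(
  check: () => Promise<Operation<T>>,
  intervalMs = 10000,
  timeoutMs = 600000,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const op = await check();
    if (op.error !== undefined) throw new Error(`Generation failed: ${op.error}`);
    if (op.done) {
      if (op.result === undefined) throw new Error("Operation finished without a result");
      return op.result;
    }
    // Give up if the next wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) throw new Error("Timed out waiting for video");
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
}
```

Injecting `check` keeps the loop testable with a fake operation and leaves the provider-specific status call in one place.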
AI image models struggle with text-heavy images. The fix tools solve this by sending smaller regions to the provider, then stitching the results back with histogram-matched compositing for seamless blending.
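For readers unfamiliar with histogram matching, here is a minimal single-channel sketch of the classic CDF-mapping technique. It is not the server's compositing code: the function names are hypothetical, and a real pipeline would presumably run this per colour channel, using the pixels of the original region as the reference.

```typescript
// Sketch of classic histogram matching on one 8-bit channel.
// Function names are hypothetical; this is not the server's implementation.
function cumulativeHistogram(channel: Uint8Array): Float64Array {
  const cdf = new Float64Array(256);
  for (let i = 0; i < channel.length; i++) cdf[channel[i]] += 1;
  for (let level = 1; level < 256; level++) cdf[level] += cdf[level - 1];
  const total = Math.max(channel.length, 1);
  for (let level = 0; level < 256; level++) cdf[level] /= total;
  return cdf;
}

function matchHistogram(source: Uint8Array, reference: Uint8Array): Uint8Array {
  const srcCdf = cumulativeHistogram(source);
  const refCdf = cumulativeHistogram(reference);
  // For each source level, pick the first reference level whose cumulative
  // share reaches the source level's cumulative share.
  const lookup = new Uint8Array(256);
  let j = 0;
  for (let level = 0; level < 256; level++) {
    while (j < 255 && refCdf[j] < srcCdf[level]) j++;
    lookup[level] = j;
  }
  const out = new Uint8Array(source.length);
  for (let i = 0; i < source.length; i++) out[i] = lookup[source[i]];
  return out;
}
```

Applied to a regenerated tile, with the original tile as `reference`, this pulls the new pixels toward the brightness distribution of their surroundings, which is what makes the seam less visible.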
| Tool | Description |
|---|---|
| generate_image | Text-to-image generation (single image) |
| generate_images | Parallel batch generation (1-8 images) |
| generate_video | Text-to-video via Veo 3 with audio (5s or 8s) |
| edit_image | Edit an existing image with natural language instructions |
| fix_image | Grid-based tile repair for garbled text (2x2, 3x3, etc.) |
| fix_region | Targeted region repair with automatic aspect ratio snapping |
| interactive_fix | Browser-based crop UI with multi-shot selection |
| list_images | List generated images and videos |
| save_image | Import an external image into the workspace |
| remove_background | Remove image background (alpha channel transparency) |

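
The geometry behind fix_image and fix_region can be sketched as two small helpers. Both are hypothetical illustrations rather than the server's code: `gridTiles` splits an image into a rows x cols grid, and `snapAspectRatio` picks the closest ratio from a caller-supplied list. The supported ratios in the usage note are an assumption, since the README does not list them.

```typescript
// Hypothetical helpers, not taken from the pixel-surgeon-mcp source.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Split an image into rows x cols tiles that cover it with no gaps or overlap.
function gridTiles(imageWidth: number, imageHeight: number, rows: number, cols: number): Rect[] {
  if (!Number.isInteger(rows) || !Number.isInteger(cols) || rows < 1 || cols < 1) {
    throw new Error("rows and cols must be positive integers");
  }
  const tiles: Rect[] = [];
  for (let r = 0; r < rows; r++) {
    const y0 = Math.floor((r * imageHeight) / rows);
    const y1 = Math.floor(((r + 1) * imageHeight) / rows);
    for (let c = 0; c < cols; c++) {
      const x0 = Math.floor((c * imageWidth) / cols);
      const x1 = Math.floor(((c + 1) * imageWidth) / cols);
      tiles.push({ x: x0, y: y0, width: x1 - x0, height: y1 - y0 });
    }
  }
  return tiles;
}

// Pick the supported "W:H" ratio closest to the region, compared in log space
// so that 2:1 and 1:2 are equally far from 1:1.
function snapAspectRatio(width: number, height: number, supported: string[]): string {
  if (supported.length === 0) throw new Error("supported ratios must not be empty");
  const target = Math.log(width / height);
  let best = supported[0];
  let bestDistance = Infinity;
  for (const ratio of supported) {
    const [w, h] = ratio.split(":").map(Number);
    const distance = Math.abs(Math.log(w / h) - target);
    if (distance < bestDistance) {
      best = ratio;
      bestDistance = distance;
    }
  }
  return best;
}
```

For example, `gridTiles(1000, 500, 2, 2)` yields four 500x250 tiles, and `snapAspectRatio(1900, 1000, ["1:1", "4:3", "16:9"])` selects "16:9".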
| Model | Provider | Resolution | Best for |
|---|---|---|---|
| gemini-3.1-flash-image | Google | 512 / 1K / 2K / 4K | General image generation, photo-realistic scenes |
| gemini-2.5-flash-image | Google | 1K max (free tier) | Quick drafts, prototyping |
| gpt-image-2 | OpenAI | Flexible (up to 4K) | Text-heavy images, infographics, diagrams, typography |
| gpt-image-1 | OpenAI | 3 fixed sizes | Legacy support |
| grok-imagine | xAI | Fixed (~1K per ratio) | Fast iteration, lowest cost |

Force a specific model per-call via the model tool parameter, or set DEFAULT_IMAGE_MODEL env var.
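One way to read the precedence: the per-call model parameter wins, then DEFAULT_IMAGE_MODEL, then a built-in default. The sketch below assumes that order. `resolveModel` and `FALLBACK_MODEL` are hypothetical names, and the README does not say which model the server uses when neither is set.

```typescript
// Hypothetical sketch of model selection. FALLBACK_MODEL is an assumption:
// the README does not name the built-in default.
const KNOWN_MODELS = [
  "gemini-3.1-flash-image",
  "gemini-2.5-flash-image",
  "gpt-image-2",
  "gpt-image-1",
  "grok-imagine",
] as const;

type ImageModel = (typeof KNOWN_MODELS)[number];

const FALLBACK_MODEL: ImageModel = "gemini-3.1-flash-image";

function isImageModel(value: string): value is ImageModel {
  return (KNOWN_MODELS as readonly string[]).indexOf(value) !== -1;
}

function resolveModel(
  requested: string | undefined,
  env: Record<string, string | undefined>,
): ImageModel {
  // Per-call parameter first, then the environment variable, then the default.
  const candidate = requested ?? env["DEFAULT_IMAGE_MODEL"] ?? FALLBACK_MODEL;
  if (!isImageModel(candidate)) throw new Error(`Unknown model: ${candidate}`);
  return candidate;
}
```

In a Node process this would be called as `resolveModel(args.model, process.env)`, where `args.model` stands in for the tool parameter.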
If a Gemini generation call fails with a billing / prepay error, the server automatically retries on the free-tier gemini-2.5-flash-image model. The viewer shows a yellow banner when this happens. Free-tier limits: 1K max resolution, 10 RPM, 500 RPD.
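The retry behaviour can be sketched as a wrapper around an injected generate function. The names `GenerateFn`, `isBillingError`, and `generateWithFallback` are hypothetical, and matching on the error message is an assumption; the server may detect billing errors differently (for example by status code).

```typescript
// Hypothetical sketch of the free-tier fallback. Error detection by message
// text is an assumption, not the server's actual check.
type GenerateFn = (model: string, prompt: string) => Promise<string>;

const FREE_TIER_MODEL = "gemini-2.5-flash-image";

function isBillingError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /billing|prepay/i.test(message);
}

interface FallbackResult {
  output: string;
  model: string;
  usedFallback: boolean; // a viewer could use this to show a warning banner
}

async function generateWithFallback(
  generate: GenerateFn,
  model: string,
  prompt: string,
): Promise<FallbackResult> {
  try {
    return { output: await generate(model, prompt), model, usedFallback: false };
  } catch (err) {
    // Only billing errors trigger a retry, and never retry the free tier itself.
    if (model === FREE_TIER_MODEL || !isBillingError(err)) throw err;
    const output = await generate(FREE_TIER_MODEL, prompt);
    return { output, model: FREE_TIER_MODEL, usedFallback: true };
  }
}
```

Returning `usedFallback` alongside the output lets the caller surface the downgrade instead of silently delivering a lower-resolution image.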
All generation and edit tools support an optional style parameter:
| Style | Description |
|---|---|
| neo-brutalist | Magazine editorial, bold typography, halftone textures. Cream, black, and terracotta palette. |
| duval-software-infographic | Duval Software's signature retro-futurist infographic style. 1960s Space Age meets 1980s arcade. Cathode blue, amber, and salmon palette. Great for diagrams and system overviews. |
| fractal-arcade | Dithered fractals, Sierpinski patterns, low-poly. CRT retro, Amiga/EGA palette. |
| clean-tech-infographic | Technical diagrams, system flows, data pipelines. Dark navy, cyan, and electric blue. |
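
One plausible way a style preset could be applied is to append its description to the user's prompt. This is a guess at the mechanism, not the server's code: the real STYLE_PRESETS object in src/index.ts may have a different shape, and `applyStyle` is a hypothetical name. The descriptions are abbreviated from the presets listed above.

```typescript
// Hypothetical sketch. The shape of STYLE_PRESETS and the way a style is
// merged into the prompt are assumptions for illustration.
const STYLE_PRESETS: Record<string, string> = {
  "neo-brutalist":
    "Magazine editorial, bold typography, halftone textures. Cream, black, and terracotta palette.",
  "duval-software-infographic":
    "Retro-futurist infographic. 1960s Space Age meets 1980s arcade. Cathode blue, amber, and salmon palette.",
  "fractal-arcade":
    "Dithered fractals, Sierpinski patterns, low-poly. CRT retro, Amiga/EGA palette.",
  "clean-tech-infographic":
    "Technical diagrams, system flows, data pipelines. Dark navy, cyan, and electric blue.",
};

function applyStyle(prompt: string, style?: string): string {
  if (style === undefined) return prompt;
  if (!Object.prototype.hasOwnProperty.call(STYLE_PRESETS, style)) {
    const available = Object.keys(STYLE_PRESETS).join(", ");
    throw new Error(`Unknown style: ${style}. Available: ${available}`);
  }
  return `${prompt}\n\nStyle: ${STYLE_PRESETS[style]}`;
}
```

Rejecting unknown style names with the list of valid ones gives the MCP client something actionable to relay back to the user.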

You need at least one provider API key. You can use any combination for maximum flexibility.
Prepayment required. Gemini 3.1 Flash Image and Veo 3 require billing and prepaid credits. The free-tier fallback (2.5 Flash) has limited resolution and rate limits. See Google AI pricing.
GPT Image 2 excels at text rendering, infographics, and diagrams. If you primarily need text-heavy images, this is the provider to use.
Grok Imagine is the fastest and cheapest provider. Great for rapid iteration and prototyping. Fixed output resolutions (~1K) with no size control.
No install needed — run directly with npx. Pass whichever API keys you have:
npx pixel-surgeon-mcp
For Claude Code, register the server with claude mcp add:
claude mcp add pixel-surgeon \
  -e GOOGLE_API_KEY=your-google-key \
  -e OPENAI_API_KEY=your-openai-key \
  -e XAI_API_KEY=your-xai-key \
  -- npx pixel-surgeon-mcp
For MCP clients configured through a JSON file, add an mcpServers entry:
{
  "mcpServers": {
    "pixel-surgeon": {
      "command": "npx",
      "args": ["pixel-surgeon-mcp"],
      "env": {
        "GOOGLE_API_KEY": "your-google-api-key",
        "OPENAI_API_KEY": "your-openai-api-key",
        "XAI_API_KEY": "your-xai-api-key"
      }
    }
  }
}
If you prefer a local clone:
git clone https://github.com/j-east/pixel-surgeon-mcp.git
cd pixel-surgeon-mcp
npm install
npm run build
Generated images are saved to ~/Pictures/pixel-surgeon/. A local browser viewer auto-launches on first use for full-resolution previews with model selection, respin controls, and search.
npm run dev # tsx watch mode
npm run build # compile TypeScript
npm run start # run compiled server
interactive_fix opens a browser crop UI, blocks via Promise until the user submits, fires parallel Gemini calls, and lets the user pick the best result.
PRs are welcome! We're especially looking for:
Add entries to the STYLE_PRESETS object in src/index.ts. Your PR should include:
The server currently supports Gemini, OpenAI, Grok Imagine, and Veo 3. We'd love adapters for other image/video generation APIs — Stable Diffusion, Flux, etc. If you're interested in adding one, open an issue first so we can align on the interface.
pixel-surgeon-mcp is maintained by John Evans, part of the engineering team at Duval Software — a software engineering firm in Jacksonville Beach, FL building AI-powered tools and custom integrations. If you need MCP servers, AI pipelines, or production tooling built, get in touch.
MIT