loading…
Search for a command to run...
loading…
Enables text-only AI models to understand clipboard images by describing them through a vision model, eliminating manual file saving.
Enables text-only AI models to understand clipboard images by describing them through a vision model, eliminating manual file saving.
Add vision to text-only models in Opencode (DeepSeek V4, GLM 5.1) — see the image in your clipboard directly, no manual file saving.
Tested on Windows 11 + Opencode + DeepSeek V4 Pro. Multi-OS clipboard support (Windows / macOS / Linux X11 / Linux Wayland).
Forked from itcomgroup/vision-mcp-server — rewritten around clipboard-first tools, security hardening, cross-platform clipboard extraction, and one-prompt AI install.
Cheap/fast text-only models like DeepSeek V4 and GLM 5.1 are great for code, but they cannot read images. Every time you paste a screenshot, the model asks you to save it to disk and provide a path.
This MCP server exposes *_from_clipboard tools. When the LLM needs to see your screenshot, it calls analyze_clipboard — the server reads the clipboard image, sends it to a real vision model (Groq + Llama-4 Scout, free tier), and returns a text description the text model can reason about.
Result: paste → ask → done. No file shuffling.
Instead of running the steps below manually, paste one of these prompts into any coding assistant (DeepSeek, GLM, Claude, GPT, ...) and it will set everything up for you end-to-end — clone, venv, deps, MCP config, keybindings:
Prefer doing it yourself? Keep reading.
analyze_clipboard, extract_text_from_clipboard, diagnose_error_from_clipboard, describe_ui_from_clipboard, code_from_clipboard.pip install -e .)| Package | Purpose |
|---|---|
mcp>=1.0.0 |
MCP protocol server |
groq>=0.11.0 |
Groq API client (Llama-4 Scout vision) |
aiofiles>=23.0.0 |
Async file I/O |
Pillow>=10.0.0 |
Clipboard image extraction (Windows/macOS), PNG encoding |
| OS | Command | Why |
|---|---|---|
| Windows | nothing extra | Pillow + pywin32 handle the clipboard natively. |
| macOS | brew install pngpaste (optional fallback) |
Pillow works in most cases; pngpaste as backup. |
| Linux — Wayland | sudo apt install wl-clipboard |
Provides wl-paste. |
| Linux — X11 | sudo apt install xclip |
Or your distro equivalent. |
git clone https://github.com/Capetlevrai/clipboard-vision-mcp.git
cd clipboard-vision-mcp
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
pip install -e .
Copy any screenshot, then:
python examples/smoke_test.py
Expected: OK: clipboard image saved to <path>.
See docs/OPENCODE.md for Opencode (Windows-tested) or docs/CLIENTS.md for Claude Code / Cursor / Cline / Continue.
Opencode (%APPDATA%\opencode\opencode.json on Windows, ~/.config/opencode/opencode.json on Linux/macOS):
{
"mcp": {
"clipboard-vision": {
"type": "local",
"command": [
"C:\\path\\to\\clipboard-vision-mcp\\.venv\\Scripts\\python.exe",
"-m",
"clipboard_vision_mcp"
],
"enabled": true,
"environment": {
"GROQ_API_KEY": "gsk_your_key_here"
}
}
}
}
💡 Use the absolute path to the venv's Python. This guarantees the MCP starts with the right dependencies regardless of shell, cwd, or active venv.
Opencode does not bind image-paste to Ctrl+V / Alt+V by default. Without this step, copying a screenshot and hitting paste will insert nothing (or plain text).
Edit your Opencode keybinds.json or the keybinds section of opencode.json:
{
"keybinds": {
"input_paste": "ctrl+v",
"input_paste_image": "alt+v"
}
}
Restart Opencode after editing.
Yes. Opencode re-reads opencode.json at every launch and auto-spawns any MCP server with "type": "local" and "enabled": true. Because the command uses the absolute path to the venv's Python, it doesn't matter which shell or working directory Opencode is launched from.
Reboot → open Opencode → clipboard-vision tools are listed. No manual step.
You: (copy a screenshot to clipboard, then type)
"Look at what I just copied and tell me what's wrong with this error."
LLM (DeepSeek, GLM, Claude, ...): [calls diagnose_error_from_clipboard]
→ "The error says `ECONNREFUSED 127.0.0.1:5432`. Postgres isn't
running on port 5432. Start it with: ..."
The text-only model never sees pixels — it reads the description returned by Llama-4 Scout and reasons over it.
| Tool | Input | Use when |
|---|---|---|
analyze_clipboard |
optional prompt |
Generic description, Q&A on the clipboard image. |
extract_text_from_clipboard |
— | Pure OCR. |
describe_ui_from_clipboard |
— | UI/UX review, component inventory. |
diagnose_error_from_clipboard |
— | Error screenshot → cause + fix. |
code_from_clipboard |
— | Extract code from a screenshot. |
analyze_image |
image_path, optional prompt |
Image already on disk. |
extract_text, describe_ui, diagnose_error, understand_diagram, analyze_chart, code_from_screenshot |
image_path |
Same as above for files. |
This server runs as a local stdio process — it does not open any network port and only talks to the MCP client over stdin/stdout and to the Groq API over HTTPS.
Hardening in place:
analyze_image and the other file-path tools only accept .png .jpg .jpeg .gif .webp .bmp. This prevents a prompt-injected LLM from asking the server to read arbitrary local files (~/.ssh/id_rsa, .env, ...) and exfiltrate them as base64 to Groq.$TMPDIR/clipboard_vision_mcp/ and unlinks them on completion.Please open a private security advisory rather than a public issue.
┌──────────────┐ MCP ┌─────────────────┐ HTTPS ┌─────────────────┐
│ Opencode │ ──────▶ │ clipboard- │ ────────▶ │ Groq API │
│ (DeepSeek) │ │ vision-mcp │ │ Llama-4 Scout │
└──────────────┘ └─────────────────┘ └─────────────────┘
│
▼
reads system clipboard
(PIL / wl-paste / xclip)
→ validate → base64 → send → delete
wl-paste --type image/png or xclip -selection clipboard -t image/png -o | file - works outside the MCP.environment block in your client config, then fully restart the client.python -m clipboard_vision_mcp manually — it should start and wait silently on stdin.MIT — see LICENSE.
Добавь это в claude_desktop_config.json и перезапусти Claude Desktop.
{
"mcpServers": {
"clipboard-vision-mcp": {
"command": "npx",
"args": []
}
}
}Web content fetching and conversion for efficient LLM usage.
Retrieval from AWS Knowledge Base using Bedrock Agent Runtime.
Provides auto-configuration for setting up an MCP server in Spring Boot applications.
A very streamlined mcp client that supports calling and monitoring stdio/sse/streamableHttp, and can also view request responses through the /logs page. It also