Enables Claude to capture and analyze screen content across Windows, macOS, and Linux with zero native runtime dependencies.
Let Claude see your screen. A cross-platform MCP server for Windows + macOS + Linux with OCR and smart vision-diff. Zero native runtime deps.
Anthropic's official computer-use MCP for Claude Code is macOS-only today. This server fills the gap for Windows + Linux and adds two things the official one doesn't have: OCR and smart vision-diff.
# from source (until npm publish)
git clone https://github.com/lfzds4399-cpu/claude-screen-mcp
cd claude-screen-mcp
npm install
npm run build
# register with Claude Code
claude mcp add screen -- node "$(pwd)/dist/index.js"
# restart Claude Code, then ask:
# "Take a screenshot and tell me what's on my screen."
# "OCR my screen and tell me if there's an error message anywhere."
# "Watch my screen and ping me when the build finishes."
| Tool | Since | What it does |
|---|---|---|
| `screenshot` | v0.1 | Capture full display, auto-resize for vision-token efficiency |
| `screenshot_region` | v0.1 | Capture an (x, y, w, h) region; far cheaper than a full capture |
| `list_displays` | v0.1 | Enumerate connected monitors |
| `list_windows` | v0.1 | List visible windows with optional title filter |
| `read_screen_text` | v0.2 | OCR full screen or region (10-100× cheaper than vision) |
| `find_text_on_screen` | v0.2 | Search OCR'd text, return matching lines + bboxes |
| `screenshot_if_changed` | v0.3 | Capture only when perceptual hash distance ≥ threshold |
| `get_screen_diff` | v0.3 | Distance-only diff; no image returned |
| `wait_for_change` | v0.4 | Long-poll until the screen changes, then return one keyframe |
| `record_screen` | v0.4 | Capture N seconds at low fps and return deduplicated keyframes |
All 10 tools work the same way on Windows (PowerShell + System.Drawing), macOS (screencapture + osascript), and Linux (grim / scrot / import + wmctrl).
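The `screenshot_if_changed` and `get_screen_diff` tools both rest on a distance between perceptual hashes. A minimal sketch of that decision, assuming hashes are hex strings compared by Hamming distance. The function names, the hex encoding, and the choice to treat the first call as a change are illustrative assumptions, not taken from this server's code.

```typescript
// Hypothetical helper: count differing bits between two equal-length hex hashes.
function hammingDistance(hexA: string, hexB: string): number {
  if (hexA.length !== hexB.length) {
    throw new Error("hash lengths differ");
  }
  let bits = 0;
  for (let i = 0; i < hexA.length; i++) {
    // XOR one hex digit (4 bits) at a time, then count the set bits.
    let x = parseInt(hexA.charAt(i), 16) ^ parseInt(hexB.charAt(i), 16);
    while (x > 0) {
      bits += x & 1;
      x >>= 1;
    }
  }
  return bits;
}

// Hypothetical decision: capture when there is no previous hash (assumed
// first-call behavior) or when the distance reaches the threshold.
function hasChanged(prev: string | null, next: string, threshold: number): boolean {
  if (prev === null) return true;
  return hammingDistance(prev, next) >= threshold;
}
```

With a threshold of 12, a frame whose hash differs from the previous one in 11 bits would be skipped, and one differing in 12 bits would be captured.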
1. Debug what you see — "Why is my React app not rendering? Look at the screen."
→ screenshot → Claude sees the error overlay → suggests fix.
2. Find something specific without burning vision tokens — "Is there an error message anywhere on my screen?"
→ find_text_on_screen("error") returns matching line + bbox → Claude calls screenshot_region on just that bbox.
3. Watch-while-task — "Ping me when this build finishes."
→ wait_for_change(timeoutMs=300000, threshold=12) — server blocks until the screen actually changes (or 5 min elapses), so the model only spends a turn when something happens. For longer watches, loop screenshot_if_changed(threshold=12) every 30s.
4. Show me what just happened — "I saw something flash by, replay the last 15 seconds."
→ record_screen(durationMs=15000, targetFps=2, maxFrames=6) returns up to 6 deduplicated keyframes covering that period in a single tool result — like rewinding a clip without storing video.
5. Read what's on screen, not look at it — "What does the current GitHub PR description say?"
→ read_screen_text returns plain text → 10-100× fewer tokens than vision.
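Use case 4 depends on dropping near-identical frames. One way such deduplication could work, sketched under assumptions: each frame is compared with the last kept frame through a caller-supplied distance function, and `threshold` and `maxFrames` mirror the parameters named above. `selectKeyframes` and its signature are hypothetical, not this server's API.

```typescript
// Hypothetical keyframe selection: keep a frame only when it differs enough
// from the most recently kept frame, and stop once maxFrames are collected.
function selectKeyframes<T>(
  frames: T[],
  distance: (a: T, b: T) => number,
  threshold: number,
  maxFrames: number,
): T[] {
  const kept: T[] = [];
  for (const frame of frames) {
    if (kept.length >= maxFrames) break;
    if (kept.length === 0) {
      // The first frame is always kept so the clip has a starting point.
      kept.push(frame);
      continue;
    }
    const last = kept[kept.length - 1] as T;
    if (distance(last, frame) >= threshold) {
      kept.push(frame);
    }
  }
  return kept;
}

// Usage with plain numbers standing in for frame hashes.
const demoFrames = [0, 1, 2, 20, 21, 40, 100];
const absDiff = (a: number, b: number): number => Math.abs(a - b);
const keyframes = selectKeyframes(demoFrames, absDiff, 12, 3);
```

Comparing against the last kept frame, rather than the immediately preceding one, means slow drift still produces a keyframe once it accumulates past the threshold.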
Anthropic's official Claude Code computer-use MCP server (v2.1.85+) is macOS-only as of May 2026. Windows and Linux users have no first-party way to give Claude vision into their desktop.
This project fills the gap with three deliberate constraints:
- Zero native runtime dependencies: capture shells out to platform tools (PowerShell + System.Drawing on Windows, `screencapture` on Mac, `grim`/`scrot`/`import` on Linux). No node-gyp, no postinstall flakiness, no platform-specific binaries to bundle.
- Token efficiency: `maxEdge=1600`, JPEG/WebP support, region capture, OCR (skip vision entirely for text), and perceptual-hash diff (skip frames that didn't change).
- Every release was reviewed by 3 specialized agents (code quality + silent-failure-hunter + security auditor) before tagging.

Across v0.1 → v0.3, the audits caught 16 P0 issues that were fixed before any tag was pushed:

- PowerShell `-EncodedCommand` BOM / Mac+Linux `list_displays` returning fake data / tool errors swallowing stderr / `displayId` argument injection / region OOM / output byte caps
- `SCREEN_MCP_OCR_LANGS` supply-chain injection (allowlist enforcement) / OCR worker timeout (was unbounded) / no-match token bomb / structured OCR diagnostics / SIGTERM handler
- `CompareResult.reason` to distinguish first-call from real change
- `blocks: true` required for line bboxes (without it, `find_text_on_screen` silently returned 0 matches) / `get_screen_diff` misleading `above_threshold` reason / two new tools (`wait_for_change`, `record_screen`) for real-time-ish workflows

See the commit log for the full audit trail.
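The `maxEdge=1600` setting mentioned above implies a downscale that caps the longer side of a capture. A minimal sketch of that arithmetic, assuming the aspect ratio is preserved and smaller images are never upscaled. `fitToMaxEdge` is a hypothetical name, and the rounding choice is an assumption.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: scale a capture so its longer edge is at most maxEdge.
function fitToMaxEdge(size: Size, maxEdge: number = 1600): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxEdge) {
    // Already small enough; return a copy without upscaling.
    return { width: size.width, height: size.height };
  }
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}

const resized4k = fitToMaxEdge({ width: 3840, height: 2160 });
```

Under these assumptions a 3840×2160 capture comes out at 1600×900, roughly a sixth of the original pixel count.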
Environment variables:
| Var | Default | Purpose |
|---|---|---|
| `SCREEN_MCP_LOG_LEVEL` | `info` | `debug` / `info` / `warn` / `error`. Logs go to stderr. |
| `SCREEN_MCP_OCR_LANGS` | `eng+chi_sim` | Plus-separated tesseract codes. Allowlist enforced to prevent supply-chain attacks. Allowed: eng, chi_sim, chi_tra, jpn, kor, fra, deu, spa, rus, ita, por, ara, nld, tur, vie, tha, hin, ben, ukr. |
First OCR call downloads ~40 MB of language models from cdn.jsdelivr.net. Subsequent calls reuse the cached worker.
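The allowlist check described for `SCREEN_MCP_OCR_LANGS` can be sketched as follows. The language codes and the `eng+chi_sim` default come from the table above; the function name and the choice to throw on a disallowed code are assumptions, and the real server may handle rejection differently.

```typescript
// Language codes copied from the documented allowlist.
const ALLOWED_OCR_LANGS: string[] = [
  "eng", "chi_sim", "chi_tra", "jpn", "kor", "fra", "deu", "spa", "rus", "ita",
  "por", "ara", "nld", "tur", "vie", "tha", "hin", "ben", "ukr",
];

const DEFAULT_OCR_LANGS = "eng+chi_sim";

// Hypothetical parser: split a plus-separated value and reject anything
// outside the allowlist, so the value can never name an arbitrary download.
function parseOcrLangs(raw: string | undefined): string[] {
  const value = raw === undefined || raw.trim() === "" ? DEFAULT_OCR_LANGS : raw;
  const codes = value
    .split("+")
    .map((code) => code.trim())
    .filter((code) => code.length > 0);
  if (codes.length === 0) {
    return DEFAULT_OCR_LANGS.split("+");
  }
  for (const code of codes) {
    if (ALLOWED_OCR_LANGS.indexOf(code) === -1) {
      throw new Error("OCR language not in allowlist: " + code);
    }
  }
  return codes;
}
```

Matching whole codes against a fixed list, instead of pattern-checking the string, keeps inputs such as path fragments or URLs from ever reaching the model download step.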
| Platform | Capture | Region | Displays | Windows | OCR | Vision-diff |
|---|---|---|---|---|---|---|
| Windows ≥ 10 | ✅ tested | ✅ | ✅ multi-display | ✅ | ✅ | ✅ |
| macOS ≥ 11 | ✅ code | ✅ | 🟡 stub (single only) | ✅ | ✅ | ✅ |
| Linux (X11 + Wayland) | ✅ code | ✅ | 🟡 stub (single only) | 🟡 needs `wmctrl` | ✅ | ✅ |
Windows is the maintainer's primary platform and has end-to-end test coverage. macOS / Linux paths are written and CI-built but not yet end-to-end tested by the maintainer — PRs and issue reports very welcome.
- OCR'd screen text is untrusted input. The wrapper tags (`<screen_ocr>...</screen_ocr>`) flag this clearly so downstream models can be guided to distrust it.
- Use `screenshot_region` when you don't need the whole screen.
- Use `read_screen_text` instead of `screenshot` when you only need text: vastly fewer tokens, and you're not exposing other windows that happen to be open.

git clone https://github.com/lfzds4399-cpu/claude-screen-mcp
cd claude-screen-mcp
npm install
npm run build
node tests/e2e-wire.mjs # spawn server + drive JSON-RPC + verify all 8 tools
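The e2e script above drives the server over JSON-RPC. For orientation, an MCP tool invocation is a JSON-RPC 2.0 request with method `tools/call`, sent over stdio as one JSON message per line. The sketch below builds such a request for `screenshot_region`. The argument names `x`, `y`, `w`, `h` are assumed from the tool table and may not match the server's actual input schema, and `buildToolCall` is a hypothetical helper.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wrap a tool name and arguments in a tools/call request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: name, arguments: args },
  };
}

// Argument names are assumptions based on the "(x, y, w, h)" description.
const request = buildToolCall(1, "screenshot_region", { x: 0, y: 0, w: 800, h: 600 });

// The stdio transport frames each message as a single line of JSON.
const wire = JSON.stringify(request) + "\n";
```

A real client has to complete the MCP `initialize` handshake before sending `tools/call`, so this request alone is not enough to talk to the server.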
Planned: `screenshot_window(title)` precisely scoped to a window's bounds; macOS multi-display enumeration via `system_profiler`; Linux multi-display via `xrandr` / `wlr-randr`; optional vendored tesseract models (`SCREEN_MCP_OCR_LANG_PATH`) for offline / air-gapped use.

MCP is request-response, and each tool call costs an LLM turn (~1–3 s end-to-end), so 24 fps streaming is not feasible at that latency. Three substitutes cover the real use cases:

- `wait_for_change`: like a human watching the screen and only saying something when it changes
- `record_screen`: like rewinding a short clip with the boring frames cut out
- `screenshot_if_changed` in a loop: for sustained polling under your own pacing

PRs especially welcome for:

- macOS multi-display enumeration (`system_profiler SPDisplaysDataType -json` parsing)
- Linux multi-display capture (`grim -o`, `scrot --screen`)
- `screenshot_window` for v0.4

See CONTRIBUTING.md (TODO).
Other small, single-author harnesses I publish under @lfzds4399-cpu — same MIT, same opinionated taste:
| Repo | One line |
|---|---|
| harness-engineering | The pattern (not a framework) underlying all of these — agents + validators + manifest, validated across 6+ projects |
| ai-council | Multi-voter consensus framework — disagreement blocks instead of being averaged away |
| domain-harness | Automated domain investing — discovery → AI Council valuation → registration → resale, with hard budget walls |
| methods-harness | SymPy-verified bilingual lesson pipeline for high-school calculus — one CLI re-renders everything |
| voice2ai | Hands-free dictation for Windows — push-to-talk into VS Code / Cursor / WeChat / browsers, 4 STT providers |
If claude-screen-mcp is useful, ⭐ the repo — it's the cheapest signal and it actually moves the needle.
MIT — see LICENSE.
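Let Claude see your screen. An MCP server for Win/Mac/Linux with zero native dependencies.
It fills the gap left by Anthropic's official computer-use MCP being macOS-only, and adds OCR (10-100x fewer vision tokens) and smart vision-diff (which makes 24/7 monitoring affordable in token terms).
8 tools (screenshot / region / list displays / list windows / OCR / find text / smart screenshot / view changes), consistent across platforms. Every release passed a joint review by 3 agents (code quality + silent failure + security), and 16 P0 issues were fixed before anything shipped.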
git clone https://github.com/lfzds4399-cpu/claude-screen-mcp
cd claude-screen-mcp && npm install && npm run build
claude mcp add screen -- node "$(pwd)/dist/index.js"
# Restart Claude Code, then say "Take a screenshot and show me my screen"
Chinese OCR is enabled by default (eng+chi_sim); no extra configuration is needed.