Claude Screen

Name: Claude Screen
Availability: InStock
Author: lfzds4399-cpu

Бесплатно

Enables Claude to capture and analyze screen content across Windows, macOS, and Linux with zero native runtime dependencies.

автор: lfzds4399-cpu

GitHub

Описание

Enables Claude to capture and analyze screen content across Windows, macOS, and Linux with zero native runtime dependencies.

README

Let Claude see your screen. A cross-platform MCP server for Windows + macOS + Linux with OCR and smart vision-diff. Zero native runtime deps.

License: MIT Node MCP CI

Anthropic's official computer-use MCP for Claude Code is macOS-only today. This server fills the gap for Windows + Linux — and adds two things the official one doesn't have:

🔍 OCR so Claude can read screen text without spending vision tokens
📊 Smart vision-diff so 24/7 monitoring stays economical (skip frames that didn't change)

Quick start

# from source (until npm publish)
git clone https://github.com/lfzds4399-cpu/claude-screen-mcp
cd claude-screen-mcp
npm install
npm run build

# register with Claude Code
claude mcp add screen -- node "$(pwd)/dist/index.js"

# restart Claude Code, then ask:
# "Take a screenshot and tell me what's on my screen."
# "OCR my screen and tell me if there's an error message anywhere."
# "Watch my screen and ping me when the build finishes."

Tools (10 total)

Tool	Since	What it does
`screenshot`	v0.1	Capture full display, auto-resize for vision-token efficiency
`screenshot_region`	v0.1	Capture an `(x, y, w, h)` region — way cheaper than full
`list_displays`	v0.1	Enumerate connected monitors
`list_windows`	v0.1	List visible windows with optional title filter
`read_screen_text`	v0.2	OCR full screen or region (10-100× cheaper than vision)
`find_text_on_screen`	v0.2	Search OCR'd text, return matching lines + bboxes
`screenshot_if_changed`	v0.3	Capture only when perceptual hash distance ≥ threshold
`get_screen_diff`	v0.3	Distance-only diff — no image returned
`wait_for_change`	v0.4	Long-poll until the screen changes, then return one keyframe
`record_screen`	v0.4	Capture N seconds at low fps and return deduplicated keyframes

All 8 tools work the same way on Windows (PowerShell + System.Drawing), macOS (screencapture + osascript), and Linux (grim / scrot / import + wmctrl).

Use cases

1. Debug what you see — "Why is my React app not rendering? Look at the screen." → screenshot → Claude sees the error overlay → suggests fix.

2. Find something specific without burning vision tokens — "Is there an error message anywhere on my screen?" → find_text_on_screen("error") returns matching line + bbox → Claude calls screenshot_region on just that bbox.

3. Watch-while-task — "Ping me when this build finishes." → wait_for_change(timeoutMs=300000, threshold=12) — server blocks until the screen actually changes (or 5 min elapses), so the model only spends a turn when something happens. For longer watches, loop screenshot_if_changed(threshold=12) every 30s.

4. Show me what just happened — "I saw something flash by, replay the last 15 seconds." → record_screen(durationMs=15000, targetFps=2, maxFrames=6) returns up to 6 deduplicated keyframes covering that period in a single tool result — like rewinding a clip without storing video.

5. Read what's on screen, not look at it — "What does the current GitHub PR description say?" → read_screen_text returns plain text → 10-100× fewer tokens than vision.

Why this exists

Anthropic's official Claude Code computer-use MCP server (v2.1.85+) is macOS-only as of May 2026. Windows and Linux users have no first-party way to give Claude vision into their desktop.

This project fills the gap with three deliberate constraints:

Zero native runtime deps — uses each OS's built-in screenshot tooling (PowerShell + System.Drawing on Win, screencapture on Mac, grim/scrot/import on Linux). No node-gyp, no postinstall flakiness, no platform-specific binaries to bundle.
Single responsibility — only screen capture (read-only). Keyboard / mouse control belongs in a separate server (different threat model). This means it can be safely autostarted in any Claude session without granting input control.
Token-aware by design — auto-resize to maxEdge=1600, JPEG/WebP support, region capture, OCR (skip vision entirely for text), and perceptual-hash diff (skip frames that didn't change).

Quality bar

Every release was reviewed by 3 specialized agents (code quality + silent-failure-hunter + security auditor) before tagging. Across v0.1 → v0.3, the audits caught 16 P0 issues that were fixed before any tag was pushed:

v0.1: PowerShell -EncodedCommand BOM / Mac+Linux list_displays returning fake data / tool errors swallowing stderr / displayId argument injection / region OOM / output byte caps
v0.2: SCREEN_MCP_OCR_LANGS supply-chain injection (allowlist enforcement) / OCR worker timeout (was unbounded) / no-match token bomb / structured OCR diagnostics / SIGTERM handler
v0.3: cache size cap + LRU + 24h stale TTL / dHash channel assert (silent monitoring failure prevention) / cross-tool cache pollution fix / CompareResult.reason to distinguish first-call from real change
v0.4: Windows window-title mojibake (PowerShell OEM codepage → UTF-8) / Tesseract v6+ output schema (blocks: true required for line bboxes; without it find_text_on_screen silently returned 0 matches) / get_screen_diff misleading above_threshold reason / two new tools (wait_for_change, record_screen) for real-time-ish workflows

See the commit log for the full audit trail.

Configuration

Environment variables:

Var	Default	Purpose
`SCREEN_MCP_LOG_LEVEL`	`info`	`debug` / `info` / `warn` / `error`. Logs go to stderr.
`SCREEN_MCP_OCR_LANGS`	`eng+chi_sim`	Plus-separated tesseract codes. Allowlist enforced to prevent supply-chain attacks. Allowed: `eng`, `chi_sim`, `chi_tra`, `jpn`, `kor`, `fra`, `deu`, `spa`, `rus`, `ita`, `por`, `ara`, `nld`, `tur`, `vie`, `tha`, `hin`, `ben`, `ukr`.

First OCR call downloads ~40 MB of language models from cdn.jsdelivr.net. Subsequent calls reuse the cached worker.

Platform support

Platform	Capture	Region	Displays	Windows	OCR	Vision-diff
Windows ≥ 10	✅ tested	✅	✅ multi-display	✅	✅	✅
macOS ≥ 11	✅ code	✅	🟡 stub (single only)	✅	✅	✅
Linux (X11 + Wayland)	✅ code	✅	🟡 stub (single only)	🟡 needs `wmctrl`	✅	✅

Windows is the maintainer's primary platform and has end-to-end test coverage. macOS / Linux paths are written and CI-built but not yet end-to-end tested by the maintainer — PRs and issue reports very welcome.

Security & privacy

The server runs entirely locally. No screenshot data leaves your machine via this server. (Whatever LLM client connects controls where the image goes — that's the API call you authorized when registering the connector.)
OCR text is untrusted input. Anything visible on your screen — notifications, web pages, chat windows, ads — gets passed to the LLM as a tool result. A malicious actor controlling something on your screen could embed prompt-injection content. Tool descriptions and output delimiters (<screen_ocr>...</screen_ocr>) flag this clearly so downstream models can be guided to distrust.
Use screenshot_region when you don't need the whole screen.
Use read_screen_text instead of screenshot when you only need text — vastly fewer tokens and you're not exposing other windows that happen to be open.

Development

git clone https://github.com/lfzds4399-cpu/claude-screen-mcp
cd claude-screen-mcp
npm install
npm run build
node tests/e2e-wire.mjs    # spawn server + drive JSON-RPC + verify all 8 tools

Roadmap

v0.5 — screenshot_window(title) precisely scoped to a window's bounds; macOS multi-display enumeration via system_profiler; Linux multi-display via xrandr / wlr-randr; optional vendored tesseract models (SCREEN_MCP_OCR_LANG_PATH) for offline / air-gapped use
v1.0 — first-class MCPB bundle for one-click install via Claude Desktop

Why "real-time video" isn't a tool

MCP is request-response and each tool call costs an LLM turn (~1–3 s end-to-end). 24 fps streaming is physically impossible at that latency. Three substitutes cover the real use cases:

wait_for_change — like a human watching the screen and only saying something when it changes
record_screen — like rewinding a short clip with the boring frames cut out
screenshot_if_changed in a loop — for sustained polling under your own pacing

Contributing

PRs especially welcome for:

macOS multi-display enumeration (system_profiler SPDisplaysDataType -json parsing)
Linux per-output capture (grim -o, scrot --screen)
screenshot_window for v0.4
Performance regressions if you find any

See CONTRIBUTING.md (TODO).

Sibling projects

Other small, single-author harnesses I publish under @lfzds4399-cpu — same MIT, same opinionated taste:

Repo	One line
harness-engineering	The pattern (not a framework) underlying all of these — agents + validators + manifest, validated across 6+ projects
ai-council	Multi-voter consensus framework — disagreement blocks instead of being averaged away
domain-harness	Automated domain investing — discovery → AI Council valuation → registration → resale, with hard budget walls
methods-harness	SymPy-verified bilingual lesson pipeline for high-school calculus — one CLI re-renders everything
voice2ai	Hands-free dictation for Windows — push-to-talk into VS Code / Cursor / WeChat / browsers, 4 STT providers

If claude-screen-mcp is useful, ⭐ the repo — it's the cheapest signal and it actually moves the needle.

License

MIT — see LICENSE.

中文 TL;DR

让 Claude 看到你的屏幕。MCP server，跨 Win/Mac/Linux，零原生依赖。

填补 Anthropic 官方 computer-use MCP 仅 macOS 的空白，外加 OCR（省 vision token 10-100x）和智能 vision-diff（让 24/7 监测在 token 经济上可行）。

8 个 tool（截屏 / 区域 / 列显示器 / 列窗口 / OCR / 找文字 / 智能截屏 / 看变化），跨平台一致。每个 release 都过了 3 agent 联合审核（代码质量 + silent failure + security），共修了 16 个 P0 才发出去。

git clone https://github.com/lfzds4399-cpu/claude-screen-mcp
cd claude-screen-mcp && npm install && npm run build
claude mcp add screen -- node "$(pwd)/dist/index.js"
# 重启 Claude Code，然后说"截一张屏幕给我看"

中文 OCR 默认开启（eng+chi_sim），无需额外配置。

Как установить

Выполни в терминале:

claude mcp add claude-screen-mcp -- npx

Claude Screen

Описание

README

Quick start

Tools (10 total)

Use cases

Why this exists

Quality bar

Configuration

Platform support

Security & privacy

Development

Roadmap

Why "real-time video" isn't a tool

Contributing

Sibling projects

License

中文 TL;DR

Как установить

Похожие MCP

Compare Claude Screen with

Fetch

AWS KB Retrieval

Spring AI MCP Server

llm-analysis-assistant

Command Palette

Claude Screen

Описание

README

Quick start

Tools (10 total)

Use cases

Why this exists

Quality bar

Configuration

Platform support

Security & privacy

Development

Roadmap

Why "real-time video" isn't a tool

Contributing

Sibling projects

License

中文 TL;DR

Как установить

Похожие MCP

Compare Claude Screen with

Fetch

AWS KB Retrieval

Spring AI MCP Server

llm-analysis-assistant