Enables AI coding agents to locate UI elements on screen with exact pixel coordinates using a local UI-TARS-2B vision model. Provides offline GUI grounding with ~1.2s latency on 4.1GB VRAM, eliminating cloud dependencies for computer-use automation.
MCP server that gives AI coding agents local GUI grounding — the ability to find any UI element on screen and return its exact pixel coordinates.
Powered by UI-TARS-2B, ByteDance's 2B parameter GUI grounding model.
Claude Code's built-in computer-use sends every screenshot to the cloud for analysis. This MCP server runs a local vision model instead, so GUI grounding works offline.
# Requires ~4.5GB disk space
huggingface-cli download bytedance-research/UI-TARS-2B-SFT --local-dir ./ui-tars-2b
# Install CUDA-enabled PyTorch first (adjust cu126 to your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install uitars-mcp
# or from source:
pip install -e .
Add to your Claude Code MCP settings (~/.claude/settings.json):
{
  "mcpServers": {
    "uitars-mcp": {
      "command": "uitars-mcp",
      "env": {
        "UITARS_MODEL": "/path/to/ui-tars-2b"
      }
    }
  }
}
If installed in a venv, use the full path to the executable:
{
  "mcpServers": {
    "uitars-mcp": {
      "command": "/path/to/venv/bin/uitars-mcp",
      "env": {
        "UITARS_MODEL": "/path/to/ui-tars-2b"
      }
    }
  }
}
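If you would rather not edit the settings file by hand, the entry above can be merged in with a short script. This is a minimal sketch, not part of uitars-mcp: the helper name `add_mcp_server` is hypothetical, and it assumes the settings file is plain JSON with a top-level `mcpServers` object, as in the examples above.

```python
import json
from pathlib import Path


def add_mcp_server(settings_path: Path, name: str, command: str, model_dir: str) -> dict:
    """Merge one MCP server entry into a JSON settings file, keeping other keys."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    # Create the "mcpServers" object if it is missing, then add or overwrite the entry.
    servers = settings.setdefault("mcpServers", {})
    servers[name] = {"command": command, "env": {"UITARS_MODEL": model_dir}}
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

For example, `add_mcp_server(Path.home() / ".claude" / "settings.json", "uitars-mcp", "/path/to/venv/bin/uitars-mcp", "/path/to/ui-tars-2b")` would write the venv variant shown above. Back up the file first, since the script rewrites it.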
| Tool | What it does | Latency |
|---|---|---|
| `find_element` | Find a UI element by description and return its click coordinates | ~1.2s |
| `describe_screen` | Describe everything visible on screen | ~2s |
| `read_screen_text` | OCR: read all text on screen | ~3s |
| `check_element` | Check element state (enabled, value, etc.) | ~1s |
| `verify_action` | Verify an action worked by checking screen state | ~1.5s |
| `suggest_action` | Suggest the next action to achieve a goal | ~1.5s |
| `benchmark` | Measure end-to-end latency | varies |
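Claude Code calls these tools for you, but it can help to see what a call looks like on the wire. MCP clients invoke a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message. The `description` argument name is an assumption, not taken from this README, so check the schema the server reports via `tools/list` before relying on it.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "description" is a hypothetical argument name; the real one comes from the tool schema.
request = build_tool_call(1, "find_element", {"description": "the Save button"})
print(request)
```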
mss (fast, cross-platform)

computer-use click tools

The model is lazy-loaded on first call (~3s), then stays in VRAM for subsequent calls.
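The lazy-loading behaviour can be illustrated with a cached loader. This is a sketch of the general pattern, not the server's actual code: `_load_model` is a stand-in that sleeps instead of loading weights, and `get_model` is a hypothetical name.

```python
import time
from functools import lru_cache


def _load_model(path: str) -> dict:
    """Stand-in for the expensive load; a real server would load model weights here."""
    time.sleep(0.05)  # simulate the one-time load cost
    return {"path": path, "loaded_at": time.monotonic()}


@lru_cache(maxsize=1)
def get_model(path: str) -> dict:
    """Load on the first call, then return the cached instance on later calls."""
    return _load_model(path)


first = get_model("/path/to/ui-tars-2b")   # pays the load cost
second = get_model("/path/to/ui-tars-2b")  # served from the cache
print(first is second)  # True
```

The first tool call pays the load cost, and later calls reuse the same object, which matches the "~3s once, then resident" behaviour described above.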
| Variable | Default | Description |
|---|---|---|
| `UITARS_MODEL` | (required) | Path to UI-TARS-2B model directory |
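Because `UITARS_MODEL` has no default, a wrapper script can check it up front so a misconfigured path does not fail on the first tool call. The helper below is a hypothetical sketch, not part of uitars-mcp. It only checks that the variable is set and points to a directory, not that the directory holds a valid model.

```python
import os
from pathlib import Path


def resolve_model_dir(env=None) -> Path:
    """Return the model directory from UITARS_MODEL or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get("UITARS_MODEL", "").strip()
    if not value:
        raise RuntimeError("UITARS_MODEL is required: set it to the UI-TARS-2B model directory")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"UITARS_MODEL does not point to a directory: {path}")
    return path
```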
Add this to claude_desktop_config.json and restart Claude Desktop.
{
  "mcpServers": {
    "uitars-mcp": {
      "command": "npx",
      "args": []
    }
  }
}