GUI automation MCP server that enables AI agents to see and control the Windows desktop using a local Vision LLM (Ollama), supporting screenshot analysis, mouse/keyboard actions, and autonomous task execution.
GUI automation MCP server powered by local Vision LLM (Ollama)
helix-pilot lets AI agents see and control your Windows desktop through the Model Context Protocol (MCP). It captures screenshots, analyzes them with a local Ollama Vision model, and executes mouse/keyboard actions — all running on your machine with zero cloud API cost.
Most GUI automation tools require expensive cloud APIs, support only macOS, or run inside VMs. helix-pilot is different:
| Feature | helix-pilot | terminator | UI-TARS Desktop | Peekaboo | Cua |
|---|---|---|---|---|---|
| MCP server (CLI-native) | Yes | No | Partial | Yes | No |
| Windows host direct control | Yes | Yes | Yes | No (macOS) | No (VM) |
| Local Vision LLM (Ollama) | Yes | No | No | Yes | No |
| Zero cloud API cost | Yes | No | No | Yes | No |
| Open WebUI integration | Yes | No | No | No | No |
| Built-in safety system | Yes | Partial | No | No | Partial |
| Open source (MIT) | Yes | Yes | Yes | Yes | Yes |

An AI agent calls helix-pilot tools via MCP: status() → screenshot() → describe() → auto(). The Vision LLM analyzes the screen, and helix-pilot executes the resulting GUI actions autonomously.

helix-pilot captures the screen and sends it to a local Ollama Vision model for analysis. The model identifies windows, UI elements, and layout — all running locally with zero API cost.
status() output:
{
"ok": true,
"helix_pilot_version": "2.0.0",
"ollama": { "available": true, "endpoint": "http://localhost:11434" },
"screen_size": [3840, 2160],
"agent_runtime": { "tracked_agents": 1, "running_agents": 0 },
"safe_mode": true,
"visible_windows": ["Claude Code", "Google Chrome", "Windows PowerShell", "..."]
}
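The status payload above can be checked programmatically before an agent starts issuing GUI actions. The following is a minimal sketch, not part of helix-pilot: `readiness_problems` is a hypothetical helper, and treating `safe_mode: false` as a problem is an assumption about how a cautious client might behave.

```python
import json

# Sample payload copied from the status() output shown above.
SAMPLE_STATUS = """
{
  "ok": true,
  "helix_pilot_version": "2.0.0",
  "ollama": {"available": true, "endpoint": "http://localhost:11434"},
  "screen_size": [3840, 2160],
  "agent_runtime": {"tracked_agents": 1, "running_agents": 0},
  "safe_mode": true,
  "visible_windows": ["Claude Code", "Google Chrome", "Windows PowerShell"]
}
"""


def readiness_problems(status: dict) -> list[str]:
    """Return human-readable reasons why the server is not ready (hypothetical helper)."""
    problems = []
    if not status.get("ok"):
        problems.append("server reported ok=false")
    if not status.get("ollama", {}).get("available"):
        problems.append("Ollama is not reachable")
    # Assumption: a cautious client refuses to continue when safe mode is off.
    if not status.get("safe_mode"):
        problems.append("safe_mode is off")
    return problems


status = json.loads(SAMPLE_STATUS)
print(readiness_problems(status))
```

An empty list means every check passed; anything else can be surfaced to the user before calling auto().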
ollama pull mistral-small3.2
Other supported models: gemma3:27b, llava, moondream, or any Ollama vision model.
git clone https://github.com/tsunamayo7/helix-pilot.git
cd helix-pilot
uv sync
Edit config/helix_pilot.json:
{
"ollama_endpoint": "http://localhost:11434",
"vision_model": "mistral-small3.2:latest"
}
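A common setup mistake is configuring a vision_model that has not been pulled yet. The sketch below, which is not part of helix-pilot, compares the configured model against the list returned by Ollama's `/api/tags` endpoint. The helper names are hypothetical, and the `:latest` normalization reflects the fact that the pull command above uses a bare name while the config uses an explicit tag.

```python
import json
import urllib.request


def installed_models(endpoint: str, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama instance which models are installed."""
    url = f"{endpoint.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


def normalize(name: str) -> str:
    """Ollama resolves a bare model name to the ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def model_available(vision_model: str, installed: list[str]) -> bool:
    """True if the configured model matches one of the installed models."""
    wanted = normalize(vision_model)
    return any(normalize(name) == wanted for name in installed)


def check_config(config_path: str) -> bool:
    """Load config/helix_pilot.json and verify its model is installed."""
    with open(config_path, encoding="utf-8") as fh:
        config = json.load(fh)
    models = installed_models(config["ollama_endpoint"])
    return model_available(config["vision_model"], models)
```

Calling `check_config("config/helix_pilot.json")` requires Ollama to be running; `model_available` can be used on its own with any list of model names.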
See Compatible MCP Clients below for setup instructions.
helix-pilot works with any MCP-compatible client. Here are tested configurations:
Add to your Claude Code MCP settings (.claude.json or project settings):
{
"mcpServers": {
"helix-pilot": {
"command": "uv",
"args": ["--directory", "/path/to/helix-pilot", "run", "server.py"]
}
}
}
Add to your Codex CLI MCP configuration:
{
"mcpServers": {
"helix-pilot": {
"command": "uv",
"args": ["--directory", "/path/to/helix-pilot", "run", "server.py"]
}
}
}
Add to your editor's MCP settings:
{
"mcpServers": {
"helix-pilot": {
"command": "uv",
"args": ["--directory", "/path/to/helix-pilot", "run", "server.py"]
}
}
}
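The client configurations above all register the same server entry, so it can be generated rather than copied by hand. This is a small sketch under that assumption; `helix_pilot_entry` and `merge_into_config` are hypothetical helpers, and where each client stores its settings file is left to the reader.

```python
import json


def helix_pilot_entry(project_dir: str) -> dict:
    """Build the server entry shown in the configurations above."""
    return {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "server.py"],
    }


def merge_into_config(existing: dict, project_dir: str) -> dict:
    """Return a copy of an MCP client config with helix-pilot registered.

    Other servers already present under "mcpServers" are preserved.
    """
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers["helix-pilot"] = helix_pilot_entry(project_dir)
    merged["mcpServers"] = servers
    return merged


# Example: start from an empty config and print the resulting JSON.
config = merge_into_config({}, "/path/to/helix-pilot")
print(json.dumps(config, indent=2))
```

Because the function returns a new dictionary, an existing settings file can be read, merged, and written back without losing other server entries.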
helix-pilot works with Open WebUI and local Ollama models through MCPO (MCP-to-OpenAPI proxy).
pip install mcpo
mcpo_config.json:
{
"mcpServers": {
"helix-pilot": {
"command": "uv",
"args": ["--directory", "/path/to/helix-pilot", "run", "server.py"]
}
}
}
mcpo --host 127.0.0.1 --port 8300 --config mcpo_config.json
OpenAPI: http://127.0.0.1:8300/helix-pilot
All 20 tools are now available to any Ollama model with function calling support (e.g. gemma3:27b, qwen3.5:122b).
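To confirm that the proxy is exposing the tools, the OpenAPI schema can be inspected from Python. This is a sketch under two assumptions that are not stated in the source: that MCPO serves the schema at `<base>/openapi.json`, and that each tool appears as a POST path. Both helper names are hypothetical.

```python
import json
import urllib.request


def fetch_openapi(base_url: str, timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema from the proxy (assumed location)."""
    url = f"{base_url.rstrip('/')}/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def exposed_tools(openapi: dict) -> list[str]:
    """List path names that accept POST, assumed to map one-to-one to tools."""
    paths = openapi.get("paths", {})
    return sorted(path.strip("/") for path, ops in paths.items() if "post" in ops)


def report(base_url: str = "http://127.0.0.1:8300/helix-pilot") -> None:
    """Print the tool names the proxy currently exposes."""
    tools = exposed_tools(fetch_openapi(base_url))
    print(f"{len(tools)} tools exposed: {', '.join(tools)}")
```

Running `report()` while the mcpo command above is active should print the tool list; if the count differs from 20, the proxy or the server configuration is worth rechecking.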
helix-pilot provides 20 MCP tools for comprehensive GUI automation:
| Tool | Description |
|---|---|
| screenshot | Capture screen or window screenshot |
| click | Click at screen coordinates |
| type_text | Type text (Unicode supported) |
| hotkey | Send keyboard shortcut (e.g. ctrl+c) |
| scroll | Scroll mouse wheel |
| describe | Describe screen content via Vision LLM |
| find | Find UI element by description, returns coordinates |
| verify | Verify screen matches expected state |
| status | Check system status (Ollama, models, screen) |
| list_windows | List all visible windows |
| wait_stable | Wait until screen stops changing |
| auto | Autonomous multi-step GUI task execution |
| browse | Browser-specialized automation |
| click_screenshot | Click then immediately screenshot |
| resize_image | Resize image for AI model size limits |
| spawn_pilot_agent | Launch a background GUI worker with default / explorer / worker roles |
| send_pilot_agent_input | Continue the same GUI worker with a follow-up instruction |
| wait_pilot_agent | Wait for the current agent turn and fetch the last result |
| list_pilot_agents | Inspect tracked background GUI agents |
| close_pilot_agent | Close an idle GUI agent |
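The tools listed above can also be reached from a script rather than an interactive client. The sketch below uses the official MCP Python SDK (the `mcp` package, which must be installed separately) to start the server over stdio, list its tools, and call status. It assumes status accepts no arguments; the argument schemas of the other tools are not documented here, so they are not called. `missing_tools` and `check_server` are hypothetical names.

```python
import asyncio

# The workflow tools named earlier in this README.
EXPECTED_TOOLS = {"status", "screenshot", "describe", "auto"}


def missing_tools(available, expected=EXPECTED_TOOLS):
    """Return the expected tool names that the server did not advertise."""
    return sorted(set(expected) - set(available))


async def check_server(project_dir: str):
    """Start helix-pilot over stdio, list its tools, and call status()."""
    # Imported lazily so this file loads even when the SDK is not installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="uv",
        args=["--directory", project_dir, "run", "server.py"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listed = await session.list_tools()
            names = [tool.name for tool in listed.tools]
            # Assumption: status takes no arguments.
            status = await session.call_tool("status", {})
            return names, missing_tools(names), status


# Usage (requires the mcp package and a local checkout):
#   names, missing, status = asyncio.run(check_server("/path/to/helix-pilot"))
```

Checking `missing` before starting a task gives an early signal that the client is connected to a different or outdated server.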
The new lifecycle tools let Claude Code treat helix-pilot as a persistent GUI worker instead of only as one-shot tool calls.
- Use spawn_pilot_agent to start a background agent in auto or browse mode.
- Roles: default for general execution, explorer for observation-first dry_run planning, worker for direct execution.
- Use send_pilot_agent_input to continue the same worker with accumulated GUI context.
- Use wait_pilot_agent, list_pilot_agents, and close_pilot_agent to coordinate long-running desktop tasks.

helix-pilot includes multiple safety layers to protect your system:
- observe_only, draft_only, apply_with_approval

Claude Code / Codex CLI / Cursor      Open WebUI + Ollama
        |                                     |
        | MCP (stdio)                         | HTTP (via MCPO)
        v                                     v
server.py (FastMCP)  <------------->  MCPO proxy (optional)
        |
        v
HelixPilot (src/pilot.py)
        |
        +-- CoreOperations (PyAutoGUI + Win32 API)
        +-- VisionLLM (Ollama API via httpx)
        +-- SafetyGuard (policies + user monitoring)
        +-- ActionContract (policy evaluation)
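The three mode names mentioned in the safety section (observe_only, draft_only, apply_with_approval) suggest a policy gate that sits between a requested tool call and its execution. The sketch below is purely illustrative and is not the SafetyGuard or ActionContract implementation: the semantics assigned to each mode, the split into read-only and state-changing tools, and the `decide` function are all assumptions.

```python
from enum import Enum


class Mode(str, Enum):
    OBSERVE_ONLY = "observe_only"
    DRAFT_ONLY = "draft_only"
    APPLY_WITH_APPROVAL = "apply_with_approval"


# Assumption: these tools only read the screen and never change state.
READ_ONLY_TOOLS = frozenset(
    {"screenshot", "describe", "find", "verify", "status", "list_windows", "wait_stable"}
)


def decide(mode: Mode, tool: str, approved: bool = False) -> str:
    """Return 'allow', 'deny', 'draft', or 'needs_approval' for a tool call.

    Assumed semantics:
      observe_only        -> only read-only tools may run
      draft_only          -> state-changing calls are recorded, not executed
      apply_with_approval -> state-changing calls run once a human approves
    """
    if tool in READ_ONLY_TOOLS:
        return "allow"
    if mode is Mode.OBSERVE_ONLY:
        return "deny"
    if mode is Mode.DRAFT_ONLY:
        return "draft"
    return "allow" if approved else "needs_approval"


print(decide(Mode.OBSERVE_ONLY, "click"))
print(decide(Mode.APPLY_WITH_APPROVAL, "type_text", approved=True))
```

Keeping the decision a pure function of mode, tool, and approval makes a policy like this easy to test without touching the desktop.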
# Run tests
uv run python -m pytest tests/ -v
# Lint
uv run ruff check .
# Syntax check
uv run python -m py_compile server.py
# Run server directly
uv run python server.py
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first.
1. Create a feature branch (git checkout -b feature/amazing-feature)
2. Commit your changes (git commit -m 'Add amazing feature')
3. Push the branch (git push origin feature/amazing-feature)

MIT - feel free to use this in your own projects.
If you find helix-pilot useful, please consider giving it a star!
helix-pilot is an MCP server that lets AI agents operate the Windows desktop using a local Vision LLM (Ollama).
Features:
Quick start:
ollama pull mistral-small3.2
git clone https://github.com/tsunamayo7/helix-pilot.git
cd helix-pilot && uv sync
Once you connect an MCP client (such as Claude Code), 20 GUI automation tools become available. See the English documentation above for detailed setup instructions.