An MCP server that provides AI agents with a full Ubuntu desktop environment inside Docker, enabling them to perform complex computer tasks like browsing, coding, testing, and GUI automation.
Your AI writes code. This lets it use a computer.
Quick Start · Demo · Hosted Version · Contributing · Report Bug
What if your AI could not just write code, but open a browser, click buttons, fill forms, run servers, test in real browsers, install anything, and see the screen?
taw-computer is an open-source MCP server that gives AI agents a full Ubuntu desktop inside Docker. Your AI connects, gets a real computer, and works like a human would.
No internal LLM. No chat UI. Your AI is the brain. This is the body.
📹 Demo coming soon — star this repo to get notified!
Other tools let AI write code. taw-computer lets AI use a computer.
| | ChatGPT / Claude | Cursor / Copilot | Lovable / Bolt | taw-computer |
|---|---|---|---|---|
| Write code | ✅ | ✅ | ✅ | ✅ |
| Run shell commands | ❌ | Limited | Sandboxed | Full Ubuntu |
| Browse the web | ❌ | ❌ | ❌ | Real Chromium |
| See & click the screen | ❌ | ❌ | ❌ | Desktop + VNC |
| Install any software | ❌ | ❌ | ❌ | apt/npm/pip |
| Test in real browser | ❌ | ❌ | Preview only | Playwright + CDP |
| Persist across sessions | ❌ | ❌ | ✅ | Snapshots |
| Self-hostable | ❌ | ❌ | ❌ | 100% yours |
Get running in under 5 minutes:
# 1. Clone & build
git clone https://github.com/the-agents-work/taw-computer.git
cd taw-computer
docker build -f images/Dockerfile.taw -t taw-computer-base .
# 2. Install & start
npm install && npm start
Then add to your AI client:
{
"mcpServers": {
"taw-computer": {
"command": "npx",
"args": ["tsx", "/path/to/taw-computer/mcp/index.ts"]
}
}
}
Add to Cursor MCP settings (Settings → MCP Servers) — same JSON format as above.
Add to claude_desktop_config.json — same JSON format as above.
taw-computer speaks standard MCP over stdio. Any client that supports MCP can connect.
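To make "standard MCP over stdio" concrete: MCP messages are JSON-RPC 2.0 objects, and the stdio transport carries one message per line. The sketch below frames a `tools/call` request by hand. The helper name `frameToolCall` is made up for illustration, and the `ref` argument is borrowed from the Set-of-Mark walkthrough in this README. Real clients would normally go through an MCP SDK instead.

```typescript
// Hypothetical helper: frames one MCP tool call as a newline-delimited
// JSON-RPC 2.0 message, the shape the stdio transport expects.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: { name: string; arguments: Record<string, unknown> };
};

function frameToolCall(
  id: number,
  tool: string,
  args: Record<string, unknown>,
): string {
  const request: JsonRpcRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
  // JSON.stringify without indentation escapes newlines inside strings,
  // so each message occupies exactly one line.
  return JSON.stringify(request) + "\n";
}

const line = frameToolCall(1, "browser_click_ref", { ref: 2 });
console.log(line.trimEnd());
```

Writing that line to the server's stdin and reading one line back from its stdout is, in essence, all a client does per tool call once the session has been initialized.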
Got a powerful server / Mac Mini / VPS? Run taw-computer there and connect from anywhere:
{
"mcpServers": {
"taw-computer": {
"command": "ssh",
"args": ["user@your-server", "cd /path/to/taw-computer && npx tsx mcp/index.ts"]
}
}
}
Your laptop (Claude Code)
↕ SSH (stdin/stdout piped over network)
Remote server (taw-computer + Docker)
↕ Docker
Ubuntu sandbox
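If you manage several remote machines, the SSH entry can be generated rather than hand-edited. This is a small sketch under stated assumptions: `sshServerEntry` and `shellQuote` are hypothetical helpers, not part of taw-computer, and the quoting assumes the remote login shell is POSIX-compatible.

```typescript
type McpServerEntry = { command: string; args: string[] };

// Quote a string for a POSIX shell by wrapping it in single quotes
// and escaping any single quotes it contains.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build the "ssh" entry shown above for a given target and remote checkout.
function sshServerEntry(target: string, remoteDir: string): McpServerEntry {
  return {
    command: "ssh",
    args: [target, `cd ${shellQuote(remoteDir)} && npx tsx mcp/index.ts`],
  };
}

const config = {
  mcpServers: {
    "taw-computer": sshServerEntry("user@your-server", "/path/to/taw-computer"),
  },
};
console.log(JSON.stringify(config, null, 2));
```

Quoting the directory keeps the remote command intact when the path contains spaces.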
Setup:
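1. On the remote server, install taw-computer as in the quick start (clone, build the image, npm install).
2. Enable SSH on the server (sudo systemctl enable ssh).
3. Copy your key with ssh-copy-id user@your-server (passwordless login).

Watch via VNC: open http://your-server:6080 in your browser.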
That's it. Now tell your AI: "Create a VM and build me a website" — and watch it work.
AI creates a VM → scaffolds Next.js → writes components → starts dev server → opens browser to check → iterates until it looks right
AI opens Chromium → navigates to Amazon → searches → scrolls → extracts prices → compares → reports back
AI launches Playwright → navigates to your URL → fills forms → clicks buttons → asserts results → reports failures
AI runs apt install postgresql → creates database → writes seed script → runs it → verifies with queries
AI takes desktop screenshot → resizes viewport → screenshots again → compares → suggests CSS fixes
┌─────────────────────────────────────────────────────┐
│                    Your AI Client                   │
│   Claude Code · Cursor · Claude Desktop · any MCP   │
└───────────────────────┬─────────────────────────────┘
                        │ MCP protocol (stdio)
┌───────────────────────▼─────────────────────────────┐
│  taw-computer MCP server                 30+ tools  │
│   vm · shell · files · browser · desktop · search   │
└───────────────────────┬─────────────────────────────┘
                        │ Docker API
┌───────────────────────▼─────────────────────────────┐
│  Ubuntu 22.04 Sandbox           isolated container  │
│                                                     │
│  bash  Chromium + CDP  xfce4 Desktop + VNC          │
│  git  npm  pip  curl  Playwright                    │
│  python  node  xdotool  scrot                       │
│                                                     │
│  /workspace ← your project files live here          │
└─────────────────────────────────────────────────────┘
| Tool | What it does |
|---|---|
| `vm_create` | Spin up a new sandbox. Returns VNC URL to watch live |
| `vm_destroy` | Destroy (auto-saves snapshot for later) |
| `vm_reset` | Destroy + delete snapshot (fresh start) |
| `vm_restart` | Restart container, keep all files |
| `vm_status` | CPU, RAM, disk, uptime, top processes |
| `vm_list` | List running sandboxes |
| `vm_rename` | Rename a VM |
| `snapshot_list` | List saved snapshots |
| `snapshot_delete` | Delete a snapshot |
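The difference between `vm_destroy` and `vm_reset` is easy to miss, so here is a toy in-memory model of the lifecycle in the table above. It is not the project's code. The class `ToyHost` is invented for illustration, and restoring a snapshot when a VM with the same name is created again is an assumption based on the "for later" wording.

```typescript
// Toy model: a VM is just a name plus the files it holds.
type Vm = { name: string; files: string[] };

class ToyHost {
  running = new Map<string, Vm>();
  snapshots = new Map<string, string[]>();

  // vm_create: start a sandbox, restoring a saved snapshot if one exists
  // (restore-on-create is an assumption).
  create(name: string): Vm {
    const vm: Vm = { name, files: [...(this.snapshots.get(name) ?? [])] };
    this.running.set(name, vm);
    return vm;
  }

  // vm_destroy: save a snapshot first, then stop the sandbox.
  destroy(name: string): void {
    const vm = this.running.get(name);
    if (!vm) return;
    this.snapshots.set(name, [...vm.files]);
    this.running.delete(name);
  }

  // vm_reset: stop the sandbox and delete its snapshot as well.
  reset(name: string): void {
    this.running.delete(name);
    this.snapshots.delete(name);
  }
}

const host = new ToyHost();
host.create("demo").files.push("/workspace/app.ts");
host.destroy("demo");
console.log(host.create("demo").files.length); // files came back from the snapshot
host.reset("demo");
console.log(host.create("demo").files.length); // fresh start, nothing restored
```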
| Tool | What it does |
|---|---|
| `exec` | Run any command: git, npm, pip, curl, docker, anything |
| `fs_read` | Read a file |
| `fs_write` | Write a file (creates parent dirs) |
| `fs_edit` | Find-and-replace in a file |
| `fs_list` | ls / recursive find |
| `fs_search` | grep for patterns |
| `code_search` | ripgrep with regex, file types, context |
| `file_upload` | Upload file into VM (base64, max 50MB) |
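Because `file_upload` takes base64, the payload on the wire is larger than the file itself. The arithmetic below is a sketch with two loud assumptions: that "50MB" means 50 MiB, and that the limit applies to the raw file rather than the encoded text. `checkUpload` is a hypothetical client-side helper, not a taw-computer API.

```typescript
// Assumption: "50MB" means 50 MiB measured on the raw file, before encoding.
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024;

// Base64 turns every 3 raw bytes into 4 ASCII characters and pads the
// final group, so n raw bytes become 4 * ceil(n / 3) characters.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Hypothetical pre-flight check a client could run before calling file_upload.
function checkUpload(rawBytes: number): { ok: boolean; encodedBytes: number } {
  return {
    ok: rawBytes <= MAX_UPLOAD_BYTES,
    encodedBytes: base64Length(rawBytes),
  };
}

const atLimit = checkUpload(MAX_UPLOAD_BYTES);
console.log(atLimit.ok, atLimit.encodedBytes);
```

A file at the limit is therefore roughly a third larger once encoded, which is worth keeping in mind when uploading over a slow SSH link.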
| Tool | What it does |
|---|---|
| `browser_navigate` | Go to URL, wait for load |
| `browser_snapshot` | Screenshot + numbered overlays on every clickable element |
| `browser_click_ref` | Click element #N from snapshot |
| `browser_type_ref` | Type into element #N |
| `browser_extract` | Read page text (CSS selector or full page) |
| `browser_eval` | Run JavaScript in page |
| `browser_wait_for` | Wait for selector / text / network idle |
| `browser_console_logs` | Read console.log, console.error, etc. |
| `browser_network_errors` | Catch 404s, CORS errors, failed requests |
| `browser_run_test` | Run a Playwright test script |
| `browser_open` | Open Chrome via desktop (fallback) |
| `browser_close` | Kill Chrome |
| `web_search` | Google search → top 8 results |
| Tool | What it does |
|---|---|
| `desktop_screenshot` | JPEG screenshot of the whole desktop |
| `desktop_click` | Click at (x, y) |
| `desktop_type` | Type text into focused window |
| `desktop_key` | Key combos: ctrl+c, alt+tab, Return, etc. |
| `desktop_scroll` | Scroll up/down |
| `desktop_drag` | Drag from A to B |
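The sandbox image ships xdotool, so one plausible way to picture the desktop tools is as thin wrappers that build an xdotool command line. The mapping below is a guess for illustration only: `xdotoolArgs` and the `DesktopAction` type are hypothetical, and the real handlers may work differently.

```typescript
// Three of the desktop actions from the table, as a tagged union.
type DesktopAction =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "key"; combo: string };

// Translate an action into xdotool arguments (argv without the binary name).
function xdotoolArgs(action: DesktopAction): string[] {
  switch (action.kind) {
    case "click":
      // Move the pointer, then press the left button (button 1).
      return ["mousemove", String(action.x), String(action.y), "click", "1"];
    case "type":
      // Type the text into whichever window has focus.
      return ["type", action.text];
    case "key":
      // Combos such as ctrl+c are passed through in xdotool's own syntax.
      return ["key", action.combo];
  }
}

console.log(xdotoolArgs({ kind: "click", x: 640, y: 360 }).join(" "));
```

Returning an argument array rather than a single string avoids shell quoting problems when the typed text contains spaces or quotes.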
Most "computer use" tools guess pixel coordinates. We use Set-of-Mark prompting — the AI sees numbered badges on every interactive element:
Step 1: browser_snapshot
→ AI sees screenshot with [1] Login [2] Search [3] Cart ...
Step 2: browser_click_ref(ref=2)
→ clicks the Search box precisely
Step 3: browser_type_ref(ref=2, text="laptop", submit=true)
→ types and presses Enter
Step 4: browser_snapshot
→ sees new page with results [4] [5] [6] ...
No coordinate guessing. No CSS selector fragility. The AI sees what it's clicking.
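The core of the idea fits in a few lines: number the interactive elements once, then resolve a number back to a click point. This is a simplified sketch, not the engine in mcp/browser.ts. The types and the functions `assignMarks` and `clickPoint` are hypothetical, and a real implementation would collect the bounding boxes from the page and draw the badges onto the screenshot.

```typescript
type Box = { x: number; y: number; width: number; height: number };
type Mark = { ref: number; label: string; box: Box };

// Number elements 1..N in the order given (for example, DOM order).
function assignMarks(elements: { label: string; box: Box }[]): Mark[] {
  return elements.map((el, i) => ({ ref: i + 1, ...el }));
}

// Resolve a ref back to a click point: the centre of its bounding box.
function clickPoint(marks: Mark[], ref: number): { x: number; y: number } {
  const mark = marks.find((m) => m.ref === ref);
  if (!mark) throw new Error(`unknown ref ${ref}; take a new snapshot`);
  return {
    x: mark.box.x + mark.box.width / 2,
    y: mark.box.y + mark.box.height / 2,
  };
}

const marks = assignMarks([
  { label: "Login", box: { x: 10, y: 10, width: 60, height: 30 } },
  { label: "Search", box: { x: 100, y: 10, width: 200, height: 30 } },
  { label: "Cart", box: { x: 320, y: 10, width: 40, height: 30 } },
]);
console.log(clickPoint(marks, 2)); // { x: 200, y: 25 }
```

The model only ever picks a number, and the server owns the geometry. That is also why refs from an old snapshot should be treated as stale once the page changes.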
Every sandbox comes with a noVNC web viewer. Open the URL in your browser and watch the AI work live.
Perfect for demos, debugging, and building trust in AI agents.
| | Included |
|---|---|
| OS | Ubuntu 22.04 |
| Desktop | xfce4 + Xvfb + x11vnc + noVNC |
| Browser | Playwright Chromium (native arm64 + amd64) |
| Languages | Node.js 20, Python 3, build-essential |
| CLI | git, curl, wget, jq, ripgrep, tree, nano, vim |
| DB clients | PostgreSQL, MariaDB, Redis |
| Dev tools | GitHub CLI, yq, httpie |
| Automation | xdotool, scrot, imagemagick, xclip |
| Variable | Default | Description |
|---|---|---|
| `MAX_SANDBOXES` | 3 | Max concurrent VMs |
| `SANDBOX_TYPE` | auto | auto / docker / firecracker |
| `DOCKER_IMAGE` | taw-computer-base | Base image |
| `DOCKER_MEMORY_MB` | 4096 | RAM per container |
| `DOCKER_CPUS` | 2 | CPUs per container |
| `DESKTOP_RESOLUTION` | 1280x720 | Screen resolution |
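As a sketch of how these variables and defaults fit together, the loader below parses them from a plain object. It is not the project's sandbox/config.ts. The `loadConfig` function, the `SandboxConfig` shape, and the validation rules (positive integers only, `WIDTHxHEIGHT` resolution) are assumptions made for the example.

```typescript
const SANDBOX_TYPES = ["auto", "docker", "firecracker"] as const;
type SandboxType = (typeof SANDBOX_TYPES)[number];

type SandboxConfig = {
  maxSandboxes: number;
  sandboxType: SandboxType;
  dockerImage: string;
  memoryMb: number;
  cpus: number;
  resolution: { width: number; height: number };
};

function loadConfig(env: Record<string, string | undefined>): SandboxConfig {
  // Read a positive integer, falling back to the documented default.
  const int = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined || raw === "") return fallback;
    const value = Number(raw);
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`${name} must be a positive integer, got "${raw}"`);
    }
    return value;
  };

  const rawType = env["SANDBOX_TYPE"] ?? "auto";
  const sandboxType = SANDBOX_TYPES.find((t) => t === rawType);
  if (!sandboxType) {
    throw new Error("SANDBOX_TYPE must be auto, docker, or firecracker");
  }

  const match = /^(\d+)x(\d+)$/.exec(env["DESKTOP_RESOLUTION"] ?? "1280x720");
  if (!match) throw new Error("DESKTOP_RESOLUTION must look like 1280x720");

  return {
    maxSandboxes: int("MAX_SANDBOXES", 3),
    sandboxType,
    dockerImage: env["DOCKER_IMAGE"] ?? "taw-computer-base",
    memoryMb: int("DOCKER_MEMORY_MB", 4096),
    cpus: int("DOCKER_CPUS", 2),
    resolution: { width: Number(match[1]), height: Number(match[2]) },
  };
}

console.log(loadConfig({ DOCKER_MEMORY_MB: "2048" }));
```

In Node you would pass `process.env`. Taking the environment as a parameter keeps the function easy to test.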
| | Minimum |
|---|---|
| Docker | Docker Desktop or Docker Engine |
| Node.js | 20+ |
| RAM | ~4GB per sandbox |
| Disk | ~5GB for base image |
taw-computer/
├── mcp/
│ ├── index.ts # MCP server — stdio, 30+ tool handlers
│ └── browser.ts # Playwright CDP + Set-of-Mark engine
├── sandbox/
│ ├── SandboxManager.ts # Abstract interface
│ ├── DockerSandbox.ts # Docker implementation
│ ├── FirecrackerSandbox.ts # Firecracker microVM (optional)
│ ├── NetworkManager.ts # Network isolation
│ ├── config.ts # Env-based config
│ └── index.ts # Auto-detect backend
├── images/
│ └── Dockerfile.taw # Ubuntu sandbox image
├── .github/
│ ├── workflows/ci.yml # CI: typecheck + Docker build
│ └── ISSUE_TEMPLATE/ # Bug report + feature request
├── package.json
├── CONTRIBUTING.md
└── LICENSE (MIT)
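The tree mentions an auto-detected backend, so here is one way such a selection could work. Firecracker microVMs need KVM on the host, which is a fact about Firecracker. Everything else in this sketch is an assumption, including the function name `pickBackend` and the choice to prefer Firecracker under `auto` when KVM is present. The actual logic in sandbox/index.ts may differ.

```typescript
type SandboxType = "auto" | "docker" | "firecracker";
type Backend = "docker" | "firecracker";

// Pick a backend from the SANDBOX_TYPE setting and a host capability flag.
function pickBackend(type: SandboxType, hostHasKvm: boolean): Backend {
  if (type === "firecracker") {
    // An explicit request fails loudly instead of silently downgrading.
    if (!hostHasKvm) {
      throw new Error("firecracker requested but KVM is unavailable");
    }
    return "firecracker";
  }
  if (type === "docker") return "docker";
  // "auto": assumed policy, use Firecracker only where KVM exists.
  return hostHasKvm ? "firecracker" : "docker";
}

console.log(pickBackend("auto", false)); // docker
```

On macOS and most laptops there is no KVM, so under this policy `auto` resolves to Docker, which matches the Docker-first quick start.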
We'd love your help! See CONTRIBUTING.md.
Don't want to self-host? shipkit.cc offers a managed taw-computer.
Lovable and Bolt are closed-source, hosted-only products that generate code. taw-computer gives AI a real computer: it can run servers, browse the web, install anything, and interact with any desktop app. It's also open source and self-hostable.
Open Interpreter runs code on your local machine (risky). OpenHands uses its own LLM orchestration. taw-computer is just the computer: no built-in LLM, no opinions about orchestration. Your existing AI client (Claude Code, Cursor, etc.) is the brain. taw-computer is a pure MCP server.
Each sandbox is an isolated Docker container with its own filesystem, network, and process space. Nothing inside can touch your host system. Containers have memory/CPU/PID limits. When you're done, destroy the VM.
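For readers who want to see what memory, CPU, and PID limits look like in Docker terms, the sketch below expresses them as `docker run` flags. The flags themselves (`--memory`, `--cpus`, `--pids-limit`) are standard Docker options. The helper `dockerRunArgs`, the container name, and the PID limit of 512 are assumptions, and taw-computer talks to the Docker API rather than necessarily shelling out like this.

```typescript
type Limits = { memoryMb: number; cpus: number; pids: number };

// Build the argument list for a resource-limited `docker run`.
function dockerRunArgs(image: string, name: string, limits: Limits): string[] {
  return [
    "run",
    "--detach",
    "--name", name,
    "--memory", `${limits.memoryMb}m`, // hard RAM cap for the container
    "--cpus", String(limits.cpus), // CPU quota, in cores
    "--pids-limit", String(limits.pids), // cap on processes, limits fork bombs
    image,
  ];
}

// Memory and CPU values are the documented defaults; the PID limit is made up.
const args = dockerRunArgs("taw-computer-base", "sandbox-1", {
  memoryMb: 4096,
  cpus: 2,
  pids: 512,
});
console.log(["docker", ...args].join(" "));
```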
Any AI client that supports MCP can connect. The server doesn't care which LLM is behind the client.
taw-computer runs anywhere Docker runs. The sandbox image supports both arm64 (Apple Silicon) and amd64 (Intel/AMD).
If taw-computer is useful to you, give it a ⭐ — it helps others find it.
MIT — do whatever you want with it.