Enables LLM agents to capture screenshots and visualize runtime behavior within the Godot game engine for game development benchmarking. It utilizes AppleScript to provide display capture functionality for agents running on macOS.
A benchmark suite for evaluating LLM agents on game development tasks.
Paper: GameDevBench: A Comprehensive Benchmark for Game Development
GameDevBench contains 132 game development tasks to evaluate LLM agents' ability to complete game development problems in the Godot game engine.
Godot 4.x - Download and install from godotengine.org
Make sure godot is available in your PATH, or set the GODOT_EXEC_PATH environment variable
Python 3.10+ - Required for all agents
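The prerequisites above can be checked before a run with a small script. This is a minimal sketch, not part of the benchmark: the helper names `find_godot` and `python_ok` are hypothetical, and giving GODOT_EXEC_PATH precedence over PATH is an assumption about how the runner resolves the executable.

```python
import os
import shutil
import sys


def find_godot(env=None):
    """Return a path to the Godot executable, or None if it cannot be found.

    Assumption: GODOT_EXEC_PATH, when set, wins over a `godot` found on PATH.
    """
    env = os.environ if env is None else env
    explicit = env.get("GODOT_EXEC_PATH")
    if explicit:
        # An explicit path that does not point at a file is treated as missing.
        return explicit if os.path.isfile(explicit) else None
    return shutil.which("godot", path=env.get("PATH"))


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """Check the interpreter against the README's minimum Python version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Godot executable:", find_godot() or "not found")
    print("Python 3.10+:", python_ok())
```

For OpenHands, the same check could be called with `minimum=(3, 12)`.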
Install the agent(s) you want to use:
Before running the benchmark, unzip the tasks folder:
unzip tasks.zip
Note: The tasks are distributed as a zip file to prevent accidental data leakage.
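On systems without the `unzip` command, the same step can be done with Python's standard library. This is a sketch under the assumption that the archive sits in the current directory; the function name `extract_tasks` is hypothetical.

```python
import zipfile


def extract_tasks(zip_path="tasks.zip", dest="."):
    """Extract the archive into dest and return its top-level entry names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Collapse member paths such as "tasks/a/b.txt" to their first component.
    return sorted({name.split("/", 1)[0] for name in names if name})


if __name__ == "__main__":
    print("Extracted:", extract_tasks())
```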
You can use the built-in plans for claude-code, codex, and gemini-cli, or provide API keys directly. For OpenHands you must provide your own API keys. See .env.example for a complete list of optional environment variables.
uv run python gamedevbench/src/benchmark_runner.py \
--agent AGENT \
--model MODEL \
run --task-list tasks.yaml
claude-code - Anthropic's Claude Code CLI
codex - OpenAI Codex
gemini-cli - Google Gemini CLI
openhands - OpenHands (requires Python 3.12+)
--agent AGENT - Agent to use (required)
--model MODEL - Model name (e.g., claude-sonnet-4.5-20250929)
--enable-mcp - Enable MCP (Model Context Protocol) server for supported agents
--use-runtime-video - Enable runtime video mode
--skip-display - Skip tasks that require display
run --task-list FILE - Run tasks from YAML file (e.g., tasks.yaml)
macOS-only Features:
The MCP server (--enable-mcp) currently only works on macOS
Set the GODOT_SCREENSHOT_DISPLAY environment variable to the correct display number
Benchmark results are saved to the results/ directory with the following information:
@misc{chi2026gamedevbenchevaluatingagenticcapabilities,
title={GameDevBench: Evaluating Agentic Capabilities Through Game Development},
author={Wayne Chi and Yixiong Fang and Arnav Yayavaram and Siddharth Yayavaram and Seth Karten and Qiuhong Anna Wei and Runkun Chen and Alexander Wang and Valerie Chen and Ameet Talwalkar and Chris Donahue},
year={2026},
eprint={2602.11103},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2602.11103},
}
Add this to claude_desktop_config.json and restart Claude Desktop.
{
  "mcpServers": {
    "gamedevbench-mcp": {
      "command": "npx",
      "args": []
    }
  }
}
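If claude_desktop_config.json already lists other servers, the entry has to be merged into the existing file rather than pasted over it. The following is a minimal sketch of such a merge; `add_mcp_server` is a hypothetical helper, the config path is supplied by the caller, and `args` is left empty exactly as in the snippet above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name="gamedevbench-mcp", command="npx", args=()):
    """Add or overwrite one entry under "mcpServers", keeping everything else."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # A missing or empty file is treated as an empty configuration.
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Writes to a file in the current directory for illustration only;
    # point it at your real Claude Desktop config file instead.
    print(json.dumps(add_mcp_server("claude_desktop_config.json"), indent=2))
```

Restart Claude Desktop after the file is updated so the new server entry is picked up.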