A physics engine for liquid-cooled GPU systems, exposed as an AI-callable MCP server. Enables thermal analysis, coolant comparison, flow optimization, and rack-level sizing via natural language queries.
Requires Python 3.10+. Available on PyPI, with an interactive Colab notebook.
A physics engine for liquid-cooled GPU systems, exposed as an AI-callable MCP server. Ask Claude to size a cooling system for an H100 cluster, optimize cold plate flow rates, or compare water versus glycol — and get first-principles answers backed by hand-validated thermal models.
Try it now — open the interactive notebook in Colab to run NVL72 rack sizing, topology comparisons, and flow optimization interactively.
Install and use as an MCP server:
pip install thermal-mcp-server
Add to your MCP client config (claude_desktop_config.json for Claude Desktop):
{
  "mcpServers": {
    "thermal": {
      "command": "python",
      "args": ["-m", "thermal_mcp_server"]
    }
  }
}
Note: Claude Desktop does not inherit your shell's PATH. If the above doesn't work, use the absolute path to your Python binary (e.g. /usr/local/bin/python, or the path inside a virtualenv).
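One way to get the absolute path right is to generate the config entry from the interpreter that has the package installed. The sketch below uses only the standard library; the helper name build_mcp_config is hypothetical and not part of thermal-mcp-server.

```python
import json
import sys


def build_mcp_config(python_path: str = sys.executable) -> dict:
    """Return a claude_desktop_config.json fragment that pins an absolute interpreter path."""
    return {
        "mcpServers": {
            "thermal": {
                "command": python_path,
                "args": ["-m", "thermal_mcp_server"],
            }
        }
    }


if __name__ == "__main__":
    # Run with the interpreter that has thermal-mcp-server installed,
    # then paste the printed JSON into claude_desktop_config.json.
    print(json.dumps(build_mcp_config(), indent=2))
```

Running this inside a virtualenv prints that virtualenv's interpreter path, which is usually the one you want Claude Desktop to launch.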
Once configured, ask Claude engineering questions directly:
"I have 8 H100 SXM GPUs at 700 W each, water cooling at 8 LPM per cold plate, 25°C supply. What's the junction temperature and thermal margin?"
"Compare water versus 50/50 glycol for a 700 W load at 8 LPM."
"Size a CDU for 8 H100 GPUs in a parallel manifold — total flow, system ΔP, and return water temperature."
Claude calls the relevant tool, interprets the physics, and answers in context.
Claude Desktop calling analyze_coldplate via the MCP server. The user asks a natural-language thermal question; Claude picks the right tool, runs the physics, and interprets the result.
The example below is the hand-calculation-validated reference case: every intermediate value (Reynolds number, Nusselt number, convection coefficient, pressure drop) is independently verified in tests/test_physics_behavior.py.
from thermal_mcp_server.physics import analyze
from thermal_mcp_server.schemas import AnalyzeColdplateInput

# Reference case: one 700 W GPU, water at 8 LPM, 25°C inlet
result = analyze(AnalyzeColdplateInput(
    heat_load_w=700, flow_rate_lpm=8.0, inlet_temp_c=25.0, coolant="water"
))

print(f"Junction temp: {result.junction_temp_c:.1f}°C")  # 70.9°C
# 83°C is the H100 SXM Tj design ceiling used in the validation table
print(f"Thermal margin: {83 - result.junction_temp_c:.1f}°C below throttle onset")
print(f"Flow regime: {result.regime}")  # transitional (Re ≈ 3734)
print(f"Pressure drop: {result.pressure_drop_pa:.0f} Pa")  # 16800 Pa (0.17 bar)
For rack-scale analysis (NVL72 CDU sizing, series vs. parallel topology, B200 at 1,200 W), see the interactive notebook.
Four MCP tools, each also available as a Python function:
| Tool | What it does |
|---|---|
| analyze_coldplate | Single-point thermal + hydraulic analysis: Tj, resistance breakdown, ΔP, regime, pump power |
| compare_coolants | Side-by-side water vs. glycol at identical conditions |
| optimize_flow_rate | Binary search for minimum flow to meet a Tj target |
| analyze_rack | N identical GPUs in series or parallel: max Tj, per-GPU temps, total flow, system ΔP, CDU return temp |
See docs/mcp.md for full input/output schemas.
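The binary search behind optimize_flow_rate can be illustrated with a short sketch. This is not the package's implementation: min_flow_for_target and toy_tj are hypothetical names, the toy temperature model is made up, and the search assumes junction temperature falls monotonically as flow increases.

```python
from typing import Callable


def min_flow_for_target(
    tj_at_flow: Callable[[float], float],
    tj_target_c: float,
    lo_lpm: float = 0.5,
    hi_lpm: float = 20.0,
    tol_lpm: float = 0.01,
) -> float:
    """Bisect for the smallest flow rate whose junction temperature meets the target.

    Assumes tj_at_flow decreases monotonically as flow increases.
    """
    if tj_at_flow(hi_lpm) > tj_target_c:
        raise ValueError("Target not reachable within the flow bounds")
    if tj_at_flow(lo_lpm) <= tj_target_c:
        return lo_lpm  # even the minimum flow is enough
    while hi_lpm - lo_lpm > tol_lpm:
        mid = 0.5 * (lo_lpm + hi_lpm)
        if tj_at_flow(mid) <= tj_target_c:
            hi_lpm = mid  # target met: try less flow
        else:
            lo_lpm = mid  # too hot: need more flow
    return hi_lpm  # upper bracket always satisfies the target


def toy_tj(flow_lpm: float) -> float:
    """Made-up monotone model for illustration only, not the package's physics."""
    return 25.0 + 700.0 * (0.04 + 0.1 / flow_lpm**0.8)


flow = min_flow_for_target(toy_tj, tj_target_c=75.0)
print(f"Minimum flow: {flow:.2f} LPM")
```

Returning the upper bracket keeps the answer on the safe side: the reported flow always meets the target, at the cost of overshooting by at most the tolerance.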
The physics engine models a cold plate as a 1D thermal resistance network:
T_junction = T_inlet + Q × (R_jc + R_tim + R_base + R_conv) + ΔT_coolant/2
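As a worked instance of this formula, the sketch below evaluates it for a 700 W load at 8 LPM. The resistance values are illustrative placeholders, not the package's defaults, and the coolant rise is assumed to follow the usual energy balance ΔT_coolant = Q / (ṁ · c_p) with approximate water properties.

```python
def junction_temp_c(
    inlet_temp_c: float,
    heat_load_w: float,
    resistances_k_per_w: dict[str, float],
    flow_rate_lpm: float,
    density_kg_m3: float = 997.0,   # water near 25°C (assumed)
    cp_j_per_kg_k: float = 4180.0,  # water near 25°C (assumed)
) -> float:
    """Evaluate T_junction = T_inlet + Q * sum(R) + dT_coolant / 2."""
    mass_flow_kg_s = density_kg_m3 * flow_rate_lpm / 60_000.0  # LPM -> m^3/s -> kg/s
    coolant_rise_k = heat_load_w / (mass_flow_kg_s * cp_j_per_kg_k)
    total_resistance = sum(resistances_k_per_w.values())
    return inlet_temp_c + heat_load_w * total_resistance + coolant_rise_k / 2.0


# Illustrative resistances in K/W; NOT the package's defaults.
resistances = {"R_jc": 0.020, "R_tim": 0.015, "R_base": 0.005, "R_conv": 0.020}
tj = junction_temp_c(25.0, 700.0, resistances, flow_rate_lpm=8.0)
print(f"T_junction = {tj:.1f} °C")
```

With these placeholder numbers the conduction and convection terms contribute 42 K and the half coolant rise adds well under 1 K, which shows why the resistance stack dominates junction temperature at typical flow rates.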
The rack-level model stacks N single-GPU analyses in either a series topology (cumulative temperature rise from plate to plate) or a parallel topology (uniform inlet temperature, flow split across plates).
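The difference between the two topologies can be sketched with a coolant energy balance alone. The function names below are hypothetical, water properties are approximate, and the flow allocation (full loop flow through every plate in series, an even 1/N split in parallel) is an assumption about how such a model is typically set up, not a description of analyze_rack internals.

```python
def coolant_rise_k(heat_load_w, flow_lpm, density_kg_m3=997.0, cp_j_per_kg_k=4180.0):
    """Bulk coolant temperature rise across one cold plate (water properties assumed)."""
    mass_flow_kg_s = density_kg_m3 * flow_lpm / 60_000.0
    return heat_load_w / (mass_flow_kg_s * cp_j_per_kg_k)


def rack_inlet_temps(n_gpus, heat_load_w, loop_flow_lpm, supply_temp_c, topology):
    """Return (per-GPU inlet temps, loop return temp) for a series or parallel loop."""
    if topology == "series":
        # Every plate sees the full loop flow; each inlet is the previous outlet.
        rise = coolant_rise_k(heat_load_w, loop_flow_lpm)
        inlets = [supply_temp_c + i * rise for i in range(n_gpus)]
        return inlets, supply_temp_c + n_gpus * rise
    if topology == "parallel":
        # Every plate sees the supply temperature but only 1/N of the loop flow.
        rise = coolant_rise_k(heat_load_w, loop_flow_lpm / n_gpus)
        return [supply_temp_c] * n_gpus, supply_temp_c + rise
    raise ValueError(f"unknown topology: {topology!r}")


series_inlets, series_return = rack_inlet_temps(8, 700.0, 64.0, 25.0, "series")
parallel_inlets, parallel_return = rack_inlet_temps(8, 700.0, 64.0, 25.0, "parallel")
print(f"Series:   last GPU inlet {series_inlets[-1]:.2f} °C, return {series_return:.2f} °C")
print(f"Parallel: last GPU inlet {parallel_inlets[-1]:.2f} °C, return {parallel_return:.2f} °C")
```

For the same total loop flow, both topologies return coolant at the same temperature; what changes is the inlet temperature each GPU sees and the flow through each plate, which in turn affects convection and pressure drop.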
flowchart LR
A["Input\nchip power, flow,\ncoolant, geometry"] --> B["Physics Engine\nDittus-Boelter · Darcy-Weisbach\nR_total network"]
B --> C["Output\nT_junction · ΔP\nthermal margin · pump power"]
See docs/physics.md for the full physics documentation including equations and assumptions.
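For readers unfamiliar with the two correlations named in the flowchart, here is a minimal sketch of their textbook forms. The channel geometry and fluid properties are made-up inputs, the Blasius friction factor is a swapped-in assumption for smooth turbulent flow, and nothing here reflects how the package handles the laminar or transitional regimes.

```python
def reynolds(velocity_m_s, hydraulic_diameter_m, density_kg_m3, viscosity_pa_s):
    """Reynolds number for internal channel flow."""
    return density_kg_m3 * velocity_m_s * hydraulic_diameter_m / viscosity_pa_s


def nusselt_dittus_boelter(re, pr):
    """Dittus-Boelter for a fluid being heated (exponent 0.4); meant for turbulent flow."""
    return 0.023 * re**0.8 * pr**0.4


def blasius_friction_factor(re):
    """Darcy friction factor for smooth turbulent flow (assumed correlation)."""
    return 0.316 * re**-0.25


def darcy_weisbach_dp_pa(friction_factor, length_m, hydraulic_diameter_m,
                         density_kg_m3, velocity_m_s):
    """Pressure drop: f * (L / D) * rho * v^2 / 2."""
    return (friction_factor * (length_m / hydraulic_diameter_m)
            * density_kg_m3 * velocity_m_s**2 / 2.0)


# Hypothetical channel: 2 mm hydraulic diameter, 0.1 m long, water at about 25°C.
rho, mu, k, pr = 997.0, 8.9e-4, 0.607, 6.1
d_h, length, v = 0.002, 0.1, 2.0

re = reynolds(v, d_h, rho, mu)
h = nusselt_dittus_boelter(re, pr) * k / d_h  # convection coefficient, W/(m^2·K)
dp = darcy_weisbach_dp_pa(blasius_friction_factor(re), length, d_h, rho, v)
print(f"Re = {re:.0f}, h = {h:.0f} W/m²K, ΔP = {dp:.0f} Pa")
```

The convection coefficient feeds R_conv in the resistance network, while the pressure drop sets the pump power needed to hold a given flow rate.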
Model outputs against published chip specs. All runs use water coolant, 25°C inlet.
| Chip | TDP | Tj Design Ceiling | Model Tj | Margin | Notes |
|---|---|---|---|---|---|
| H100 SXM | 700 W | 83°C | 70.9°C at 8 LPM | 12.1°C | Default geometry; hand-calc validated |
| MI300X | 750 W | ~85°C (proxy) | 74.2°C at 8 LPM | ~10°C | AMD does not publish Tj_max |
| B200 NVL72 | 1,200 W | ~75°C (est.) | 75.0°C at 9.3 LPM/GPU | 0°C at limit | R_jc=0.02 K/W est.; NVIDIA does not publish |
| Gaudi 3 OAM | 900 W (air) / 1,200 W (liquid) | ~85°C (proxy) | Requires B200-class geometry | — | Default H100 geometry undersized for 1,200 W |
On B200 and Gaudi 3 numbers: NVIDIA and Intel do not publish cold plate geometry or R_jc for these chips. The B200 analysis uses engineering estimates. Treat as indicative; real sizing requires vendor data.
Chip sources: NVIDIA H100 Datasheet · NVIDIA GB200 NVL72 · SemiAnalysis B200 thermal estimates · AMD MI300X Data Sheet · Intel Gaudi 3 Product Brief
The model's assumptions and limitations are documented explicitly because they bound what the model can and cannot tell you; see docs/physics.md.
git clone https://github.com/riccardovietri/thermal-mcp-server.git
cd thermal-mcp-server
uv sync --group dev
uv run pytest -v # all tests should pass