An MCP (Model Context Protocol) server that converts Windows WPR `.etl` performance traces into structured JSON summaries and flamegraph-ready data for LLM analysis. It uses WPAExporter + xperf under the hood, bridging Windows Performance Analyzer automation with LLM reasoning for performance troubleshooting.

wpa-mcp bridges two worlds: MCP hosts that can reason about performance, and the Windows Performance Toolkit that can actually read `.etl` files. The server exposes a small set of MCP tools so an LLM can validate a trace, export the tables it needs, and read back compact summaries.
```
+------------------+      stdio (MCP)       +--------------------+
| LLM / MCP host   | <--------------------> |  wpa-mcp server    |
| (Claude, VSCode) |                        |  (this repo)       |
+------------------+                        +----------+---------+
                                                       |
                                            subprocess |
                                                       v
                         +---------------------+---------------------+
                         |      xperf.exe      |   wpaexporter.exe   |
                         |  (validate / stats) |   (+ .wpaProfile)   |
                         +---------------------+---------------------+
                                                       |
                                                       v
                                        CSV tables (per profile)
                                                       |
                                                       v
                              summarizer -> JSON / flamegraph -> .folded
```
Everything that the LLM sees is structured JSON or compact folded-stack text — never raw gigabyte CSVs.
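As a sketch of the subprocess layer above, a wrapper only needs to assemble a wpaexporter command line per profile. The helper below is illustrative, not the repo's actual code, and the flag names (`-i`, `-profile`, `-outputfolder`) are assumptions about the wpaexporter CLI:

```python
import os

# Default WPT install location; overridable via WPAEXPORTER_PATH, as described below.
DEFAULT_WPAEXPORTER = r"C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\wpaexporter.exe"

def build_export_cmd(etl_path: str, profile: str, out_dir: str) -> list[str]:
    """Build the argv for one wpaexporter run (one .wpaProfile -> CSV tables).

    Flag names here are illustrative assumptions, not verified wpaexporter syntax.
    """
    exporter = os.environ.get("WPAEXPORTER_PATH", DEFAULT_WPAEXPORTER)
    return [
        exporter,
        "-i", etl_path,           # input trace
        "-profile", profile,      # which tables/columns to dump
        "-outputfolder", out_dir, # where the CSVs land
    ]

cmd = build_export_cmd(r"C:\traces\case01.etl", "cpu_hotpath.wpaProfile", r"C:\traces\out")
print(cmd[1:3])  # ['-i', 'C:\\traces\\case01.etl']
```

The real server would hand this list to `subprocess.run` and then summarize the resulting CSVs.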
The server shells out to `wpaexporter.exe` and `xperf.exe`. If WPT is installed to a non-default path, set:
```
setx WPAEXPORTER_PATH "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\wpaexporter.exe"
setx XPERF_PATH "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\xperf.exe"
```
```
pipx install wpa-mcp
wpa-mcp   # starts the MCP stdio server
```

Or from source:

```
git clone https://github.com/Jialong-zhong/wpr-xperf-mcp-server.git
cd wpr-xperf-mcp-server
pip install -e .
wpa-mcp
```
The server's analyses are only as good as the providers you captured. Recommended capture for the four problem classes this server targets:
```
# Run as Administrator
wpr -start CPU ^
    -start GeneralProfile ^
    -start DesktopComposition ^
    -start Registry ^
    -filemode

# ... reproduce the issue ...

wpr -stop C:\traces\case01.etl "repro notes here"
```
| WPR profile | What it adds that wpa-mcp uses |
|---|---|
| `CPU` | Sampled CPU, CSwitch, ReadyThread, StackWalk |
| `GeneralProfile` | Processes, images, DPC/ISR |
| `DesktopComposition` | DWM frame timing, Window-in-focus (UI hang evidence) |
| `Registry` | Registry activity (optional; useful for startup/UI hangs) |

If you skip `CPU`, the most valuable analyses (hot stacks, scheduling latency) won't work — `validate_trace` will tell you so.
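A provider check of this kind boils down to scanning the `xperf -a stats` report for event names. The sketch below shows the idea; the keyword strings are illustrative guesses at the stats-report wording, not the server's exact matching logic:

```python
# Hypothetical validate_trace-style check: which capability flags can be
# satisfied by the events present in an `xperf -a stats` text report?
REQUIRED = {
    "has_cpu_sampling": "SampledProfile",
    "has_cswitch": "CSwitch",
    "has_readythread": "ReadyThread",
    "has_dpc_isr": "DPC",
}

def check_providers(stats_text: str) -> dict[str, bool]:
    """Map each capability flag to whether its keyword appears in the report."""
    return {flag: keyword in stats_text for flag, keyword in REQUIRED.items()}

stats = "Event counts: SampledProfile 120000, CSwitch 80000, DPC 43000"
flags = check_providers(stats)
print(flags["has_cpu_sampling"], flags["has_readythread"])  # True False
```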
| Tool | Purpose | Typical caller |
|---|---|---|
| `validate_trace(etl_path)` | Run `xperf -a stats` and report which providers / stacks exist | LLM, always first |
| `export_tables(etl_path, profile)` | Run one WPA profile via wpaexporter and return CSV paths | Advanced / targeted |
| `analyze_etl(etl_path, focus)` | Validate → export (by focus) → summarize; returns one structured JSON | LLM, default entry point |
| `analyze_audio_glitch(etl_path)` | MXA-style audio glitch analysis: long DPC/ISR buckets, dispatch-level suspects, and pipeline risk | LLM, audio crackle/skip triage |
| `render_flamegraph(out_dir)` | Aggregate CPU Usage (Sampled) stacks into Brendan-Gregg folded format | After `analyze_etl` with CPU focus |
**`analyze_etl` input schema**

```json
{
  "etl_path": "C:\\traces\\case01.etl",
  "focus": "cpu | latency | ui | dpc_isr | all",
  "out_dir": "optional override",
  "top_n": 20
}
```
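On the wire, an MCP host wraps these arguments in a JSON-RPC `tools/call` request over stdio. A minimal sketch of building that envelope (field names follow the MCP convention; this is not code from this repo):

```python
import json

def make_tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = make_tools_call(1, "analyze_etl", {
    "etl_path": r"C:\traces\case01.etl",
    "focus": "cpu",
    "top_n": 20,
})
print(json.loads(msg)["params"]["name"])  # analyze_etl
```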
**`analyze_etl` output shape (abbreviated)**

```json
{
  "etl": "C:\\traces\\case01.etl",
  "focus": "all",
  "validation": {
    "duration_sec": 42.7,
    "has_cpu_sampling": true,
    "has_cswitch": true,
    "has_readythread": true,
    "has_stacks": true,
    "has_dpc_isr": true,
    "has_dwm": true,
    "warnings": []
  },
  "exports": ["...\\cpu\\CPU Usage (Sampled)_...csv", "..."],
  "summary": {
    "cpu_top_processes": [{"process": "chrome.exe", "weight_ms": 8421.3}],
    "cpu_top_modules": [{"module": "ntdll.dll", "weight_ms": 2310.0}],
    "cpu_hot_stacks": [{"stack": "ntdll!... ; app!hot_fn", "weight_ms": 1240.0}],
    "ready_latency_top": [{"process": "explorer.exe", "tid": 1234, "p95_ms": 187.0}],
    "dpc_isr_top": [{"driver": "ndis.sys", "total_ms": 95.2, "max_us": 820}],
    "ui_focus_top": [{"process": "myapp.exe", "focus_ms": 5400.0}],
    "dwm_slow_frames": {"count": 38, "p95_ms": 41.7, "max_ms": 128.0}
  }
}
```
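Because the summary is plain JSON, post-processing it is trivial. A small sketch that ranks processes as a share of total sampled CPU weight (the `summary` dict mirrors the abbreviated shape above):

```python
summary = {
    "cpu_top_processes": [
        {"process": "chrome.exe", "weight_ms": 8421.3},
        {"process": "myapp.exe", "weight_ms": 2100.0},
    ],
}

def cpu_shares(summary: dict) -> list[tuple[str, float]]:
    """Return (process, percent-of-total-sampled-weight) pairs."""
    procs = summary["cpu_top_processes"]
    total = sum(p["weight_ms"] for p in procs)
    return [(p["process"], round(100 * p["weight_ms"] / total, 1)) for p in procs]

print(cpu_shares(summary)[0])  # ('chrome.exe', 80.0)
```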
**`analyze_audio_glitch` input schema**

```json
{
  "etl_path": "C:\\traces\\AudioGlitches_ThreadsAtDispatchLevel.etl",
  "top_n": 20,
  "dispatch_threshold_us": 1000,
  "exclude_idle": true
}
```
**`analyze_audio_glitch` output shape (abbreviated)**

```json
{
  "etl": "C:\\traces\\AudioGlitches_ThreadsAtDispatchLevel.etl",
  "validation": { "duration_sec": 69.5, "has_dpc_isr": true },
  "analysis": {
    "mxa_criteria": { "long_dpc_or_isr_threshold_us": 1000 },
    "audio_glitch_assessment": {
      "risk": "medium",
      "long_running_dpc": { "total_count": 43808, "over_threshold_count": 0 },
      "long_running_isr": { "total_count": 23838, "over_threshold_count": 0 },
      "audio_related_drivers_in_dpc_top": [{ "driver": "portcls.sys", "total_ms": 116.7 }],
      "audio_pipeline_processes": [{ "process": "audiodg.exe", "weight_ms": 244.7 }]
    },
    "base_summary": { "dpc_isr_top": [], "cpu_business_processes": [] }
  }
}
```
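To make the `"risk"` field concrete, here is one plausible way a bucketing rule could combine over-threshold counts with audio-driver evidence. The cutoffs are illustrative assumptions, not the server's actual logic:

```python
def glitch_risk(over_threshold_dpc: int, over_threshold_isr: int,
                audio_driver_in_top: bool) -> str:
    """Hypothetical low/medium/high bucketing for audio-glitch risk."""
    over = over_threshold_dpc + over_threshold_isr
    if over == 0:
        # No long DPCs/ISRs, but an audio driver in the DPC top list still
        # keeps the trace interesting -> "medium", matching the sample above.
        return "medium" if audio_driver_in_top else "low"
    return "high" if over > 10 else "medium"

print(glitch_risk(0, 0, True))  # medium
```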
Each profile is a .wpaProfile XML that tells wpaexporter which WPA tables + columns to dump.
| Focus key | File | Tables exported |
|---|---|---|
| `cpu` | `wpa/profiles/cpu_hotpath.wpaProfile` | CPU Usage (Sampled) |
| `latency` | `wpa/profiles/scheduling_latency.wpaProfile` | CPU Usage (Precise), Ready Thread |
| `ui` | `wpa/profiles/ui_hang.wpaProfile` | Window In Focus, DWM Frame Details |
| `dpc_isr` | `wpa/profiles/dpc_isr.wpaProfile` | DPC/ISR Duration |
Column sets are deliberately minimal to keep CSVs small and summarizer-friendly.
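In code, the table above reduces to a small lookup, similar in spirit to the `PROFILE_MAP` the FAQ mentions lives in `server.py` (the dict literal below is illustrative, not copied from the repo):

```python
# Illustrative focus-key -> .wpaProfile mapping, mirroring the table above.
PROFILE_MAP = {
    "cpu": "wpa/profiles/cpu_hotpath.wpaProfile",
    "latency": "wpa/profiles/scheduling_latency.wpaProfile",
    "ui": "wpa/profiles/ui_hang.wpaProfile",
    "dpc_isr": "wpa/profiles/dpc_isr.wpaProfile",
}

def profiles_for(focus: str) -> list[str]:
    """'all' expands to every profile; otherwise run one focused profile."""
    if focus == "all":
        return list(PROFILE_MAP.values())
    return [PROFILE_MAP[focus]]

print(len(profiles_for("all")), profiles_for("cpu")[0])
```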
These are end-to-end, copy-pasteable walkthroughs. Each shows the user prompt, the tool calls the LLM should make, the JSON shape you can expect, and the conclusions a well-prompted LLM should draw.
User: "C:\traces\cpu_spike.etl — some process is pinning my CPU at 100%. Find it and tell me which function."
LLM tool calls:
```js
// 1) validate
validate_trace({ "etl_path": "C:\\traces\\cpu_spike.etl" })

// 2) analyze focused on CPU
analyze_etl({ "etl_path": "C:\\traces\\cpu_spike.etl", "focus": "cpu", "top_n": 10 })

// 3) (optional) generate folded stacks for the hottest paths
render_flamegraph({
  "out_dir": "C:\\traces\\cpu_spike_wpa_out\\cpu",
  "top_n": 200,
  "min_weight_ms": 5
})
```
Typical summary returned:
```json
{
  "cpu_top_processes": [
    { "process": "myapp.exe", "weight_ms": 38210.0 },
    { "process": "chrome.exe", "weight_ms": 2100.0 }
  ],
  "cpu_top_modules": [
    { "module": "myapp.exe", "weight_ms": 31020.0 },
    { "module": "ucrtbase.dll", "weight_ms": 4100.0 }
  ],
  "cpu_hot_stacks": [
    { "stack": "ntdll!RtlUserThreadStart;kernel32!BaseThreadInitThunk;myapp!worker_loop;myapp!compress_chunk",
      "weight_ms": 27800.0 }
  ]
}
```
What the LLM should conclude:
- `myapp.exe` consumed ~38 s of CPU ≈ ~90% of wall time on a single-core equivalent.
- The hottest stack bottoms out in `myapp!compress_chunk` via `worker_loop` → this is a hot loop in the app itself, not the runtime or OS.
- Next: profile `compress_chunk`, consider SIMD / chunking, and capture a second trace with `-start CPU -start FileIO` to see if the loop is also I/O-bound.

User: "When the user clicks Save, the window freezes for ~2 seconds. Trace at C:\traces\save_hang.etl."
LLM tool calls:
```js
validate_trace({ "etl_path": "C:\\traces\\save_hang.etl" })
analyze_etl({ "etl_path": "C:\\traces\\save_hang.etl", "focus": "ui" })
analyze_etl({ "etl_path": "C:\\traces\\save_hang.etl", "focus": "latency" })
```
Expected signals:
```json
{
  "ui_focus_top": [{ "process": "myapp.exe", "focus_ms": 5400 }],
  "dwm_slow_frames": { "count": 42, "p95_ms": 58.0, "max_ms": 1980.0 },
  "ready_latency_top": [
    { "process": "myapp.exe", "tid": 7321, "p95_ms": 14.0 }
  ]
}
```
Reasoning rubric the LLM should apply:
- `dwm_slow_frames.max_ms ≈ 1980` while the app is in focus → the hang is real and UI-thread-bound.
- `ready_latency_top` for the UI thread is low (≈14 ms p95) → the thread wasn't starved for CPU; it was busy on its own work (otherwise we'd see hundreds of ms of ready-wait).
- Next: re-capture with `-start CPU -start FileIO -start DiskIO` and focus on the UI thread's stacks with `render_flamegraph` to see exactly what it's doing during those 2 seconds.

Contrast: if `ready_latency_top` had been, say, `p95_ms: 900` — then the UI thread was ready but not scheduled. That's a scheduling/contention story, not a "heavy work on UI thread" story.
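That rubric can be written down as a tiny decision function. The thresholds below (100 ms frame, 200 ms ready-wait) are illustrative assumptions, not values the server uses:

```python
def classify_hang(ready_p95_ms: float, max_frame_ms: float) -> str:
    """Hypothetical rubric: distinguish a busy UI thread from a starved one."""
    if max_frame_ms < 100:
        return "no significant UI hang"
    if ready_p95_ms > 200:
        # Ready but not scheduled -> scheduling/contention story.
        return "scheduling/contention: thread ready but not scheduled"
    # Low ready-wait during a visible hang -> the thread is doing the work.
    return "busy UI thread: doing heavy work itself"

print(classify_hang(14.0, 1980.0))   # busy UI thread: doing heavy work itself
print(classify_hang(900.0, 1980.0))  # scheduling/contention: thread ready but not scheduled
```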
User: "Audio crackles and the mouse skips for a split second every few minutes. C:\traces\glitch.etl."
LLM tool calls:
```js
validate_trace({ "etl_path": "C:\\traces\\glitch.etl" })
analyze_audio_glitch({ "etl_path": "C:\\traces\\glitch.etl", "dispatch_threshold_us": 1000 })
```
Expected signal:
```json
{
  "dpc_isr_top": [
    { "driver": "Netwtw10.sys", "total_ms": 312.4, "max_us": 4120, "count": 1820 },
    { "driver": "ndis.sys", "total_ms": 95.1, "max_us": 820, "count": 4300 },
    { "driver": "nvlddmkm.sys", "total_ms": 60.0, "max_us": 410, "count": 2100 }
  ]
}
```
What the LLM should conclude:
- `Netwtw10.sys` (Intel Wi-Fi driver) has a single DPC over 4 ms — that's well above the ~1 ms "don't cause audio glitches" rule of thumb.

Quality rules wpa-mcp's prompting guide bakes in: any driver with `max_us > 1000` is suspicious, `>= 500` is worth mentioning.
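Those two thresholds are easy to apply mechanically. A sketch of triaging a `dpc_isr_top` list into "suspicious" and "worth mentioning" buckets (helper name and shape are ours, not the server's):

```python
def triage_dpc(dpc_isr_top: list[dict]) -> dict[str, list[str]]:
    """Bucket drivers by worst-case DPC/ISR duration: >1000 us suspicious, >=500 us a mention."""
    out = {"suspicious": [], "mention": []}
    for entry in dpc_isr_top:
        if entry["max_us"] > 1000:
            out["suspicious"].append(entry["driver"])
        elif entry["max_us"] >= 500:
            out["mention"].append(entry["driver"])
    return out

top = [
    {"driver": "Netwtw10.sys", "max_us": 4120},
    {"driver": "ndis.sys", "max_us": 820},
    {"driver": "nvlddmkm.sys", "max_us": 410},
]
print(triage_dpc(top))  # {'suspicious': ['Netwtw10.sys'], 'mention': ['ndis.sys']}
```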
After analyze_etl with focus="cpu", you can ask the LLM to drill deeper:
```js
render_flamegraph({
  "out_dir": "C:\\traces\\cpu_spike_wpa_out\\cpu",
  "output_path": "C:\\traces\\cpu_spike.folded",
  "top_n": 300,
  "min_weight_ms": 2
})
```
Returns:
```json
{
  "folded_file": "C:\\traces\\cpu_spike.folded",
  "source_csv": "C:\\traces\\cpu_spike_wpa_out\\cpu\\CPU Usage (Sampled)_....csv",
  "line_count": 287,
  "total_weight_ms": 39120.0,
  "preview": "ntdll!RtlUserThreadStart;kernel32!BaseThreadInitThunk;myapp!worker_loop;myapp!compress_chunk 27800\nntdll!... ; myapp!parse_header 410\n..."
}
```
You can now either:

- Render an SVG flamegraph (requires Perl + Brendan Gregg's script):

  ```
  flamegraph.pl C:\traces\cpu_spike.folded > C:\traces\cpu_spike.svg
  ```

- Or just let the LLM read the preview — the folded format is already much easier for an LLM than raw CSV.
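The folded format (`frame;frame;frame weight` per line) is also trivially machine-readable. A quick sketch that totals weight per leaf frame — a fast way to spot hot functions without rendering an SVG:

```python
from collections import defaultdict

def leaf_weights(folded_text: str) -> dict[str, int]:
    """Sum folded-stack weights by leaf (innermost) frame."""
    totals: dict[str, int] = defaultdict(int)
    for line in folded_text.strip().splitlines():
        stack, weight = line.rsplit(" ", 1)   # weight is the last token
        leaf = stack.split(";")[-1]           # innermost frame
        totals[leaf] += int(weight)
    return dict(totals)

folded = "a!main;a!worker_loop;a!compress_chunk 27800\na!main;a!parse_header 410"
print(leaf_weights(folded))  # {'a!compress_chunk': 27800, 'a!parse_header': 410}
```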
**`%APPDATA%\Claude\claude_desktop_config.json`**

```json
{
  "mcpServers": {
    "wpa": {
      "command": "wpa-mcp",
      "env": {
        "WPAEXPORTER_PATH": "C:/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit/wpaexporter.exe",
        "XPERF_PATH": "C:/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit/xperf.exe"
      }
    }
  }
}
```
`.vscode/mcp.json` is already included in this repo. It points at `server.py` in the workspace.
Any MCP client that speaks stdio works. Launch wpa-mcp (or python server.py) as a child process and send tools/list + tools/call over stdio.
This repo publishes to PyPI via GitHub Actions + PyPI trusted publishing (OIDC) — no secrets required.
One-time PyPI setup:
- Create the `wpa-mcp` project on PyPI.
- Add a trusted publisher for owner `Jialong-zhong`, repo `wpr-xperf-mcp-server`, workflow `publish.yml`, environment `pypi`.

Then, to ship a new version:
```bash
# bump version in pyproject.toml, commit, then:
git tag v0.2.0
git push origin v0.2.0
```
The Publish to PyPI workflow (on tag v*) will build the sdist + wheel and publish automatically.
| Symptom | Likely cause | Fix |
|---|---|---|
| `wpaexporter not found` | WPT not installed or path wrong | Install Windows Performance Toolkit; set `WPAEXPORTER_PATH` |
| `xperf stats failed` | ETL corrupted or not a WPR trace | Re-capture; ensure you ran `wpr -stop <file>` successfully |
| Columns missing in summarizer | Your WPA version renamed columns | Open the corresponding `.wpaProfile` and adjust `<Column Name=...>` to match your WPA |
| `has_stacks: false` in validation | `-start CPU` not used during capture, or no admin | Re-capture with `-start CPU` as Administrator |
| Empty `dwm_slow_frames` | `DesktopComposition` profile wasn't enabled | Re-capture with `-start DesktopComposition` |
| `ready_latency_top` all near zero during a hang | The thread isn't ready-waiting → it's doing work | Run `render_flamegraph` on CPU exports to see what work |
Q: Does this need WPA GUI installed?
No. Only wpaexporter.exe and xperf.exe (both from the Windows Performance Toolkit) are called. WPA GUI never launches.
Q: Can I use this on Linux/macOS?
The MCP server itself is pure Python. But wpaexporter / xperf only exist on Windows, so analysis must run on Windows. A common setup is: capture on Windows, copy ETL to a Windows analysis box, run wpa-mcp there.
Q: Why not parse ETL directly in Python?
ETL parsing is deep. Microsoft already ships an excellent, correct parser (wpaexporter) that understands every kernel + provider schema. Reusing it is cheaper and more accurate than reimplementing.
Q: Can I add my own WPA profile?
Yes. Drop a .wpaProfile into wpa/profiles/, add a key to PROFILE_MAP in server.py, and (optionally) a summarizer in wpa/summarizer.py.
Q: Does the LLM see the full CSV?
No — by design. The LLM sees compact summary JSON plus (optionally) folded-stack text. Raw CSVs stay on disk and are referenced by path.
MIT. See LICENSE.