A comprehensive Model Context Protocol (MCP) server for Grafana, Prometheus, Kafka UI, and Datadog. Features a secure "Bring Your Own Key" (BYOK) architecture w
A comprehensive Model Context Protocol (MCP) server for Grafana, Prometheus, Kafka UI, and Datadog. Features a secure "Bring Your Own Key" (BYOK) architecture where credentials stay local. Provides tools for metrics querying, dashboard inspection, Kafka lag monitoring, and unified health checks.
Query your observability stack from Claude Code, Codex, or any MCP client and no data leaves your machine.
npm version · License: MIT · Awesome MCP Servers · Node ≥ 18 · alimuratkuslu/byok-observability-mcp
Bring Your Own Keys — credentials stay in env vars on your machine. No clone, no build, runs via npx.
Partial setup — configure only the backends you use. Tools for unconfigured backends are never exposed.
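This "only expose configured backends" behavior can be sketched as a check on required environment variables. A minimal sketch with illustrative names — the real registration logic lives in the package:

```typescript
// Which env vars must be present before a backend's tools are exposed.
// Variable names match the README; the gating function is illustrative.
const REQUIRED_ENV: Record<string, string[]> = {
  grafana: ["GRAFANA_URL", "GRAFANA_TOKEN"],
  prometheus: ["PROMETHEUS_URL"],
  kafka: ["KAFKA_UI_URL"],
  datadog: ["DD_API_KEY", "DD_APP_KEY"],
};

// A backend is enabled only when every required variable is set and non-empty.
function enabledBackends(env: Record<string, string | undefined>): string[] {
  return Object.entries(REQUIRED_ENV)
    .filter(([, vars]) => vars.every((v) => !!env[v]))
    .map(([name]) => name);
}
```

With only PROMETHEUS_URL set, only the Prometheus tools would be registered; a Grafana URL without its token enables nothing.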
Quick Start · Tools · Credentials · Configuration · Scheduled Reports · Examples · Security · Development
🤖 Claude Code / Codex CLI
│
▼
⚡ byok-observability-mcp
(Local npx process)
│
🔒 env vars never leave your machine
│
▼
┌─────────────────────────────────┐
│ 📊 Grafana 🔥 Prometheus │
│ 🛶 Kafka UI 🐶 Datadog │
└─────────────────────────────────┘
Run once, answer a few questions, get a ready-made .mcp.json:
npx byok-observability-mcp --init
The wizard writes a ready-made .mcp.json to your project root or ~/.claude/ — your choice. Then just start Claude Code:
claude
[!TIP] That's it. No clone, no build, no env file. Works in under 60 seconds.
Create .mcp.json in your project root. Include only the backends you need.
{
"mcpServers": {
"observability-mcp": {
"command": "npx",
"args": ["-y", "byok-observability-mcp"],
"env": {
"GRAFANA_URL": "https://grafana.mycompany.internal",
"GRAFANA_TOKEN": "glsa_...",
"PROMETHEUS_URL": "https://prometheus.mycompany.internal",
"KAFKA_UI_URL": "https://kafka-ui.mycompany.internal",
"DD_API_KEY": "your-datadog-api-key",
"DD_APP_KEY": "your-datadog-app-key"
}
}
}
}
Credentials in git? Use the ${VAR} approach instead — see Configuration → Method B.
Start Claude Code:
claude
Claude Code reads .mcp.json automatically. No claude mcp add, no build step.
Verify by asking Claude:
What observability tools do you have available?
| Client | Configuration |
|---|---|
| Claude Code | .mcp.json in project root (recommended) or claude mcp add CLI |
| OpenAI Codex CLI | .mcp.json in project root — same format as Claude Code |
Both clients read .mcp.json automatically. The Quick Start above works for either.
# Same .mcp.json as above works out of the box
codex
Or add via CLI:
codex mcp add --transport stdio observability-mcp -- npx -y byok-observability-mcp
Always available. Checks connectivity across all configured backends.
| Tool | Description |
|---|---|
| obs_health_check | Unified Health Check. Runs a parallel check on all backends and returns a status table. |
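A unified check like this can be approximated with Promise.allSettled: run every probe concurrently and report ok/error per backend instead of failing the whole check on the first error. A sketch — the probe functions are stand-ins, not the package's real clients:

```typescript
type Probe = () => Promise<void>;

// Run all probes in parallel; a rejected probe marks its backend "error"
// without aborting the others.
async function healthCheck(
  probes: Record<string, Probe>
): Promise<Record<string, "ok" | "error">> {
  const names = Object.keys(probes);
  const results = await Promise.allSettled(names.map((n) => probes[n]()));
  const out: Record<string, "ok" | "error"> = {};
  names.forEach((n, i) => {
    out[n] = results[i].status === "fulfilled" ? "ok" : "error";
  });
  return out;
}
```

Promise.allSettled (unlike Promise.all) never rejects, which is what lets one unreachable backend show up as a single "error" row in the status table.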
Enabled when GRAFANA_URL + GRAFANA_TOKEN are set.
| Tool | Description |
|---|---|
| grafana_health | Check connectivity, version, and database status |
| grafana_list_datasources | List all datasources (name, type, UID) |
| grafana_query_metrics | Run a PromQL expression via a Grafana datasource |
| grafana_list_dashboards | Search and list dashboards by name or tag |
| grafana_get_dashboard | Get panels and metadata for a dashboard by UID |
| grafana_list_alerts | List active alerts from Alertmanager (firing/pending) |
| grafana_get_alert_rules | List all configured alert rules across all folders |
Enabled when PROMETHEUS_URL is set.
| Tool | Description |
|---|---|
| prometheus_health | Check connectivity |
| prometheus_query | Instant PromQL query — current value of a metric |
| prometheus_query_range | Range PromQL query — metric values over time |
| prometheus_list_metrics | List all available metric names |
| prometheus_metric_metadata | Get help text and type for a specific metric |
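Under the hood these tools map onto Prometheus's HTTP API: an instant query hits /api/v1/query, while a range query hits /api/v1/query_range with start, end, and step parameters. A sketch of the URL construction (base URL and expression are caller-supplied; this is not the package's actual client code):

```typescript
// Instant query: current value of a PromQL expression.
function instantQueryUrl(base: string, promql: string): string {
  const u = new URL("/api/v1/query", base);
  u.searchParams.set("query", promql);
  return u.toString();
}

// Range query: values over [startSec, endSec] sampled every `step`.
function rangeQueryUrl(
  base: string,
  promql: string,
  startSec: number,
  endSec: number,
  step: string // e.g. "30s"
): string {
  const u = new URL("/api/v1/query_range", base);
  u.searchParams.set("query", promql);
  u.searchParams.set("start", String(startSec));
  u.searchParams.set("end", String(endSec));
  u.searchParams.set("step", step);
  return u.toString();
}
```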
Enabled when KAFKA_UI_URL is set.
| Tool | Description |
|---|---|
| kafka_list_clusters | List configured Kafka clusters and their status |
| kafka_list_topics | List topics in a cluster |
| kafka_describe_topic | Get partition count, replication factor, and config |
| kafka_list_consumer_groups | List consumer groups and their state |
| kafka_consumer_group_lag | Get per-partition lag for a consumer group |
| kafka_broker_health | Broker count and disk usage per broker |
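Per-partition lag, as reported by kafka_consumer_group_lag, is simply the gap between the partition's log-end offset and the group's committed offset. A minimal sketch of the arithmetic (field names are illustrative, not Kafka UI's exact API shape):

```typescript
interface PartitionOffsets {
  partition: number;
  endOffset: number;       // latest offset in the partition (log end)
  committedOffset: number; // the consumer group's committed offset
}

// Lag per partition, floored at 0 (a group can never be "ahead").
function partitionLag(p: PartitionOffsets): number {
  return Math.max(0, p.endOffset - p.committedOffset);
}

// Total lag for a group is the sum over all partitions it consumes.
function totalLag(parts: PartitionOffsets[]): number {
  return parts.reduce((sum, p) => sum + partitionLag(p), 0);
}
```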
Enabled when both DD_API_KEY and DD_APP_KEY are set. Proxies the official Datadog MCP server.
Default toolsets: core, apm, alerting. Set DD_TOOLSETS=all to load everything.
| Toolset | Covers |
|---|---|
| core | Metrics, dashboards, monitors, infrastructure |
| apm | APM services, traces, service map |
| alerting | Monitors, downtimes, alerts |
| logs | Log search and analytics |
| incidents | Incident management |
| ddsql | SQL-style metric queries |
| security | Cloud security posture |
| synthetics | Synthetic test results |
| networks | Network performance monitoring |
| dbm | Database monitoring |
| software-delivery | CI/CD pipelines |
| llm-obs | LLM observability |
| cases | Case management |
| feature-flags | Feature flag tracking |
In Grafana, create a service account with the Viewer role, then Create a token. Copy the token (it starts with glsa_) — you won't see it again.
GRAFANA_URL=https://grafana.mycompany.internal
GRAFANA_TOKEN=glsa_xxxxxxxxxxxxxxxx
If your Grafana uses a self-signed certificate:
GRAFANA_VERIFY_SSL=false
If Prometheus has no authentication:
PROMETHEUS_URL=https://prometheus.mycompany.internal
If Prometheus uses basic auth:
PROMETHEUS_URL=https://prometheus.mycompany.internal
PROMETHEUS_USERNAME=your-username
PROMETHEUS_PASSWORD=your-password
If Kafka UI has no authentication:
KAFKA_UI_URL=https://kafka-ui.mycompany.internal
If Kafka UI requires a login:
KAFKA_UI_URL=https://kafka-ui.mycompany.internal
KAFKA_UI_USERNAME=admin
KAFKA_UI_PASSWORD=your-password
API key: Datadog → Organization Settings → API Keys → New Key
Application key: Datadog → Organization Settings → Application Keys → New Key
DD_SITE — match your Datadog login URL:
| Login URL | DD_SITE |
|---|---|
| app.datadoghq.com | datadoghq.com (default) |
| app.us3.datadoghq.com | us3.datadoghq.com |
| app.us5.datadoghq.com | us5.datadoghq.com |
| app.datadoghq.eu | datadoghq.eu |
| app.ap1.datadoghq.com | ap1.datadoghq.com |
DD_API_KEY=your-api-key
DD_APP_KEY=your-application-key
DD_SITE=datadoghq.com
DD_TOOLSETS=core,apm,alerting
Method A — .mcp.json (simplest): put credentials directly in .mcp.json. Works everywhere, no extra steps.
Add .mcp.json to your .gitignore if the repo is shared.
Method B — environment placeholders: use ${VAR} placeholders in .mcp.json and put real values in .env.
.mcp.json (safe to commit — contains no secrets):
{
"mcpServers": {
"observability-mcp": {
"command": "npx",
"args": ["-y", "byok-observability-mcp"],
"env": {
"GRAFANA_URL": "${GRAFANA_URL}",
"GRAFANA_TOKEN": "${GRAFANA_TOKEN}",
"PROMETHEUS_URL": "${PROMETHEUS_URL}",
"KAFKA_UI_URL": "${KAFKA_UI_URL}",
"DD_API_KEY": "${DD_API_KEY}",
"DD_APP_KEY": "${DD_APP_KEY}"
}
}
}
}
.env (add to .gitignore):
GRAFANA_URL=https://grafana.mycompany.internal
GRAFANA_TOKEN=glsa_...
Start Claude with the env loaded:
set -a && source .env && set +a && claude
A ready-made helper script is included:
./scripts/run-claude-with-env.sh
A template .mcp.json with all variables is available as .mcp.json.example.
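The ${VAR} substitution the client applies to .mcp.json can be mimicked with a small string replace — which also shows why the env must be loaded before the client starts. A sketch, not the client's actual implementation:

```typescript
// Replace ${VAR} placeholders with values from an env map.
// Unset variables are left untouched so missing config stays visible.
function expandPlaceholders(
  text: string,
  env: Record<string, string | undefined>
): string {
  return text.replace(/\$\{([A-Z0-9_]+)\}/g, (match, name: string) =>
    env[name] !== undefined ? env[name]! : match
  );
}
```

If GRAFANA_TOKEN is not in the environment when the client reads the file, the literal "${GRAFANA_TOKEN}" would be passed through — a quick way to spot a forgotten `source .env`.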
Add to ~/.claude.json:
{
"mcpServers": {
"observability-mcp": {
"command": "npx",
"args": ["-y", "byok-observability-mcp"],
"env": {
"GRAFANA_URL": "https://grafana.mycompany.internal",
"GRAFANA_TOKEN": "glsa_..."
}
}
}
}
| Variable | Backend | Required | Description |
|---|---|---|---|
| GRAFANA_URL | Grafana | ✅ | Base URL of your Grafana instance |
| GRAFANA_TOKEN | Grafana | ✅ | Service account token (Viewer role) |
| GRAFANA_VERIFY_SSL | Grafana | | Set to false to skip TLS verification |
| PROMETHEUS_URL | Prometheus | ✅ | Base URL of your Prometheus instance |
| PROMETHEUS_USERNAME | Prometheus | | Basic auth username |
| PROMETHEUS_PASSWORD | Prometheus | | Basic auth password |
| KAFKA_UI_URL | Kafka UI | ✅ | Base URL of your Kafka UI instance |
| KAFKA_UI_USERNAME | Kafka UI | | Login username |
| KAFKA_UI_PASSWORD | Kafka UI | | Login password |
| DD_API_KEY | Datadog | ✅ | Datadog API key |
| DD_APP_KEY | Datadog | ✅ | Datadog Application key |
| DD_SITE | Datadog | | Datadog site (default: datadoghq.com) |
| DD_TOOLSETS | Datadog | | Tool groups to load (default: core,apm,alerting) |
| SLACK_WEBHOOK_URL | Reports | ✅* | Slack Incoming Webhook URL for scheduled reports |
| REPORT_BACKENDS | Reports | | Comma-separated backends to include in reports (default: all configured) |

\* Required only for scheduled reports (--report mode).
Send an automated observability digest to Slack on a schedule — no Claude or Codex instance needs to be running.
cron / launchd
│ fires every N minutes
▼
npx byok-observability-mcp --report
│
│ reads env vars, connects directly to backends
▼
Grafana · Prometheus · Kafka UI
│
│ categorizes findings → P0 / P1 / P2 / P3
▼
Slack Incoming Webhook → #your-channel
The command collects data, categorizes every finding by severity, formats a Slack message, sends it, and exits. It is completely stateless.
| Level | Meaning | Examples |
|---|---|---|
| 🔴 P0 — CRITICAL | Service down or unreachable | Grafana alert firing (critical), Kafka cluster offline, backend unreachable |
| 🟠 P1 — HIGH | Degraded, action needed soon | Grafana alert firing (non-critical), Kafka consumer lag > 10 000 |
| 🟡 P2 — MEDIUM | Warning, monitor closely | Grafana alert pending, Kafka consumer lag > 1 000 |
| 🟢 P3 — INFO | Informational, all normal | Healthy backends, silenced alerts |
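The lag thresholds in the table (P1 above 10 000, P2 above 1 000) amount to a simple classifier. An illustrative sketch — the package's actual report logic may weigh more signals:

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

// Map a consumer-group lag to a report severity, per the thresholds above.
// (P0 is reserved for unreachable backends, not lag.)
function lagSeverity(lag: number): Severity {
  if (lag > 10_000) return "P1"; // degraded, action needed soon
  if (lag > 1_000) return "P2";  // warning, monitor closely
  return "P3";                   // informational, all normal
}
```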
Step 1 — Get a Slack Incoming Webhook URL
Step 2 — Set environment variables
export SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX/YYY/ZZZ
# Optional: restrict which backends are included (default: all configured)
export REPORT_BACKENDS=grafana,prometheus,kafka
Step 3 — Run a one-off report to verify
npx byok-observability-mcp --report
You should see a message in your Slack channel within seconds.
Step 4 — Schedule with cron
Open your crontab:
crontab -e
Add a line. Examples:
# Every hour at minute 0
0 * * * * SLACK_WEBHOOK_URL=https://hooks.slack.com/... GRAFANA_URL=... GRAFANA_TOKEN=... npx byok-observability-mcp --report >> /tmp/obs-report.log 2>&1
# Every 30 minutes
*/30 * * * * SLACK_WEBHOOK_URL=https://hooks.slack.com/... npx byok-observability-mcp --report >> /tmp/obs-report.log 2>&1
[!TIP] Put all env vars in a .env file and source it inside the cron command to keep the crontab clean: 0 * * * * bash -c 'source /path/to/.env && npx byok-observability-mcp --report' >> /tmp/obs-report.log 2>&1
Alternative: macOS launchd (runs on login, survives reboots)
Create ~/Library/LaunchAgents/com.observability-mcp.report.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.observability-mcp.report</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/npx</string>
<string>byok-observability-mcp</string>
<string>--report</string>
</array>
<key>EnvironmentVariables</key>
<dict>
<key>SLACK_WEBHOOK_URL</key>
<string>https://hooks.slack.com/services/XXX/YYY/ZZZ</string>
<key>GRAFANA_URL</key>
<string>https://grafana.mycompany.internal</string>
<key>GRAFANA_TOKEN</key>
<string>glsa_...</string>
</dict>
<key>StartInterval</key>
<integer>3600</integer>
<key>StandardOutPath</key>
<string>/tmp/obs-report.log</string>
<key>StandardErrorPath</key>
<string>/tmp/obs-report.log</string>
</dict>
</plist>
Load it:
launchctl load ~/Library/LaunchAgents/com.observability-mcp.report.plist
To stop: launchctl unload ~/Library/LaunchAgents/com.observability-mcp.report.plist
| Backend | Try asking Claude... |
|---|---|
| Grafana | "List all datasources and tell me which ones are Prometheus type." |
| Grafana | "Search for dashboards related to 'kubernetes' — list names and UIDs." |
| Grafana | "Query http_requests_total rate over the last hour via the default Prometheus datasource." |
| Prometheus | "What is the current value of the up metric? Which targets are down?" |
| Prometheus | "Show CPU usage (node_cpu_seconds_total rate) over the past hour, by instance." |
| Prometheus | "List all available metrics that start with http_." |
| Kafka UI | "List all Kafka clusters. Are there any with offline brokers?" |
| Kafka UI | "Describe the topic 'orders' in cluster 'production' — partitions and replication factor?" |
| Kafka UI | "Check consumer lag for group 'order-processor'. Which partitions have the highest lag?" |
| Datadog | "List all Datadog monitors currently in Alert state." |
| Datadog | "Show APM service performance for the past hour. Which services have the highest error rate?" |
| Datadog | "Query aws.ec2.cpuutilization for the last 30 minutes. Which hosts are above 80%?" |
| Goal | Try asking Claude... |
|---|---|
| Health | "Run a health check on all systems." |
| Alerts | "Are there any firing alerts in Grafana right now?" |
| Triage | "Show me the alert rules for the 'Production' folder." |
Check the health of all configured observability backends and give me a summary.
I'm seeing high error rates. Check Prometheus for http_requests_total with status=500,
then look for related Datadog monitors that might be alerting.
[!NOTE] All tools are read-only. No write operations are performed on any backend.
[!IMPORTANT] Credentials are read from environment variables and never logged or sent to Anthropic. Tokens are redacted in all error messages.
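Redacting tokens from error messages, as claimed above, comes down to masking every known secret value before anything is printed. A sketch of the idea (the package's actual redaction may differ):

```typescript
// Replace every occurrence of each known secret with a fixed mask.
// Empty strings are skipped so they can't blank out the whole message.
function redact(message: string, secrets: string[]): string {
  return secrets
    .filter((s) => s.length > 0)
    .reduce((msg, s) => msg.split(s).join("***REDACTED***"), message);
}
```

Calling this on every error string before it reaches the model or a log file keeps tokens like glsa_… out of transcripts even when a backend echoes them back in a 401 body.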
Least-privilege recommendations:
| Backend | Recommended role |
|---|---|
| Grafana | Service account with Viewer role |
| Prometheus | Network-level read-only access |
| Kafka UI | Read-only UI user |
| Datadog | API key + Application key with read scopes |
git clone https://github.com/alimuratkuslu/byok-observability-mcp
cd byok-observability-mcp
npm install
npm run dev # run with tsx (no build step)
npm run build # compile to dist/
npm run typecheck # TypeScript check without emitting
| Backend | Tested version |
|---|---|
| Grafana | v9.x, v10.x, v11.x |
| Prometheus | v2.x |
| Kafka UI | provectus/kafka-ui:v0.7.2 |
MIT