AIOps



by Elvisaryan


About

A multi-agent MCP server that turns LLMs into an autonomous incident-response copilot, enabling rapid investigation, correlation, and remediation of production incidents.

README

Production incidents in 10 seconds, not 60 minutes. A drop-in MCP server + dashboard that turns any LLM — Claude, Claude Code, ChatGPT, Cursor, Continue — into an autonomous incident-response copilot.


Why AIOps MCP?

Every production incident starts the same way: an engineer opens five tabs at 2 a.m. (CloudWatch, Grafana, GitLab, Confluence, the customer DB) and spends 40-60 minutes gathering context before they can even begin fixing the problem. For a P1, each of those minutes can cost $1,000-$10,000 in lost revenue.

We built AIOps MCP for engineers who are tired of being the human glue between observability tools. It treats incident investigation the way Slack treats messaging or k8s treats containers — as something the platform should handle, not a thing humans should do by hand. Inspired by the way Resolve.ai and pager-replacement tooling are reshaping on-call, but built MCP-native so it speaks the same protocol every modern LLM client already speaks.

Under the hood: six specialized agents, an LLM-driven supervisor, an opinionated synthesis prompt, and a topology engine that knows what depends on what.

What You Get

Capability	Description
🤖 6 specialized agents	Log, Infra, Change, Docs, Impact, Audit — run in parallel, not sequence
🧠 MCP-native	Plug into Claude Desktop, Claude Code, Cursor, Continue, or any MCP client over stdio or HTTP
🔌 Multi-LLM	Claude, GPT, Gemini, local models via OpenRouter — pick your brain, we coordinate
📊 MCP Dashboard	Chat + live agent traces + topology + log viewer in one tab — like Claude.ai for incidents
🕸️ App topology	Interactive service graph with blast-radius propagation for connected-impact analysis
📎 Manual + auto logs	Paste, upload, or auto-pull from CloudWatch / Datadog / Splunk / Loki / Grafana
🧾 Full audit trail	Every agent step, LLM prompt, and one-click action logged — compliance-ready
🎫 Auto-Jira	Incident, RCA, evidence, action log — created and updated by the Audit Agent
🚀 One-click actions	Rollback / restart / scale / flag-flip — vetted, parameterized, reversible
⚙️ ~8 env vars	Configured entirely through environment variables; mocks are on by default, so it runs with no credentials
🐳 Docker-ready	`docker compose up` and you have the full stack
🔐 Zero-trust by default	Per-agent secrets, PII scrubbing on LLM prompts, immutable audit log
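
To make "PII scrubbing on LLM prompts" concrete, here is a minimal sketch of one way such a scrubber could work: regex substitution with typed placeholders. The README does not describe the actual implementation; the patterns, the `scrub_pii` name, and the placeholder format are all assumptions for illustration.

```python
import re

# Hypothetical patterns; a real deployment would tune these to its own data.
_PII_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("AWS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
]


def scrub_pii(text: str) -> str:
    """Replace each match with a typed placeholder before the text reaches an LLM."""
    for label, pattern in _PII_PATTERNS:
        text = pattern.sub(f"<{label}>", text)
    return text


print(scrub_pii("user bob@example.com from 10.0.0.12 failed"))
```

Typed placeholders (rather than blanking the match) keep the log line readable for the model: it can still tell that an email address or an IP was involved without seeing the value.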

Two Installation Paths

	MCP Plugin (recommended for LLM users)	Self-hosted CLI (for SREs/platform teams)
Best for	Solo engineers wiring it into Claude Code / Claude Desktop / Cursor	Teams running AIOps MCP as shared infrastructure
Install	`claude mcp add aiops -- aiops mcp-stdio`	`pip install -e .` then `aiops serve`
Transport	stdio	HTTP + MCP-over-HTTP + dashboard at `:7878`
Config	Single `.env` next to `aiops` binary	`.env` + `configs/topology.yaml` + Docker
Dashboard	Optional (`aiops dashboard`)	Always on at `http://host:7878`
Multi-user	Single user	RBAC via Cognito / Okta / OAuth2

Pick the path that matches how your team will run it. Both paths use the same agent engine.

Quick Start (60 seconds)

git clone https://github.com/<you>/aiops-mcp.git
cd aiops-mcp
cp .env.example .env          # leave it empty for full mock mode
pip install -e .
aiops serve                   # MCP + HTTP + dashboard on :7878

Open http://localhost:7878 and ask: "Why is checkout slow?"

Or just Docker

docker compose up

The Six Agents

Grouped by what they actually do in an incident:

Observe (data gatherers)

Agent	Sources	What it answers
🪵 Log Agent	CloudWatch, Datadog, Splunk, ELK, Loki	"What errors fired in the last 30 min?"
📊 Infra Agent	Grafana, Prometheus, Datadog Metrics, CloudWatch	"Is the DB at 98% connections? Is upstream healthy?"
🚢 Change Agent	GitHub, GitLab, ArgoCD, Jenkins	"Who deployed what, when?"

Reason (context + impact)

Agent	Sources	What it answers
📚 Docs Agent	Bedrock KB / pgvector / Pinecone over runbooks, postmortems, ADRs	"Have we seen this before? What's the runbook?"
💸 Impact Agent	DynamoDB, Snowflake, BigQuery, Mixpanel	"Who's affected? How much revenue is at risk?"

Act (close the loop)

Agent	Sources	What it answers
🧾 Audit Agent	Jira, ServiceNow, Linear	"Create the ticket, attach the RCA, link past incidents."

MCP Tools Exposed

Tool	Purpose
`investigate_incident`	Full multi-agent investigation — returns RCA + suggested actions
`query_logs`	Search logs in CloudWatch / Datadog / Splunk / Loki / ELK
`query_metrics`	PromQL / Grafana / Datadog Metrics query
`attach_log`	Manually attach a log blob (paste or upload) to an active investigation
`get_topology`	Return service dependency graph + health
`correlate_impact`	Given a service, list downstream impact + affected customers
`recent_deploys`	List deploys / merges in a window
`find_runbook`	RAG search over runbooks and past postmortems
`create_jira_ticket`	Create / update Jira with full RCA
`execute_action`	One-click remediation (rollback / restart / scale / flag-flip)

Every tool is callable directly from your LLM client — no UI required.
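
For readers curious what a direct tool call looks like on the wire, the sketch below builds the JSON-RPC 2.0 message that MCP uses for `tools/call`. The envelope (`jsonrpc`, `id`, `method`, `params.name`, `params.arguments`) follows the MCP specification; the argument names passed to `query_logs` are hypothetical, since this README does not list each tool's input schema.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def build_tool_call(tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument names are assumptions; the real schema is whatever `tools/list` reports.
request = build_tool_call("query_logs", {"service": "checkout", "window_minutes": 30})
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this for you, and a real session begins with the protocol's `initialize` handshake before any tool call is accepted.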

The MCP Dashboard

A single-tab web UI inspired by Resolve.ai and Claude.ai for incident response:

Surface	What it does
💬 Chat panel	Natural-language conversation with the orchestrator
🧩 Agent trace	Live cards showing each agent's progress, findings, and citations
🕸️ Topology graph	Interactive node graph; click a service to see blast radius
📎 Log dropzone	Paste / upload / fetch logs with timestamp alignment
⏱️ Incident timeline	Every step with timestamps, audit-ready
🎯 Action panel	One-click rollback / scale / flag-flip with explicit confirmation

Live demo (self-host): run `aiops serve`, then open http://localhost:7878.
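
The blast-radius view in the topology graph can be thought of as a traversal over reversed dependency edges: start at the failing service and walk to everything that depends on it. The sketch below shows that idea with a breadth-first search. It is an illustration only, not the project's `topology.py`; the `blast_radius` function, the dict-based graph format, and the service names are assumptions.

```python
from collections import deque


def blast_radius(depends_on: dict[str, list[str]], failed: str) -> list[str]:
    """Return services that transitively depend on `failed`, nearest first."""
    # Invert the edges: for each service, record who depends on it.
    dependents: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    seen = {failed}
    queue = deque([failed])
    impacted = []
    while queue:
        current = queue.popleft()
        for svc in dependents.get(current, []):
            if svc not in seen:  # guards against cycles and duplicate paths
                seen.add(svc)
                impacted.append(svc)
                queue.append(svc)
    return impacted


# Hypothetical topology: each service maps to the services it calls.
topology = {
    "web": ["checkout", "search"],
    "checkout": ["payments", "orders-db"],
    "payments": ["orders-db"],
    "search": [],
    "orders-db": [],
}
print(blast_radius(topology, "orders-db"))
```

Breadth-first order lists direct dependents before indirect ones, which is a natural ordering for showing how far an outage has likely spread.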

Architecture

            ┌──────────────────────────────────────────────────────┐
            │  LLM CLIENT (Claude Code / Desktop / ChatGPT / ...)  │
            └────────────────────────┬─────────────────────────────┘
                                     │  MCP (stdio or HTTP)
                                     ▼
            ┌──────────────────────────────────────────────────────┐
            │              AIOps MCP SERVER  (:7878)               │
            │   ┌──────────────────────────────────────────────┐   │
            │   │            SUPERVISOR ORCHESTRATOR           │   │
            │   │   plans → fans out → synthesizes → audits    │   │
            │   └──┬─────────┬─────────┬────────┬────────┬─────┘   │
            │      ▼         ▼         ▼        ▼        ▼         │
            │   ┌─────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐        │
            │   │ LOG │ │INFRA │ │CHANGE│ │ DOCS │ │IMPACT│        │
            │   └──┬──┘ └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘        │
            │      │       │        │        │        │            │
            │      ▼       ▼        ▼        ▼        ▼            │
            │   ┌──────────────────────────────────────────┐       │
            │   │   ADAPTERS (mock-by-default, swappable)  │       │
            │   └──────────────────────────────────────────┘       │
            │      │       │        │        │        │            │
            │      ▼       ▼        ▼        ▼        ▼            │
            │   CloudWatch Grafana GitHub  Vector   Snowflake      │
            │   Datadog   Promet. GitLab  pgvector  BigQuery       │
            │   Splunk    Datadog ArgoCD  RunbookKB DynamoDB       │
            │                                                      │
            │                          ▼                           │
            │            ┌─────────────────────────┐               │
            │            │   SYNTHESIS ENGINE      │               │
            │            │   (Claude Opus 4.7)     │               │
            │            └────────────┬────────────┘               │
            │                         ▼                            │
            │            ┌─────────────────────────┐               │
            │            │   AUDIT AGENT → Jira    │               │
            │            └─────────────────────────┘               │
            └──────────────────────────────────────────────────────┘
                                     │
                                     ▼
                  ┌──────────────────────────────────┐
                  │     MCP DASHBOARD (web UI)       │
                  │   Chat · Trace · Topology · Logs │
                  └──────────────────────────────────┘

You pick the model; AIOps MCP handles coordination.
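
The "fans out" step in the diagram is what lets the agents run in parallel rather than in sequence. A minimal sketch of that pattern with `asyncio.gather` follows. The `Finding` dataclass, `run_agent`, and `investigate` are hypothetical stand-ins, not the project's `orchestrator.py`, and the sleep simulates an adapter call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    agent: str
    summary: str


async def run_agent(name: str, question: str, delay: float) -> Finding:
    """Stand-in for a real agent; sleeps to simulate an adapter call."""
    await asyncio.sleep(delay)
    return Finding(agent=name, summary=f"{name} findings for: {question}")


async def investigate(question: str) -> list[Finding]:
    # Fan out: all data-gathering agents start at once.
    agents = ["log", "infra", "change", "docs", "impact"]
    tasks = [run_agent(name, question, delay=0.01) for name in agents]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Keep successful findings; one failed agent should not sink the investigation.
    return [r for r in results if isinstance(r, Finding)]


findings = asyncio.run(investigate("Why is checkout slow?"))
for f in findings:
    print(f.agent, "->", f.summary)
```

With `return_exceptions=True`, a timeout or error in one agent comes back as a value instead of cancelling the rest, so the synthesis step can still work with partial evidence.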

Configuration — ~8 env vars total

All config is via environment variables. Defaults work with mock data so you can run it instantly.

Variable	Required	Purpose
`ANTHROPIC_API_KEY`	for real LLM	Supervisor + Synthesis (Claude Opus 4.7)
`AIOPS_PORT`	no	HTTP / MCP port — default `7878`
`AIOPS_DATA_DIR`	no	SQLite, uploads, topology cache — default `./data`
`AIOPS_MOCK_MODE`	no	Auto-on when no integrations set
`DATADOG_API_KEY` or `SPLUNK_TOKEN`+`SPLUNK_HOST` or AWS creds	optional	Pick the log source you have
`GRAFANA_URL` + `GRAFANA_TOKEN`	optional	Metrics
`GITHUB_TOKEN` or `GITLAB_TOKEN`	optional	Deploys
`JIRA_HOST` + `JIRA_EMAIL` + `JIRA_TOKEN`	optional	Audit ticketing

That's it. See `.env.example` for the full annotated list.
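
As an illustration of the "defaults plus mock fallback" behaviour described above, the sketch below loads the documented variables and switches mock mode on when no integration is configured. It is not the project's `config.py`: the `Settings` class, `load_settings`, the accepted truthy values for `AIOPS_MOCK_MODE`, and the rule that an explicit value overrides auto-detection are all assumptions. AWS credentials are left out of the check for brevity.

```python
import os
from dataclasses import dataclass

# Integration variables named in the table; any one being set counts as "configured".
INTEGRATION_VARS = (
    "DATADOG_API_KEY", "SPLUNK_TOKEN", "GRAFANA_URL",
    "GITHUB_TOKEN", "GITLAB_TOKEN", "JIRA_HOST",
)


@dataclass(frozen=True)
class Settings:
    port: int
    data_dir: str
    mock_mode: bool


def load_settings(env=None) -> Settings:
    env = os.environ if env is None else env
    has_integration = any(env.get(name) for name in INTEGRATION_VARS)
    # Assumption: an explicit AIOPS_MOCK_MODE wins; otherwise mock mode is
    # on exactly when no integration is configured.
    explicit = env.get("AIOPS_MOCK_MODE")
    if explicit is not None:
        mock_mode = explicit.strip().lower() in ("1", "true", "yes", "on")
    else:
        mock_mode = not has_integration
    return Settings(
        port=int(env.get("AIOPS_PORT", "7878")),
        data_dir=env.get("AIOPS_DATA_DIR", "./data"),
        mock_mode=mock_mode,
    )


print(load_settings({}))
```

Passing a plain dict instead of reading `os.environ` directly keeps the loader easy to test.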

Plug Into Any LLM Client

Client	Setup	Config file
Claude Desktop	Merge `mcpServers` block into `claude_desktop_config.json`	`configs/claude-desktop.json`
Claude Code	`claude mcp add aiops -- aiops mcp-stdio`	`configs/claude-code.json`
ChatGPT (custom GPT)	Point at `http://your-host:7878/openapi.json`	`configs/chatgpt-openapi-stub.json`
Cursor	Add to `~/.cursor/mcp.json` (same format as Claude Desktop)	`configs/claude-desktop.json`
Continue.dev	Add to `~/.continue/config.json` MCP section	`configs/claude-desktop.json`
Custom / any HTTP client	POST to `:7878/mcp` (JSON-RPC 2.0)	n/a

Every tool the dashboard uses is also callable from the LLM client. The dashboard is just another MCP consumer.
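
For the clients that read an `mcpServers` block (Claude Desktop and, per the table, Cursor), "merging" means adding one entry without disturbing the servers already configured. The sketch below shows that merge on an in-memory dict. The `aiops` / `mcp-stdio` command mirrors the `claude mcp add` line in the table; the exact contents of `configs/claude-desktop.json` are not shown in this README, so treat the entry shape as an assumption, and the existing `fetch` entry as purely illustrative.

```python
import json


def merge_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `config` with one entry added under `mcpServers`."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is not mutated
    servers[name] = {"command": command, "args": args}
    merged["mcpServers"] = servers
    return merged


# Illustrative existing config with one unrelated server already present.
existing = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}}
updated = merge_mcp_server(existing, "aiops", "aiops", ["mcp-stdio"])
print(json.dumps(updated, indent=2))
```

To apply this to a real file, load it with `json.load`, merge, and write it back; keeping a backup of the original config is a sensible precaution.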

With / Without AIOps MCP

Capability	Without	With AIOps MCP
Time to RCA	40–60 min, 5 tabs	~10 sec, one prompt
Investigation cost	1 engineer-hour per P1	1 LLM call
Documentation	Manual Jira write-up after the fact	Auto-generated mid-incident
Knowledge retention	Lost when the senior leaves	Permanent in RAG corpus
On-call escalation reason	"I don't know who deployed what"	Change agent already answered
Impact estimation	Slack the BI team	Impact agent in 2 seconds
Action execution	SSH, kubectl, prayer	One-click, audited, reversible
Connected-impact view	Mental model in someone's head	Live topology graph

Repository Layout

aiops-mcp/
├── README.md                 # this file
├── .env.example              # annotated env var template
├── pyproject.toml
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── server/
│   ├── main.py               # CLI entry: aiops serve | mcp-stdio | dashboard
│   ├── mcp_server.py         # MCP protocol (stdio + HTTP)
│   ├── api.py                # FastAPI HTTP API + dashboard host
│   ├── orchestrator.py       # Supervisor: plans + fans out
│   ├── synthesis.py          # Final LLM correlation call
│   ├── topology.py           # Service graph + impact propagation
│   ├── config.py             # Env loading + mock fallback
│   └── agents/
│       ├── base.py
│       ├── log_agent.py
│       ├── infra_agent.py
│       ├── change_agent.py
│       ├── docs_agent.py
│       ├── impact_agent.py
│       └── audit_agent.py
├── dashboard/
│   └── index.html            # single-page UI (vanilla JS + vis-network)
├── configs/
│   ├── claude-desktop.json
│   ├── claude-code.json
│   ├── chatgpt-openapi-stub.json
│   └── topology.example.yaml
├── docs/
│   ├── INSTALLATION.md
│   ├── INTEGRATIONS.md
│   └── MCP-USAGE.md
└── tests/
    └── test_basic.py

Documentation

When to read	Doc
First-time install on a new host	docs/INSTALLATION.md
Wiring into Claude / ChatGPT / Cursor / Continue / custom	docs/INTEGRATIONS.md
Building your own MCP client against this server	docs/MCP-USAGE.md
Architecture deep-dive (v1 + v2 roadmap)	docs/aiops-architecture.md

License

MIT — see LICENSE. Use it, fork it, run it, ship it.

Support

🐛 Issues / RFCs: GitHub Issues
💬 Discussions: GitHub Discussions
🏢 Enterprise support (multi-region, SLA, custom adapters): open an issue with the `enterprise` label

Built by people who've carried the pager.

from github.com/Elvisaryan/aiops-mcp

Install AIOps in Claude Desktop, Claude Code & Cursor

Recommended · one command, every IDE

unyly install aiops-mcp

Installs into Claude Desktop, Claude Code, Cursor & VS Code — handles npx, uvx and build-from-source repos for you.

First time? Get the CLI: curl -fsSL https://unyly.org/install | sh

Or configure manually

Run in your terminal:

claude mcp add aiops-mcp -- uvx --from git+https://github.com/Elvisaryan/aiops-mcp aiops-mcp

FAQ

Is AIOps MCP free?

Yes, AIOps MCP is free — one-click install via Unyly at no cost.

Does AIOps need an API key?

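Not to try it out: with no API keys or environment variables set, AIOps runs in mock mode. Real investigations need an LLM key such as `ANTHROPIC_API_KEY`, plus credentials for whichever integrations you connect.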

Is AIOps hosted or self-hosted?

Self-hosted: the server runs locally on your machine via the install command above.

How do I install AIOps in Claude Desktop, Claude Code or Cursor?

Open AIOps on unyly.org, pick your client tab (Claude Desktop, Claude Code, Cursor) and press Install — the config is generated automatically, no JSON editing.
