Enables LLMs to autonomously query AWS CloudWatch Logs and perform structured root-cause analysis via natural language prompts, using MCP tools for log group listing and Insights queries.
An end-to-end agentic system where an LLM autonomously authenticates with AWS, queries CloudWatch Logs, and produces structured root-cause analysis — all triggered by a single natural language prompt in Cursor.
┌─────────────────────────────────────────────────────────────┐
│ Developer machine │
│ │
│ Cursor IDE ──── tool calls ────► MCP Server (Python) │
│ │ │ │
│ Claude LLM ◄─── log results ──── boto3 / AWS SDK │
└─────────────────────────────────────────────────────────────┘
│
┌──────────────▼───────────────┐
│ AWS │
│ │
│ IAM user │
│ CloudWatch Logs │
│ Lambda (log generator) │
└──────────────────────────────┘
Flow: You type a prompt in Cursor → Claude sees the available MCP tools → it calls list_log_groups to orient itself → constructs and calls query_logs with a CloudWatch Insights query it writes itself → your MCP server authenticates with AWS and fetches real log data → Claude reads the results and returns a structured diagnosis.
Check my CloudWatch logs for the last 5 hours. List all the log groups
you can see, then query the Lambda log group for any errors and tell
me what went wrong and why.
Log groups found:
- /aws/lambda/mcp-log-generator
Errors in the last 5 hours — two patterns:
1. NullPointerException (field=customer_email)
Scenarios: fetch_inventory, processing_order, user_login
Cause: customer_email is null or missing in some user records.
Fix: Validate at the boundary; use null-safe access on required fields;
backfill missing emails upstream.
2. TimeoutException (latency_ms=5032, threshold_ms=5000)
Scenarios: payment_gateway (6×), processing_order (1×), fetch_inventory (1×)
Cause: downstream dependency consistently 32ms over the 5s cap.
Fix: Tune client timeouts above realistic p99 latency; add circuit breaker;
investigate gateway cold starts and DB contention during traffic spikes.
| Error type | Count | Share |
|---|---|---|
| TimeoutException | 8 | 72.7% |
| NullPointerException | 3 | 27.3% |
Claude identified 6 distinct affected user_ids with timestamps, extracted directly from raw CloudWatch log events.
git clone https://github.com/eugeneoh04/cloudwatch-mcp.git
cd cloudwatch-mcp
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
In the AWS console, create a user with programmatic access and attach this inline policy (least-privilege, read-only):
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"logs:DescribeLogGroups",
"logs:DescribeLogStreams",
"logs:FilterLogEvents",
"logs:StartQuery",
"logs:GetQueryResults",
"logs:GetLogEvents"
],
"Resource": "*"
}]
}
Save the generated AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Paste lambda_function.py into the inline editor.
cp .env.example .env
Fill in your credentials in .env:
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=...
python test_connection.py
# Expected output: /aws/lambda/mcp-log-generator
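The contents of test_connection.py are not reproduced here. As a rough sketch of what such a check could do, the helper below (the name `list_log_group_names` is hypothetical) walks boto3's `describe_log_groups` paginator and collects the group names. It takes the client as an argument, so it can be exercised with a stub as well as a real `boto3.client("logs")`.

```python
def list_log_group_names(logs_client):
    """Return the names of all log groups visible to the given CloudWatch Logs client."""
    names = []
    # DescribeLogGroups is paginated; the paginator follows nextToken for us.
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page.get("logGroups", []):
            names.append(group["logGroupName"])
    return names
```

With credentials and region exported in the environment, calling `list_log_group_names(boto3.client("logs"))` and printing each name should list the same log groups the IAM user can see. An access-denied error at this step usually points at the policy or the region rather than at the MCP setup.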
Create ~/.cursor/mcp.json:
{
"mcpServers": {
"cloudwatch": {
"command": "/absolute/path/to/venv/bin/python",
"args": ["/absolute/path/to/cloudwatch_mcp_server.py"],
"env": {
"AWS_ACCESS_KEY_ID": "AKIA...",
"AWS_SECRET_ACCESS_KEY": "...",
"AWS_DEFAULT_REGION": "..."
}
}
}
}
Use absolute paths; Cursor does not expand ~.
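Since the paths must be absolute, one option is to generate the file contents instead of typing them by hand. The sketch below is illustrative only: `build_mcp_config` is a hypothetical helper, and the `~/cloudwatch-mcp` clone location is an assumption. It expands `~` and makes both paths absolute without resolving symlinks, because a venv interpreter is often a symlink and pointing Cursor at the resolved target would bypass the venv.

```python
import json
import os


def build_mcp_config(python_path, server_path, env):
    """Build the mcp.json structure with ~ expanded and paths made absolute."""
    def absolute(path):
        # abspath (not realpath) keeps the venv's python symlink intact.
        return os.path.abspath(os.path.expanduser(path))

    return {
        "mcpServers": {
            "cloudwatch": {
                "command": absolute(python_path),
                "args": [absolute(server_path)],
                "env": dict(env),
            }
        }
    }


# Assumed clone location; replace the placeholders with your own values.
config = build_mcp_config(
    "~/cloudwatch-mcp/venv/bin/python",
    "~/cloudwatch-mcp/cloudwatch_mcp_server.py",
    {
        "AWS_ACCESS_KEY_ID": "AKIA...",
        "AWS_SECRET_ACCESS_KEY": "...",
        "AWS_DEFAULT_REGION": "...",
    },
)
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into ~/.cursor/mcp.json.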
Open Cursor → Settings → MCP. A green dot next to cloudwatch means the server is connected.
| Tool | Description | Arguments |
|---|---|---|
| list_log_groups | Lists all CloudWatch log groups in the account | none |
| query_logs | Runs a CloudWatch Logs Insights query | log_group (required), query (required), hours_back (optional, default 1) |
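The server's actual implementation is not shown here. As a hedged sketch of how a query_logs-style tool could wrap the Insights API, the function below uses boto3's `start_query` and `get_query_results` calls with the argument names from the table. The `logs_client`, `poll_seconds`, and `max_polls` parameters are assumptions added for illustration.

```python
import time


def query_logs(logs_client, log_group, query, hours_back=1,
               poll_seconds=1.0, max_polls=60):
    """Run a Logs Insights query and return the result rows as plain dicts."""
    end_time = int(time.time())                      # epoch seconds
    start_time = end_time - int(hours_back * 3600)

    started = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )
    query_id = started["queryId"]

    # Insights queries run asynchronously: poll until a terminal status.
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"Insights query ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("Insights query did not complete in time")
```

Flattening each row into a dict keeps the payload handed back to the LLM compact. All values arrive as strings, so any numeric handling is left to the caller.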
What MCP tools do you have available?
Check my CloudWatch logs for the last 2 hours. List all log groups,
then query the Lambda log group for errors and diagnose each one.
Group the errors by type, show how frequently each one occurs,
and suggest a fix for each.
Find all log entries where the payment_gateway scenario failed.
What user_ids were affected and when?
What percentage of invocations succeeded vs failed in the last hour?
Is there any pattern to when errors occur?
cloudwatch-mcp/
├── cloudwatch_mcp_server.py # MCP server — exposes CloudWatch tools to the LLM
├── lambda_function.py # Lambda function that generates structured logs
├── test_connection.py # Quick IAM + boto3 connectivity check
├── requirements.txt
├── .env.example # Credentials template
└── .gitignore
Why MCP over a direct API call? MCP gives the LLM the ability to decide when and how to query. It writes the CloudWatch Insights query itself based on your natural language prompt. A direct API call is static; MCP is agentic.
Why least-privilege IAM? The MCP server only needs read access to logs. This mirrors production best practices — no write permissions, no admin access.
Why CloudWatch Logs Insights over FilterLogEvents? Insights supports SQL-like aggregations (stats count() by reason) that let the LLM produce quantitative breakdowns and trend analysis, not just raw log dumps.
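To make the aggregation point concrete, here is a small sketch that turns the rows of such a query into percentage shares. The query string extends the `stats count() by reason` example with an alias and a sort. The `error_count` alias and the `error_shares` helper are hypothetical, and the sample rows reuse the counts from the example output earlier in this README.

```python
# Insights query: count events per reason, most frequent first.
QUERY = "stats count() as error_count by reason | sort error_count desc"


def error_shares(rows):
    """Turn rows like {"reason": ..., "error_count": "8"} into (reason, count, percent)."""
    # Insights returns every value as a string, so counts are converted first.
    counts = [(row["reason"], int(row["error_count"])) for row in rows]
    total = sum(count for _, count in counts)
    if total == 0:
        return []
    return [(reason, count, round(100 * count / total, 1)) for reason, count in counts]


rows = [
    {"reason": "TimeoutException", "error_count": "8"},
    {"reason": "NullPointerException", "error_count": "3"},
]
for reason, count, percent in error_shares(rows):
    print(f"{reason}: {count} ({percent}%)")
# TimeoutException: 8 (72.7%)
# NullPointerException: 3 (27.3%)
```

A FilterLogEvents call would return the eleven raw events instead, leaving the counting to the model.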
Python · AWS Lambda · AWS CloudWatch Logs · AWS IAM · boto3 · MCP (Model Context Protocol) · Claude · Cursor