Langfuse Server

by DrishtantKaushal

About

Model Context Protocol server for Langfuse observability. Query traces, analyze accuracy, detect failures, track costs, debug latency, manage prompts and datasets.

README

License: MIT · Python 3.10+

56 tools across data access and analytics. Multi-project support so one instance can serve several Langfuse projects. Works with Claude Code, Codex CLI, Cursor, and any MCP-compatible client.

Why this MCP server?

Comparison with official Langfuse MCP (as of March 2026):

Capability	This server	Official Langfuse MCP
Traces & Observations	Yes	No
Sessions & Users	Yes	No
Exception Tracking	Yes	No
Prompt Management	Yes	Yes
Dataset Management	Yes	No
Annotation Queues	Yes	No
Scores v2 API	Yes	No
Score Write-back	Yes	No
Multi-project support	Yes	No
Accuracy Metrics	Yes	No
Failure Detection	Yes	No
Token Percentiles	Yes	No
Cost Breakdown	Yes	No
Latency Analysis	Yes	No
Session Analytics	Yes	No
Context Breach Scanning	Yes	No
User Group Aggregation	Yes	No

The official MCP focuses on prompt management. This server provides a full observability and analytics toolkit — traces, observations, sessions, scores, exceptions, prompts, datasets, annotation queues, plus 9 built-in analytics tools that compute insights server-side and return LLM-sized summaries. Multi-project routing lets a single instance serve several Langfuse projects behind one connector URL.

Quick Start

1. Get your API keys

Langfuse Cloud: cloud.langfuse.com → Settings → API Keys
Self-hosted: Your Langfuse instance → Settings → API Keys. Set LANGFUSE_HOST to your instance URL (e.g., https://langfuse.yourcompany.com)

2. Add the MCP server

Claude Code

claude mcp add \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  --scope project \
  langfuse-mcp -- uvx langfuse-mcp-server

Codex CLI

codex mcp add langfuse-mcp \
  --env LANGFUSE_PUBLIC_KEY=pk-lf-... \
  --env LANGFUSE_SECRET_KEY=sk-lf-... \
  --env LANGFUSE_HOST=https://cloud.langfuse.com \
  -- uvx langfuse-mcp-server

Cursor

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "langfuse-mcp": {
      "command": "uvx",
      "args": ["langfuse-mcp-server"],
      "env": {
        "LANGFUSE_PUBLIC_KEY": "pk-lf-...",
        "LANGFUSE_SECRET_KEY": "sk-lf-...",
        "LANGFUSE_HOST": "https://cloud.langfuse.com"
      }
    }
  }
}

3. Verify

Restart your CLI, then test with /mcp (Claude Code) or codex mcp list (Codex).

Manual install (alternative to uvx)

pip install langfuse-mcp-server
langfuse-mcp-server

Hosting as a remote service

Run as a long-lived HTTP service so multiple users connect to a single instance — required for Claude.ai custom Connectors, and useful for team-wide access without distributing Langfuse API keys per user.

Enabled via env vars; no code changes.

Minimum setup

MCP_TRANSPORT=streamable-http
MCP_BASE_URL=https://mcp.yourcompany.com
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://your-langfuse-instance.example

Without OAuth env vars, the endpoint is unauthenticated — suitable only for local testing. See Google OAuth setup below for production.

Docker

A production-ready Dockerfile is checked into the repo (non-root user, pinned base, .dockerignore to prevent secret leakage). Each tagged release auto-publishes a multi-arch image to GitHub Container Registry via .github/workflows/docker-publish.yml.

Pull the published image:

docker pull ghcr.io/drishtantkaushal/langfusemcp:latest

Or build from source:

docker build -t langfuse-mcp .

Run (all secrets injected via -e, never baked into the image):

docker run -d \
  --name langfuse-mcp \
  --restart unless-stopped \
  -p 8000:8000 \
  -e MCP_TRANSPORT=streamable-http \
  -e MCP_BASE_URL=https://mcp.yourcompany.com \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  -e GOOGLE_CLIENT_ID=... \
  -e GOOGLE_CLIENT_SECRET=... \
  -e ALLOWED_EMAIL_DOMAINS=yourcompany.com \
  ghcr.io/drishtantkaushal/langfusemcp:latest

Reverse proxy

Terminate TLS in front (nginx, Caddy, Cloudflare). MCP endpoint is at /mcp/ (trailing slash). Because responses stream, the proxy must:

Disable response buffering — nginx: proxy_buffering off;
Allow read timeout ≥ 5 minutes — some analytics queries legitimately run several minutes
Speak HTTP/1.1 with keepalive upstream

Google OAuth

In your Google Cloud project:

APIs & Services → OAuth consent screen
- User type: Internal (restricts sign-in to your Google Workspace domain)
- Scopes: openid, https://www.googleapis.com/auth/userinfo.email
Credentials → Create OAuth client ID → Web application
- Authorized redirect URI: https://{your-base-url}/auth/callback
- Copy the Client ID and Client Secret

Set:

GOOGLE_CLIENT_ID=....apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=GOCSPX-...

OAuth activates when GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, and MCP_BASE_URL are all set. With an Internal consent screen, Google rejects non-Workspace sign-ins at the identity layer — the server never sees those attempts.

Optional email allowlist

For narrower control than "anyone in the Workspace":

# either, or both
ALLOWED_EMAIL_DOMAINS=yourcompany.com
ALLOWED_EMAILS=alice@yourcompany.com,bob@yourcompany.com

When set, every tool call verifies the caller's email_verified claim and checks membership before proceeding. When unset, the server trusts whatever the OAuth provider returns.
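
The check described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: `is_caller_allowed` and `parse_csv` are hypothetical names, and the sketch assumes that when both variables are set, a caller passes by matching either list.

```python
def parse_csv(value):
    """Split a comma-separated env value into a lowercase set, dropping blanks."""
    return frozenset(
        part.strip().lower() for part in (value or "").split(",") if part.strip()
    )


def is_caller_allowed(claims, allowed_emails=frozenset(), allowed_domains=frozenset()):
    """Return True if the OAuth claims pass the (hypothetical) allowlist check."""
    # With no allowlist configured, defer to whatever the OAuth provider returned.
    if not allowed_emails and not allowed_domains:
        return True
    # Reject callers whose email the provider has not verified.
    if claims.get("email_verified") is not True:
        return False
    email = str(claims.get("email", "")).strip().lower()
    if "@" not in email:
        return False
    domain = email.rsplit("@", 1)[1]
    # Assumption: matching either list is enough ("either, or both").
    return email in allowed_emails or domain in allowed_domains
```

For example, `is_caller_allowed(claims, allowed_domains=parse_csv("yourcompany.com"))` accepts a verified `@yourcompany.com` address and rejects everything else.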

Adding to Claude.ai

Once hosted at https://mcp.yourcompany.com:

Claude.ai → Settings → Connectors → Add custom connector
Remote MCP server URL: https://mcp.yourcompany.com/mcp/
Leave the OAuth Client ID / Secret fields empty — the server uses Dynamic Client Registration; those fields are for a different deployment pattern.
Click Add → Google sign-in popup → done.

Verifying the deploy

Auth enabled, expect 401:

curl -i -X POST https://mcp.yourcompany.com/mcp/ \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"curl","version":"0"}}}'

OAuth metadata endpoint returns JSON (used by Claude.ai to auto-register):

curl https://mcp.yourcompany.com/.well-known/oauth-authorization-server

Liveness/readiness probe — unauthenticated GET /health returns HTTP 200 with {"status": "ok"}, suitable for Kubernetes probes:

curl -i https://mcp.yourcompany.com/health

Multi-project support

A single server instance can route to multiple Langfuse projects. Every tool accepts an optional project argument; when omitted, the server-configured default is used. Call list_projects to discover what's available.

Configuring projects

Declare each project via indexed env vars. Project names are data, not part of variable names — use whatever scheme you like.

LANGFUSE_PROJECT_1_NAME=production
LANGFUSE_PROJECT_1_PUBLIC_KEY=pk-lf-...
LANGFUSE_PROJECT_1_SECRET_KEY=sk-lf-...
LANGFUSE_PROJECT_1_HOST=https://cloud.langfuse.com

LANGFUSE_PROJECT_2_NAME=staging
LANGFUSE_PROJECT_2_PUBLIC_KEY=pk-lf-...
LANGFUSE_PROJECT_2_SECRET_KEY=sk-lf-...
LANGFUSE_PROJECT_2_HOST=https://cloud.langfuse.com

LANGFUSE_DEFAULT_PROJECT=production

Usage from the client

Claude: "Show me failing traces in production today."
→ fetch_traces(project="production", ...) routed to project 1's credentials.

Claude: "Compare that with staging."
→ fetch_traces(project="staging", ...) routed to project 2's credentials.

Each project has its own cache, rate limiter, and connection pool. Claude.ai sees one connector; users authenticate once via OAuth and can query any configured project within the session.

Single-project (legacy) mode

If LANGFUSE_PROJECT_1_NAME is not set, the server falls back to the legacy LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST vars and registers them as a project called default. Existing deployments keep working without changes.
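
To make the indexed scheme and the legacy fallback concrete, here is a small sketch of how such variables could be resolved. `load_projects` is a hypothetical helper, and the sketch assumes project indices are contiguous starting at 1; the server's real parsing may differ.

```python
import os

DEFAULT_HOST = "https://cloud.langfuse.com"


def load_projects(env=None):
    """Return (projects, default_name) from indexed or legacy env vars."""
    env = os.environ if env is None else env
    projects = {}
    n = 1
    # Walk LANGFUSE_PROJECT_1_*, LANGFUSE_PROJECT_2_*, ... until a name is missing
    # (assumption: indices are contiguous).
    while env.get(f"LANGFUSE_PROJECT_{n}_NAME"):
        prefix = f"LANGFUSE_PROJECT_{n}_"
        projects[env[prefix + "NAME"]] = {
            "public_key": env.get(prefix + "PUBLIC_KEY"),
            "secret_key": env.get(prefix + "SECRET_KEY"),
            "host": env.get(prefix + "HOST", DEFAULT_HOST),
        }
        n += 1
    if not projects:
        # Legacy single-project mode: register the plain vars as "default".
        projects["default"] = {
            "public_key": env.get("LANGFUSE_PUBLIC_KEY"),
            "secret_key": env.get("LANGFUSE_SECRET_KEY"),
            "host": env.get("LANGFUSE_HOST", DEFAULT_HOST),
        }
    # Fall back to the first configured project when no default is named.
    default = env.get("LANGFUSE_DEFAULT_PROJECT") or next(iter(projects))
    return projects, default
```

Passing a plain dict instead of `os.environ` makes the resolution easy to check before deploying a new set of variables.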

Configuration

Env Variable	Default	Description
`LANGFUSE_PUBLIC_KEY`	(required in single-project mode)	Langfuse public API key
`LANGFUSE_SECRET_KEY`	(required in single-project mode)	Langfuse secret API key
`LANGFUSE_HOST`	`https://cloud.langfuse.com`	Langfuse instance URL (cloud or self-hosted)
`LANGFUSE_INTERNAL_DOMAINS`	`""`	Comma-separated internal domains to exclude from analytics (e.g., `mycompany.com,test.com`). Applies when using `group_by='domain'`.
`LANGFUSE_MCP_READ_ONLY`	`false`	Disable write operations (`score_traces`, `create_dataset`, etc.)
`LANGFUSE_PAGE_LIMIT`	`100`	Traces per API page
`LANGFUSE_PROJECT_{N}_NAME`	(unset)	Multi-project: name for project N (e.g. `production`). See Multi-project.
`LANGFUSE_PROJECT_{N}_PUBLIC_KEY`	(unset)	Public key for project N.
`LANGFUSE_PROJECT_{N}_SECRET_KEY`	(unset)	Secret key for project N.
`LANGFUSE_PROJECT_{N}_HOST`	`https://cloud.langfuse.com`	Host URL for project N.
`LANGFUSE_DEFAULT_PROJECT`	first configured	Default project name used when a tool call omits `project`.
`MCP_TRANSPORT`	`stdio`	`stdio` or `streamable-http`. HTTP mode listens on a port instead of stdin/stdout. See Hosting.
`MCP_HOST`	`0.0.0.0`	Bind address when `MCP_TRANSPORT=streamable-http`.
`MCP_PORT`	`8000`	Port when `MCP_TRANSPORT=streamable-http`.
`MCP_BASE_URL`	(unset)	Public base URL of the hosted server. Required for Google OAuth.
`GOOGLE_CLIENT_ID`	(unset)	Google OAuth client ID. OAuth activates when `GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET`, and `MCP_BASE_URL` are all set.
`GOOGLE_CLIENT_SECRET`	(unset)	Google OAuth client secret.
`ALLOWED_EMAILS`	(unset)	Comma-separated emails allowed to call tools. Requires OAuth.
`ALLOWED_EMAIL_DOMAINS`	(unset)	Comma-separated email domains allowed to call tools. Requires OAuth.

Tools

Analytics (9 tools)

Tools that compute insights server-side and return compact summaries. These go beyond raw data access — they aggregate, detect patterns, and compute statistics so the LLM can reason over results without hitting context window limits.

Tool	Description	Key Parameters
`aggregate_by_group`	Aggregate trace metrics by user group. Returns per-group: trace count, unique sessions, unique users, accuracy rate, average latency, total cost.	`group_by` (name/userId/domain/tag), `time_range`, `top_n`, `exclude_internal`
`compute_accuracy`	Compute accuracy from feedback scores. Accuracy = correct / (correct + incorrect). Supports grouping and time bucketing for trend analysis.	`group_by`, `bucket_by` (week/day), `score_name`, `time_range`
`detect_failures`	Detect LLM output quality failures using pattern matching ("unable to", "I can't", etc.) and negative feedback scores. NOT Python exceptions — use `find_exceptions` for those.	`group_by`, `include_examples`, `max_examples`, `time_range`
`compute_token_percentiles`	Compute token usage percentiles (TP50/TP90/TP95/TP99) at trace level. Fetches generation observations for accurate per-trace token counts.	`group_by`, `percentiles`, `time_range`
`detect_context_breaches`	Scan for traces where any single generation exceeds a token threshold. Catches context window overflow causing degraded LLM performance or silent truncation.	`threshold` (default 256000), `check_per_generation`, `time_range`
`analyze_sessions`	Analyze multi-turn session behavior. Returns session count, depth distribution (single vs multi-turn), engagement metrics, and session-level cost/latency.	`group_by`, `time_range`
`estimate_costs`	Compute cost breakdown using Langfuse's built-in `totalCost` field (model-aware, computed by Langfuse). Groups by user, agent, or time bucket.	`group_by`, `bucket_by` (week/day), `time_range`
`analyze_latency`	Analyze latency distribution at trace level and optionally per LLM generation. Identifies which model is the bottleneck.	`group_by`, `percentiles`, `include_per_generation`, `time_range`
`score_traces`	Write scores back to Langfuse. Use after analysis to annotate traces with findings — tag failures for review, mark high-quality traces for dataset creation.	`trace_ids`, `score_name`, `score_value`, `comment`
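
Two of the statistics in the table are simple enough to sketch directly. The accuracy formula is the one quoted for `compute_accuracy`; the percentile function uses the nearest-rank method, which is an assumption here since the README does not say which method `compute_token_percentiles` uses. Both function names are illustrative.

```python
import math


def accuracy(correct, incorrect):
    """Accuracy = correct / (correct + incorrect); None when nothing is scored."""
    total = correct + incorrect
    return correct / total if total else None


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with p between 0 and 100."""
    ordered = sorted(values)
    # Rank is 1-based: the smallest rank covering at least p percent of the values.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

Returning `None` for an unscored group keeps "no feedback" distinct from "0% accurate", which matters when many traces have no feedback scores.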

Data Access (46 tools)

Full Langfuse API coverage for querying and managing your observability data.

Traces

Tool	Description
`fetch_traces`	List traces with filters — user ID, name, tags, time range, ordering. Returns paginated results.
`fetch_trace`	Get a single trace by ID with full details including all observations (spans, generations, events).
`diff_traces`	Compare two traces side-by-side (name, user, latency, cost, tags, release, version).

Observations

Tool	Description
`fetch_observations`	List observations with filters — trace ID, type (GENERATION/SPAN/EVENT), name, time range.
`fetch_observation`	Get a single observation by ID. Returns input/output, token usage, model, latency, and cost.

Sessions

Tool	Description
`fetch_sessions`	List sessions with optional time filters.
`get_session_details`	Get full details of a session including all its traces.
`get_user_sessions`	Get sessions for a specific user. Fetches user's traces and extracts unique sessions.

Errors

Tool	Description
`find_exceptions`	Find observations with error status. For LLM output quality issues, use `detect_failures` instead.
`get_exception_details`	Get full error details for a trace — returns all observations with error status highlighted.
`get_error_count`	Get total error count within a time period.

Scores

Tool	Description
`fetch_scores`	List scores/evaluations with filters — trace ID, score name, time range.
`list_scores_v2`	v2 Scores API with richer filters (session ID, dataset run ID, queue ID, config ID, operator/value, etc.).
`get_score_v2`	Get a single score by ID via the v2 Scores API.

Prompts

Tool	Description
`list_prompts`	List all prompts in the project with optional name filter.
`get_prompt`	Fetch a specific prompt by name, version, or label.
`get_prompt_unresolved`	Fetch a prompt with placeholders/dependencies intact (debugging prompt composition).
`create_text_prompt`	Create a new text prompt version with optional labels and model config.
`create_chat_prompt`	Create a new chat prompt version with message array and optional config.
`update_prompt_labels`	Update labels for a specific prompt version (e.g., promote to "production").

Datasets

Tool	Description
`list_datasets`	List all datasets in the project.
`get_dataset`	Get metadata for a specific dataset.
`list_dataset_items`	List items in a dataset with pagination.
`get_dataset_item`	Get a single dataset item by ID.
`create_dataset`	Create a new dataset with optional description and metadata.
`create_dataset_item`	Create or upsert a dataset item. Supports linking to source traces.
`delete_dataset_item`	Delete a dataset item by ID.

Annotation Queues

Tool	Description
`list_annotation_queues`	List all annotation queues in the project.
`create_annotation_queue`	Create a new annotation queue with attached score configs.
`get_annotation_queue`	Get a queue by ID.
`list_annotation_queue_items`	List items in a queue (optionally filtered by status).
`get_annotation_queue_item`	Get a queue item by ID.
`create_annotation_queue_item`	Add a trace or observation to a queue for review.
`update_annotation_queue_item`	Change a queue item's status (PENDING / COMPLETED).
`delete_annotation_queue_item`	Remove an item from a queue.
`create_annotation_queue_assignment`	Assign a reviewer to a queue.
`delete_annotation_queue_assignment`	Remove a reviewer from a queue.

Metrics

Tool	Description
`get_daily_metrics`	Langfuse's pre-aggregated daily rollup (trace count, cost, tokens per day). Faster than per-trace aggregation for long windows.

Users

Tool	Description
`list_users`	Top users by trace count over a time window (defaults to last 30 days). Wraps the Langfuse `/metrics` query API.

Comments

Tool	Description
`list_comments`	List comments attached to traces/observations/sessions/prompts, with filters.
`get_comment`	Get a single comment by ID.
`create_comment`	Create a markdown comment on a trace/observation/session/prompt.

Models

Tool	Description
`list_models`	List model definitions in Langfuse's models registry (pricing + tokenizer config).
`get_model`	Get a single model definition by ID.

Projects

Tool	Description
`list_projects`	Discovery: returns the list of configured Langfuse projects and the default project.

Schema

Tool	Description
`get_data_schema`	Get the data schema for the Langfuse project — available fields and types for traces, observations, scores, sessions.

Sample Questions

Once connected, ask your AI assistant questions like these:

Agent & Pipeline Health

"Which agents failed the most this week?"
"What's the failure rate by agent name?"
"Which agent has the worst accuracy?"
"Show me the top 5 agents by trace volume"
"Are any agents consistently slower than others?"
"Compare all agents by accuracy, latency, and cost"

Accuracy & Quality

"What's our overall accuracy this week?"
"What's the accuracy trend by week for the last 30 days?"
"Compare accuracy across different agents"
"What's the daily accuracy breakdown?"
"Which users are getting the worst accuracy?"
"What percentage of traces have feedback scores?"

Failures & Debugging

"Show me failure examples from today"
"What are the most common failure patterns?"
"Which users are seeing the most failures?"
"What's the failure rate by agent?"
"Are failures increasing or decreasing this week vs last?"
"Show me traces where the LLM said 'unable to' or 'I can't'"

Token Usage

"What are the P90 and P99 token usage stats?"
"Which agents consume the most tokens?"
"Compare token usage across user groups"
"Are any users hitting unusually high token counts?"

Context Window Breaches

"Are any generations exceeding the 128K context window?"
"Show me traces with token usage above 200K per generation"
"What's the breach severity distribution?"
"Which users trigger the most context window breaches?"

Sessions & Engagement

"What's our multi-turn rate?"
"How deep are sessions on average?"
"Which users have the deepest sessions?"
"How many single-turn vs multi-turn sessions this week?"
"What's the average session cost?"

Cost

"How much are we spending per day this week?"
"What's the weekly cost trend for the last 30 days?"
"Which agent is the most expensive?"
"Which users are costing the most?"
"What's the average cost per trace?"

Latency

"What's the P95 latency?"
"Is latency getting worse over time?"
"Which model is the slowest?"
"Compare latency across agents"
"Show me per-generation latency breakdown by model"
"Which users are experiencing the highest latency?"

Annotation & Write-back

"Score all failing traces from today with 'needs-review'"
"Tag these trace IDs as 'high-quality' for dataset creation"
"Mark trace abc-123 with a score of 0 and comment 'hallucinated output'"

Lookups & Exploration

"Fetch the last 20 traces"
"Show me trace abc-123 with all its observations"
"List sessions for user user@example.com"
"What errors happened in the last 24 hours?"
"How many errors occurred this week?"
"Show me all prompts in the project"
"List all datasets"
"What fields are available on traces and observations?"

Grouping Options

The group_by parameter controls how traces are segmented in analytics tools:

Value	What it groups by	When to use
`name`	Trace/agent name (default)	Compare performance across different agents or pipelines
`userId`	Per-user breakdown	Identify users with issues or high usage
`domain`	Email domain extracted from userId	Multi-tenant apps where users have email-based IDs (e.g., `user@acme.com` → `acme.com`)
`tag`	Trace tags	Compare across tagged environments, versions, or experiments
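
The four grouping modes can be pictured as a function from a trace to one or more group keys. The sketch below is illustrative only: `group_keys` is a hypothetical name, the `name`, `userId`, and `tags` fields are the trace fields referenced in the table, and the `"unknown"` and `"untagged"` fallback labels are assumptions.

```python
def group_keys(trace, group_by="name"):
    """Return the group key(s) a trace falls under for a given group_by value."""
    if group_by == "name":
        return [trace.get("name") or "unknown"]
    if group_by == "userId":
        return [trace.get("userId") or "unknown"]
    if group_by == "domain":
        user_id = trace.get("userId") or ""
        # Only email-shaped IDs carry a domain; everything else shares one bucket.
        return [user_id.rsplit("@", 1)[1].lower()] if "@" in user_id else ["unknown"]
    if group_by == "tag":
        # A trace with several tags counts toward each of them.
        return list(trace.get("tags") or []) or ["untagged"]
    raise ValueError(f"unsupported group_by: {group_by!r}")
```

Note that with `tag` grouping a single trace can contribute to several groups, so per-group counts may sum to more than the total number of traces.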

Selective Tool Loading

Load only the tool groups you need to reduce token overhead:

# Only load traces and analytics tools
LANGFUSE_TOOLS=traces,analytics langfuse-mcp-server

# Only load prompts and datasets
LANGFUSE_TOOLS=prompts,datasets langfuse-mcp-server

# In Claude Code
claude mcp add \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  -e LANGFUSE_TOOLS=traces,observations,analytics \
  langfuse-mcp -- uvx langfuse-mcp-server

Available groups:

Group	Tools	Count
`traces`	`fetch_traces`, `fetch_trace`	2
`observations`	`fetch_observations`, `fetch_observation`	2
`sessions`	`fetch_sessions`, `get_session_details`, `get_user_sessions`	3
`errors`	`find_exceptions`, `get_exception_details`, `get_error_count`	3
`scores`	`fetch_scores`	1
`prompts`	`list_prompts`, `get_prompt`, `create_text_prompt`, `create_chat_prompt`, `update_prompt_labels`	5
`datasets`	`list_datasets`, `get_dataset`, `list_dataset_items`, `get_dataset_item`, `create_dataset`, `create_dataset_item`, `delete_dataset_item`	7
`annotation_queues`	All 10 annotation queue tools	10
`metrics`	`get_daily_metrics`	1
`users`	`list_users`	1
`comments`	`list_comments`, `get_comment`, `create_comment`	3
`models`	`list_models`, `get_model`	2
`projects`	`list_projects`	1
`schema`	`get_data_schema`	1
`analytics`	All 9 analytics tools	9

If LANGFUSE_TOOLS is not set, all 56 tools are loaded.
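
As a rough sketch of how a `LANGFUSE_TOOLS` value maps to groups: the group names come from the table above, while `enabled_groups` and its strict handling of unknown names are assumptions (the server may ignore unknown entries instead).

```python
KNOWN_GROUPS = {
    "traces", "observations", "sessions", "errors", "scores", "prompts",
    "datasets", "annotation_queues", "metrics", "users", "comments",
    "models", "projects", "schema", "analytics",
}


def enabled_groups(raw):
    """Resolve a LANGFUSE_TOOLS value to the set of tool groups to load."""
    # Unset or blank means "load everything".
    if not raw or not raw.strip():
        return set(KNOWN_GROUPS)
    requested = {part.strip() for part in raw.split(",") if part.strip()}
    unknown = requested - KNOWN_GROUPS
    if unknown:
        # Assumption: fail loudly on typos rather than silently loading nothing.
        raise ValueError(f"unknown tool groups: {sorted(unknown)}")
    return requested
```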

Read-Only Mode

Disable write operations (score_traces, create_dataset, create_dataset_item, delete_dataset_item, create_text_prompt, create_chat_prompt):

LANGFUSE_MCP_READ_ONLY=true

How it Compares

vs Official Langfuse MCP

Capability	This server	Official Langfuse MCP
Traces & Observations	Yes	No
Sessions & Users	Yes	No
Exception Tracking	Yes	No
Prompt Management	Yes	Yes
Dataset Management	Yes	No
Score Write-back	Yes	No
Selective Tool Loading	Yes	No
Accuracy Metrics	Yes	No
Failure Detection	Yes	No
Token Percentiles	Yes	No
Cost Breakdown	Yes	No
Latency Analysis	Yes	No
Session Analytics	Yes	No
Context Breach Scanning	Yes	No
User Group Aggregation	Yes	No

The official Langfuse MCP (5 tools) focuses on prompt management. This server provides full observability coverage plus 9 analytics tools.

vs Other Langfuse MCP Implementations

Capability	This server	Others
Data access (traces, observations, sessions)	Yes	Yes
Prompt & dataset management	Yes	Yes
Exception tracking	Yes	Yes
Annotation queues	Yes	Partial
Selective tool loading	Yes	Yes
Multi-project support	Yes	No
Accuracy metrics	Yes	No
LLM failure detection	Yes	No
Token percentiles (TP50/TP90/TP95/TP99)	Yes	No
Cost breakdown by group/time	Yes	No
Latency analysis with per-model breakdown	Yes	No
Multi-turn session analytics	Yes	No
Context window breach scanning	Yes	No
User/tenant group aggregation	Yes	No
Score write-back	Yes	No

Other implementations provide data access (fetching raw traces, observations, sessions) using synchronous HTTP clients. This server adds a compute layer of analytics tools that aggregate, detect patterns, and compute statistics server-side, plus an async architecture that fetches concurrently instead of one request at a time.

Architecture	This server	Others
Async HTTP client	Yes (httpx.AsyncClient)	No (sync requests/httpx)
Concurrent observation fetching	Yes (asyncio.gather)	No (sequential per-trace)
TTL caching	Yes (live 5min, historical 1hr)	No
Adaptive rate limiting	Yes (token bucket, 429 backoff)	No (fixed sleep)
Batch observation queries	Yes (with auto-fallback)	No (N+1 per-trace)
Claude Code sub-agent	Yes (.claude/agents/)	No

vs Platform-Embedded AI (Braintrust Loop, LangSmith Insights, Arize Alyx)

Capability	This server	Platform AI assistants
Open source	Yes	No
Works with any MCP client	Yes	Platform-locked
Self-hosted Langfuse support	Yes	N/A
Real-time conversational	Yes	Varies (some batch-only)
Custom grouping/segmentation	Yes	Limited
Write-back to Langfuse	Yes	Platform-specific
Free	Yes	Paid tiers

Architecture

Why async httpx instead of the Langfuse SDK?

The Langfuse Python SDK is excellent for writing traces (it batches and sends asynchronously in the background). But for reading traces at scale — which is what an analytics MCP server does — the SDK has a limitation: its read API is synchronous, built on the requests library.

This server uses httpx.AsyncClient instead, which enables:

Concurrent observation fetching — fetch observations for 100 traces simultaneously via asyncio.gather, not one-by-one
Non-blocking pagination — paginate through thousands of traces without blocking the event loop
Rate-limited concurrency — asyncio.Semaphore + token bucket controls throughput without time.sleep() blocking

Measured impact: analyze_latency with per-generation breakdown dropped from 110s to 20s (about 5.5x faster) on a self-hosted instance with 2.4M daily observations.
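
The concurrency pattern named above (asyncio.gather bounded by an asyncio.Semaphore) can be sketched without any network access. `fetch_all` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for a real request made with httpx.AsyncClient, and the default concurrency of 10 is an arbitrary choice for the example.

```python
import asyncio


async def fetch_all(trace_ids, fetch_one, max_concurrency=10):
    """Fetch observations for many traces concurrently, capped by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(trace_id):
        # At most max_concurrency fetches are in flight at any moment.
        async with semaphore:
            return trace_id, await fetch_one(trace_id)

    pairs = await asyncio.gather(*(guarded(t) for t in trace_ids))
    return dict(pairs)


async def fake_fetch(trace_id):
    """Stand-in for an HTTP call; sleeps briefly instead of hitting an API."""
    await asyncio.sleep(0.01)
    return [{"traceId": trace_id, "type": "GENERATION"}]
```

Running `asyncio.run(fetch_all(ids, fake_fetch))` returns a dict keyed by trace ID. Because the waits overlap, total time grows with the number of batches of `max_concurrency` rather than with the number of traces.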

Caching strategy

Two-tier in-memory TTL cache using cachetools.TTLCache:

Data age	TTL	Rationale
Today's data	5 minutes	Still changing, short cache
Historical data (before today)	1 hour	Won't change, cache aggressively

The cache operates at the API page level. If you call aggregate_by_group then compute_accuracy for the same time range, the second call hits cache for all trace pages — only scores are fetched fresh.

Configure via LANGFUSE_CACHE_TTL and LANGFUSE_CACHE_TTL_HISTORICAL (seconds).
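
The two-tier rule boils down to picking a TTL from the age of the data being requested. The sketch below uses only the standard library rather than cachetools, and makes two assumptions the README does not state: "today" is evaluated in UTC, and a query counts as live when its range end falls on or after the start of today. `ttl_for` is a hypothetical name.

```python
from datetime import datetime, timezone

LIVE_TTL_SECONDS = 300         # today's data: 5 minutes
HISTORICAL_TTL_SECONDS = 3600  # data before today: 1 hour


def ttl_for(range_end, now=None):
    """Pick a cache TTL based on whether the queried range reaches into today."""
    now = now or datetime.now(timezone.utc)
    # Assumption: the day boundary is midnight UTC.
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if range_end >= start_of_today:
        return LIVE_TTL_SECONDS
    return HISTORICAL_TTL_SECONDS
```

Both arguments are expected to be timezone-aware datetimes; comparing an aware value with a naive one raises a TypeError in Python.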

Rate limiting

A token bucket rate limiter (one per configured project) respects Langfuse API limits:

Instance type	Default RPM	Behavior
Self-hosted	Unlimited (0)	No artificial throttling. Full speed, limited only by your server.
Langfuse Cloud (Hobby)	30 req/min	Conservative default for Hobby tier
Langfuse Cloud (Pro/Team)	Set `LANGFUSE_RATE_LIMIT_RPM=1000`	Higher throughput for paid plans

On HTTP 429 responses, the limiter automatically halves the RPM and reads the Retry-After header. This means the server adapts to any rate limit — cloud or self-hosted — without manual configuration.
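
A token bucket with the halve-on-429 behavior described above might look like the following. This is a simplified sketch under stated assumptions, not the server's limiter: the `TokenBucket` class, the floor of 1 RPM, the one-second fallback pause, and treating Retry-After as a number of seconds (it can also be an HTTP date) are all choices made for the example.

```python
import time


class TokenBucket:
    """Token bucket refilled at `rpm` tokens per minute; rpm=0 means unlimited."""

    def __init__(self, rpm, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock
        self.tokens = float(rpm)
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; False means the caller should wait."""
        if self.rpm == 0:
            return True
        now = self.clock()
        # Refill in proportion to elapsed time, capped at one minute's worth.
        refill = (now - self.updated) * self.rpm / 60
        self.tokens = min(float(self.rpm), self.tokens + refill)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def on_429(self, retry_after=None):
        """Halve the rate after an HTTP 429 and return a pause in seconds."""
        if self.rpm > 0:
            self.rpm = max(1, self.rpm // 2)
            self.tokens = min(self.tokens, float(self.rpm))
        # Prefer the Retry-After hint; otherwise pause one second (arbitrary).
        return float(retry_after) if retry_after is not None else 1.0
```

In async code the caller would `await asyncio.sleep(...)` for the returned pause, or poll `try_acquire`, so waiting never blocks the event loop.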

Observation fetching: batch vs concurrent

Analytics tools that need per-generation data (token percentiles, context breaches, latency breakdown) face the N+1 problem: one API call per trace to fetch its observations.

This server uses a two-step strategy:

Try batch fetch — fetch ALL observations for the time range in one paginated call, group by traceId in memory
If volume is too high (>5000 pages / 500K+ observations) — fall back to concurrent per-trace fetch using asyncio.gather with semaphore-controlled concurrency

This means the server handles both small projects (batch is faster) and large-scale deployments (concurrent targeted fetching avoids downloading millions of irrelevant observations).
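
The two-step strategy can be reduced to a decision plus an in-memory grouping step, sketched below. `choose_strategy` and `group_by_trace` are hypothetical names; the 5000-page threshold is the figure quoted above, and the sketch assumes the total page count is known from the first paginated response.

```python
from collections import defaultdict

MAX_BATCH_PAGES = 5000  # threshold quoted above


def choose_strategy(total_pages, max_pages=MAX_BATCH_PAGES):
    """Return 'batch' for manageable volumes, 'concurrent' when volume is too high."""
    return "concurrent" if total_pages > max_pages else "batch"


def group_by_trace(observations):
    """Group a flat list of observations by their traceId (the batch path)."""
    grouped = defaultdict(list)
    for obs in observations:
        grouped[obs["traceId"]].append(obs)
    return dict(grouped)
```

On the batch path, a single pass over the downloaded observations replaces one API call per trace, which is where the N+1 saving comes from.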

Context isolation via sub-agent

The server ships with a Claude Code custom agent at .claude/agents/langfuse-analyst.md. When a user asks a Langfuse-related question, Claude Code can delegate to this agent, which:

Only loads Langfuse MCP tools (not other tools in the session)
Has a specialized system prompt with tool taxonomy and workflow patterns
Runs in an isolated context window, keeping the main conversation clean
Returns a summary to the parent conversation

This prevents 33 tool schemas (~5000 tokens) from polluting every conversation.

Contributing

See CONTRIBUTING.md for development setup, code style guidelines, and areas for contribution.

Security

See SECURITY.md for the security policy, vulnerability reporting, and API key handling.

Code of Conduct

See CODE_OF_CONDUCT.md.

License

MIT — see LICENSE.

from github.com/DrishtantKaushal/LangfuseMCP

Install Langfuse Server in Claude Desktop, Claude Code & Cursor

Recommended · one command, every IDE

unyly install langfuse-mcp-server

Installs into Claude Desktop, Claude Code, Cursor & VS Code — handles npx, uvx and build-from-source repos for you.

First time? Get the CLI: curl -fsSL https://unyly.org/install | sh

Or configure manually

Run in your terminal:

claude mcp add \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  langfuse-mcp-server -- uvx langfuse-mcp-server

FAQ

Is Langfuse Server MCP free?

Yes, Langfuse Server MCP is free — one-click install via Unyly at no cost.

Does Langfuse Server need an API key?

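Yes. Langfuse Server needs a Langfuse public key and secret key, passed as the LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables (or the per-project LANGFUSE_PROJECT_{N}_* equivalents).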

Is Langfuse Server hosted or self-hosted?

A hosted option is available: Unyly runs the server in the cloud, no local setup required.

How do I install Langfuse Server in Claude Desktop, Claude Code or Cursor?

Open Langfuse Server on unyly.org, pick your client tab (Claude Desktop, Claude Code, Cursor) and press Install — the config is generated automatically, no JSON editing.
