An MCP server that enables the analysis of CSV and Parquet files by providing tools for statistical summaries, data previews, and structure exploration. It allows users to query local datasets and create sample data using natural language.
This guide will walk you through setting up a Model Context Protocol (MCP) server that can analyze CSV and Parquet files, and connecting it to Claude Desktop for natural language data analysis.
A powerful data analysis tool that allows Claude to:
Model Context Protocol (MCP) is a standardized way to connect AI assistants like Claude to external tools and data sources. It allows you to:
# Clone or create project directory
mkdir mcp-file-analyzer && cd mcp-file-analyzer
# Set up virtual environment
python3 -m venv .venv && source .venv/bin/activate
# Install dependencies
pip install "mcp>=1.0.0" "pandas>=2.0.0" "pyarrow>=10.0.0" # Quotes stop the shell from treating > as a redirect
# Create and test the server (copy main.py and client.py from this repo)
python main.py # Start server (Ctrl+C to stop)
python client.py # Test the connection
# Configure Claude Desktop (see detailed steps below)
Before you begin, make sure you have:
Check your Python version:
python3 --version # Should be 3.10+ (required by the mcp package)
# Create project directory
mkdir mcp-file-analyzer
cd mcp-file-analyzer
# Create virtual environment
python3 -m venv .venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
Create requirements.txt:
# Core dependencies for MCP File Analyzer
mcp>=1.0.0
pandas>=2.0.0
pyarrow>=10.0.0
# HTTP client dependencies (optional)
httpx>=0.27.0
# Development dependencies (optional)
# pytest>=7.0.0
# black>=23.0.0
# flake8>=6.0.0
Install dependencies:
pip install -r requirements.txt
Your project needs these core files:
Create activate_env.sh for easy environment activation:
#!/bin/bash
echo "🚀 Activating virtual environment..."
source .venv/bin/activate
echo "✅ Virtual environment activated!"
echo "📦 Installed packages:"
pip list --format=columns
echo ""
echo "🎯 Quick start commands:"
echo " - Run MCP server: python main.py"
echo " - Run demo client: python client.py"
echo " - Interactive client: python client.py interactive"
Make it executable:
chmod +x activate_env.sh
Create run_mcp_server.sh:
#!/bin/bash
# MCP Server Launcher for Claude Desktop
# Get the directory where this script is located
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
# Change to the script directory
cd "$SCRIPT_DIR"
# Activate the virtual environment
source .venv/bin/activate
# Run the MCP server
python main.py
Make it executable:
chmod +x run_mcp_server.sh
Create claude_desktop_config.json:
{
"mcpServers": {
"file_analyzer": {
"command": "/ABSOLUTE/PATH/TO/YOUR/PROJECT/run_mcp_server.sh",
"args": []
}
}
}
Important: Replace /ABSOLUTE/PATH/TO/YOUR/PROJECT with your actual project path. Get it with:
pwd # Copy this output
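Instead of pasting the path by hand, the same configuration can be generated from the project directory. This is a minimal sketch that assumes the launcher is named run_mcp_server.sh as above; the helper name `build_config` is hypothetical and not part of this repo.

```python
import json
from pathlib import Path


def build_config(project_dir: str, server_name: str = "file_analyzer") -> dict:
    """Build the Claude Desktop config dict for the launcher in project_dir."""
    # resolve() turns a relative path such as "." into an absolute one
    launcher = Path(project_dir).resolve() / "run_mcp_server.sh"
    return {"mcpServers": {server_name: {"command": str(launcher), "args": []}}}


if __name__ == "__main__":
    # Print the JSON so it can be redirected into claude_desktop_config.json
    print(json.dumps(build_config("."), indent=2))
```

Run from the project root, the printed JSON can be saved as claude_desktop_config.json.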
Copy the configuration to Claude Desktop:
macOS:
cp claude_desktop_config.json ~/Library/Application\ Support/Claude/claude_desktop_config.json
Windows:
copy claude_desktop_config.json %APPDATA%\Claude\claude_desktop_config.json
Linux:
cp claude_desktop_config.json ~/.config/claude/claude_desktop_config.json
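The copy commands above overwrite any configuration that already exists at the destination. If other MCP servers are already registered, merging the new entry may be safer. The sketch below shows one way to do that; `merge_server` is a hypothetical helper, and it assumes an existing file contains valid JSON.

```python
import json
from pathlib import Path


def merge_server(config_path: Path, name: str, command: str) -> dict:
    """Add or update one entry under "mcpServers", keeping existing entries."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # setdefault keeps servers that are already configured
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": []}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Pass the platform-specific destination path from the commands above as `config_path`.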
For web-based testing and debugging, you can also run an HTTP version:
# Install additional dependencies
pip install uvicorn fastapi
# Start HTTP server
python http_server.py
# Test with HTTP client
python http_client.py
# Access web interface
open http://localhost:8000/docs
Once integrated, try these commands in Claude Desktop:
Check available tools:
What MCP tools do you have available?
List data files:
What data files do I have available?
Analyze a CSV file:
Can you summarize the sample.csv file?
Data exploration:
Show me the first 5 rows of sample.csv and tell me about the data structure
Statistical analysis:
Give me statistical information about sample.csv - what are the data types and any interesting patterns?
Create new data:
Create a new CSV file called "customer_data.csv" with 50 rows of sample customer data
Comprehensive analysis:
List all my data files, pick the most interesting one, and give me a complete analysis including:
- File structure and dimensions
- Data types for each column
- First few rows as examples
- Statistical summary for numeric columns
Claude should respond with actual data from your files:
Your MCP server automatically creates sample data:
sample.csv:
id,name,email,signup_date
1,Alice Johnson,[email protected],2023-01-15
2,Bob Smith,[email protected],2023-02-22
3,Carol Lee,[email protected],2023-03-10
4,David Wu,[email protected],2023-04-18
5,Eva Brown,[email protected],2023-05-30
# Activate environment
source .venv/bin/activate
# Test server and client
python client.py
Expected output:
🚀 Starting MCP File Analyzer Client Demo
==================================================
✅ Connected to MCP server successfully!
🔧 Available tools:
- list_data_files
- summarize_csv_file
- summarize_parquet_file
- analyze_csv_data
- create_sample_data
📂 Listing data files:
📄 Result: Available data files: sample.csv, sample.parquet
📊 Summarizing CSV file:
📄 Result: CSV file 'sample.csv' has 5 rows and 4 columns...
python client.py interactive
Try these commands:
- list_files
- summarize sample.csv
- analyze sample.csv head
- create test_data.csv 10
In Claude Desktop, you should see:
Symptoms: Claude Desktop starts but no MCP tools appear
Solutions:
# Check config file location
ls -la ~/Library/Application\ Support/Claude/claude_desktop_config.json
# Verify JSON syntax
python3 -m json.tool ~/Library/Application\ Support/Claude/claude_desktop_config.json
# Test launcher script
./run_mcp_server.sh
# Check permissions
chmod +x run_mcp_server.sh
Symptoms: Claude shows error about server connection
Solutions:
# Verify absolute path in config
pwd # Make sure this matches your config
# Test server independently
source .venv/bin/activate
python main.py
# Check virtual environment
which python # Should show .venv path
Symptoms: Import errors when starting server
Solutions:
# Reinstall dependencies
source .venv/bin/activate
pip install -r requirements.txt
# Verify installation
pip list | grep mcp
pip list | grep pandas
pip list | grep pyarrow
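The grep checks above do not work in a plain Windows shell. A portable alternative is to ask Python itself which versions are installed in the active environment. This is a sketch using only the standard library; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["mcp", "pandas", "pyarrow"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the virtual environment activated, so that it inspects the interpreter the server will use.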
Symptoms: Tools listed but return errors
Solutions:
# Check data directory
ls -la data/
# Recreate sample data
rm -rf data/
python main.py # Will recreate sample files
# Test with client
python client.py
Test each component independently:
# Test server
python main.py
# Test client (in another terminal)
python client.py
# Test launcher
./run_mcp_server.sh
Check file permissions:
ls -la *.py *.sh
chmod +x run_mcp_server.sh
Validate configuration:
# Check JSON syntax
python -c "import json; print(json.load(open('claude_desktop_config.json')))"
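A syntax check alone will not catch a relative path or a launcher that is not executable, which are the two problems the troubleshooting steps above keep returning to. The sketch below checks both. It assumes, as in this guide, that every `command` is an absolute path to a launcher script; `check_config` is a hypothetical helper.

```python
import json
import os
from pathlib import Path


def check_config(config_path):
    """Return a list of problems found in a Claude Desktop config file."""
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read config: {exc}"]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or not servers:
        return ['no "mcpServers" entries found']

    problems = []
    for name, entry in servers.items():
        command = entry.get("command", "") if isinstance(entry, dict) else ""
        if not os.path.isabs(command):
            problems.append(f"{name}: command is not an absolute path: {command!r}")
        elif not os.access(command, os.X_OK):
            problems.append(f"{name}: command is missing or not executable")
    return problems
```

An empty list means the file parsed and every launcher path is absolute and executable.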
Check Claude Desktop logs:
Create custom tools with the @mcp.tool() decorator:
@mcp.tool()
def analyze_excel_file(filename: str) -> str:
    """
    Analyze an Excel file and return summary information.

    Args:
        filename: Name of the Excel file (e.g., 'data.xlsx')

    Returns:
        A string describing the file's contents.
    """
    import pandas as pd

    file_path = DATA_DIR / filename
    if not file_path.exists():
        return f"Error: Excel file '{filename}' does not exist in data directory."
    # Read Excel file (pandas needs the openpyxl package for .xlsx files)
    df = pd.read_excel(file_path)
    return f"Excel file '{filename}' has {len(df)} rows and {len(df.columns)} columns"
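Because the filename comes from a model-generated tool call, a custom tool may want to confirm that the resolved path stays inside the data directory before opening it. The sketch below shows one way to do that. `resolve_data_file` is a hypothetical helper, and `DATA_DIR` is assumed to point at the same data/ directory the server uses.

```python
from pathlib import Path

DATA_DIR = Path("data")  # assumed to match the server's data directory


def resolve_data_file(filename: str, data_dir: Path = DATA_DIR) -> Path:
    """Return the path of filename inside data_dir, rejecting escapes."""
    base = data_dir.resolve()
    candidate = (base / filename).resolve()
    # Reject names such as "../secrets.xlsx" that resolve outside data_dir
    if base not in candidate.parents:
        raise ValueError(f"'{filename}' is outside the data directory")
    if not candidate.is_file():
        raise FileNotFoundError(f"'{filename}' does not exist in data directory")
    return candidate
```

A tool could call this first and turn either exception into an error string for Claude.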
Provide static information to Claude:
import json

@mcp.resource("data://file-formats")
def get_supported_formats() -> str:
    """List supported file formats."""
    formats = {
        "supported_formats": ["CSV", "Parquet", "Excel", "JSON"],
        "max_file_size": "100MB",
        "encoding": "UTF-8"
    }
    return json.dumps(formats, indent=2)
Connect to databases:
import sqlite3

import pandas as pd

@mcp.tool()
def query_database(query: str) -> str:
    """
    Execute a SQL query on the local database.

    Args:
        query: SQL query to execute

    Returns:
        Query results as formatted text.
    """
    conn = sqlite3.connect('data/database.db')
    try:
        df = pd.read_sql_query(query, conn)
    finally:
        # Close the connection even if the query fails
        conn.close()
    return df.to_string()
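A tool that accepts arbitrary SQL will run whatever statement it is given, including ones that modify data. One way to limit the damage is to open the database read-only and cap the number of rows returned. This is a sketch using only the standard library; `run_readonly_query` and the `max_rows` default are hypothetical, and the URI form assumes a path without characters such as `?` or `#`.

```python
import sqlite3


def run_readonly_query(db_path: str, query: str, max_rows: int = 100):
    """Run query against db_path opened read-only; return (columns, rows)."""
    # mode=ro makes SQLite reject INSERT/UPDATE/DELETE/DROP on this connection
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cursor = conn.execute(query)
        columns = [col[0] for col in cursor.description or []]
        return columns, cursor.fetchmany(max_rows)
    finally:
        conn.close()
```

A write attempt through this connection raises `sqlite3.OperationalError`, which a tool can report back as an error message.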
Your complete project should look like this:
mcp-file-analyzer/
├── .venv/ # Virtual environment
├── data/ # Data files (auto-created)
│ ├── sample.csv # Sample CSV data
│ ├── sample.parquet # Sample Parquet data
│ └── ... # Your data files
├── main.py # MCP server (stdio)
├── client.py # Test client (stdio)
├── http_server.py # HTTP MCP server (optional)
├── http_client.py # HTTP test client (optional)
├── requirements.txt # Python dependencies
├── activate_env.sh # Environment activation script
├── run_mcp_server.sh # Claude Desktop launcher
├── claude_desktop_config.json # Claude Desktop config
├── .gitignore # Git ignore file
└── README.md # This file
data/ directory
To use this MCP server from Claude Desktop (macOS):
Make the launcher script executable:
chmod +x /Users/gaohan/Downloads/file_analyzer-main/run_mcp_server.sh
File listing
Prompt:
“Please list all available data files from the file_analyzer MCP server.”
Behavior:
Claude called the list_data_files tool and returned the same set of files
as the Python client:
client_generated.csv, generated_test.csv, sample.csv, sample.parquet.
CSV summarization
Prompt:
“Summarize the structure of sample.csv (row count, column count, column
names, and data types).”
Behavior:
Claude invoked summarize_csv_file with {"filename": "sample.csv"} and
replied that the file has 5 rows and 4 columns
(id, name, email, signup_date) with dtypes matching the pandas
output: id → int64, the others → object.
This matches exactly what I see when running python client.py.
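To sanity-check a summary like this without the server, the same structure can be approximated with the standard library. The sketch below is not the server's implementation, which uses pandas. The `int64`/`object` labels only imitate pandas' naming, and the e-mail addresses are placeholders.

```python
import csv
import io

# Sample rows from this guide; the e-mail addresses are placeholders
SAMPLE_CSV = """id,name,email,signup_date
1,Alice Johnson,alice@example.com,2023-01-15
2,Bob Smith,bob@example.com,2023-02-22
3,Carol Lee,carol@example.com,2023-03-10
4,David Wu,david@example.com,2023-04-18
5,Eva Brown,eva@example.com,2023-05-30
"""


def summarize_csv(text: str) -> dict:
    """Report row count, column names, and a rough per-column type."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    dtypes = {}
    for col in columns:
        try:
            for row in rows:
                int(row[col])  # raises ValueError for non-integer text
            dtypes[col] = "int64"
        except (TypeError, ValueError):
            dtypes[col] = "object"
    return {"rows": len(rows), "columns": columns, "dtypes": dtypes}


print(summarize_csv(SAMPLE_CSV))
```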
Data analysis (describe / head / info)
Prompts (asked in separate turns):
“Run a describe analysis on sample.csv.”
“Show me the first 5 rows of sample.csv.”
“Give me the pandas info summary for sample.csv.”
Behavior:
Claude mapped these to analyze_csv_data with operation="describe",
"head", and "info" respectively.
The numeric summary for id (count 5, mean 3, std ≈ 1.58, min 1, max 5)
and the printed head/info are the same as the outputs from the interactive
Python client.
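The reported numbers for the id column can be checked by hand. The sketch below recomputes them with the standard library, using the sample standard deviation (n - 1 in the denominator), which is what pandas' describe reports.

```python
import statistics

ids = [1, 2, 3, 4, 5]  # the id column of sample.csv

summary = {
    "count": len(ids),
    "mean": statistics.mean(ids),
    # Sample standard deviation: sqrt(10 / 4) = sqrt(2.5), about 1.58
    "std": round(statistics.stdev(ids), 2),
    "min": min(ids),
    "max": max(ids),
}
print(summary)  # {'count': 5, 'mean': 3, 'std': 1.58, 'min': 1, 'max': 5}
```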
Data creation
Prompt:
“Create a new sample CSV called new_sample.csv with 5 rows of data.”
Behavior:
Claude used create_sample_data with {"filename": "new_sample.csv", "rows": 5} and confirmed that the file was created under the data/
directory. The path and row count match what I see on disk and in the
command-line client.
Error handling / edge cases
Prompts:
“Summarize a CSV file named missing.csv.”
“Analyze sample.csv without specifying the operation.”
Behavior:
For the missing file, Claude surfaced the server’s error message:
Error: CSV file 'missing.csv' does not exist in data directory.
For the incomplete analyze command, it reported the usage hint
(analyze <filename> <operation>) and listed the supported operations
(describe, head, info, columns).
This matches the edge-case behavior tested in test.py.
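The usage hint for an incomplete analyze command could come from argument validation along the following lines. This is a hypothetical sketch, since the client's actual parsing code is not shown here. Only the usage string and the four operation names are taken from the behavior described above.

```python
SUPPORTED_OPERATIONS = ("describe", "head", "info", "columns")


def parse_analyze(command: str):
    """Split 'analyze <filename> <operation>' into (filename, operation)."""
    parts = command.split()
    operations = ", ".join(SUPPORTED_OPERATIONS)
    if len(parts) != 3 or parts[0] != "analyze":
        raise ValueError(
            f"Usage: analyze <filename> <operation> (operations: {operations})"
        )
    _, filename, operation = parts
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(
            f"Unknown operation '{operation}' (operations: {operations})"
        )
    return filename, operation
```

For example, `parse_analyze("analyze sample.csv head")` returns `("sample.csv", "head")`, while `parse_analyze("analyze sample.csv")` raises a `ValueError` carrying the usage hint.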
For simple operations on the small sample dataset (5 rows, 4 columns), the
direct Python client (python client.py / python test.py) returns almost
instantly – typically within a fraction of a second for:
- list_data_files
- summarize_csv_file("sample.csv")
- analyze_csv_data("sample.csv", "head" | "info" | "describe")
- create_sample_data(..., rows=5)
When the same operations are triggered via Claude Desktop, there is a small but noticeable overhead. Claude has to:
In practice this adds a few hundred milliseconds to around a second, depending on the complexity of the prompt and Claude’s own model latency. Because all file I/O and pandas work happen locally, the extra delay is dominated by the LLM and message-passing overhead, not by the MCP server itself.
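To put numbers on the local side of that comparison, a small timing helper can wrap any callable. This is a generic sketch, not code from this repo; `time_call` is a hypothetical name, and it measures only the local call, not Claude's model latency.

```python
import time


def time_call(func, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in seconds over several calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Example with a trivial callable; substitute a local pandas operation
    print(f"{time_call(sum, range(1000)) * 1000:.3f} ms")
```

Taking the best of several runs reduces noise from disk caching and other processes.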
Programmatic calls (via client.py or the automated tests in test.py) are:
filename, operation),
Usage: analyze <filename> <operation> when the user forgets an argument,
Natural-language use through Claude Desktop is more flexible but also a bit less predictable:
sample.csv”
list_data_files, summarize_csv_file, analyze_csv_data, etc.), and the outputs line up with what the Python client shows.
From a user-experience perspective:
In a real-world workflow, a typical pattern would be:
A few limitations of the current system and ideas for improvement:
sample.csv?”) could make the Claude interaction smoother.
You now have a fully functional MCP server that can:
✅ Analyze CSV and Parquet files
✅ Respond to natural language queries through Claude
✅ Create and manipulate data files
✅ Provide detailed statistical analysis
✅ Run entirely on your machine (the server itself needs no API keys!)
Happy data analyzing! 📊🤖
Add this to claude_desktop_config.json and restart Claude Desktop.
{
"mcpServers": {
"mcp-file-analyzer": {
"command": "npx",
"args": []
}
}
}