Transforms natural language queries into SQL for customer segmentation, enabling democratized access to customer data insights.
SegmentMCP is an intelligent Model Context Protocol (MCP) server that transforms natural language queries into actionable customer segments. It bridges the gap between business stakeholders who think in plain English and technical systems that require structured SQL queries, enabling democratized access to customer data insights.
Modern businesses struggle with customer segmentation due to several key barriers:
SegmentMCP eliminates these barriers by providing:
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │───▶│    MCP Server    │───▶│ Kaggle Dataset  │
│ (Claude/Custom) │    │    (FastMCP)     │    │ (45K+ records)  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │  Agent Pipeline  │
                       │                  │
                       │ 1. Intent Parser │ ← GPT-4.1
                       │ 2. Data Mapper   │ ← Rule-based
                       │ 3. Query Gen     │ ← GPT-4.1
                       │ 4. Validator     │ ← Rule-based
                       │ 5. Activator     │ ← Simulation
                       └──────────────────┘
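The five stages in the diagram can be read as a chain in which each agent consumes what the previous one produced. The sketch below only illustrates that flow under stated assumptions: every function name, the shared context dictionary, and the hard-coded return values are hypothetical stand-ins, not the code in agents/. In the real project stages 1 and 3 call GPT-4.1; here nothing calls a model or a database.

```python
from typing import Any, Callable, Dict, List

# A stage reads the shared context and returns the keys it contributes.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def parse_intent(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the LLM-backed intent parser (fixed output).
    return {"criteria": [{"field": "age", "operator": ">", "value": 30}]}


def map_fields(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Rule-based step: business terms resolve to a concrete table name.
    return {"table": "bank_customers"}


def generate_query(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the LLM-backed query generator (fixed output).
    return {"sql": f"SELECT * FROM {ctx['table']} WHERE age > 30 LIMIT 1000"}


def validate(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Rule-based check: only read-only SELECT statements pass.
    return {"is_valid": ctx["sql"].lstrip().upper().startswith("SELECT")}


def activate(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Simulated activation, matching the "Simulation" label in the diagram.
    return {"status": "success" if ctx["is_valid"] else "rejected"}


PIPELINE: List[Stage] = [parse_intent, map_fields, generate_query, validate, activate]


def run_pipeline(query: str) -> Dict[str, Any]:
    """Run every stage in order, merging each stage's output into the context."""
    ctx: Dict[str, Any] = {"query": query}
    for stage in PIPELINE:
        ctx.update(stage(ctx))
    return ctx
```

Keeping each stage as a plain function over a shared dictionary makes it easy to swap an LLM-backed stage for a rule-based or simulated one, which is roughly what the demo mode appears to do.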
SegmentMCP/
├── agents/ # AI processing agents
│ ├── intent_parser.py # Natural language → criteria
│ ├── data_mapper.py # Business terms → DB fields
│ ├── query_generator.py # Criteria → SQL
│ ├── validation_agent.py # SQL validation & testing
│ └── activation_agent.py # Segment execution
├── database/
│ └── kaggle_connector.py # Dataset management
├── models/
│ └── schemas.py # Pydantic data models
├── data/ # Dataset storage
│ ├── bank-full.csv # Bank customer dataset
│ └── bank_deposit.db # SQLite database
├── main.py # Core MCP server
├── demo_server.py # Demo mode (no OpenAI)
├── http_server.py # Direct HTTP API
├── http_wrapper.py # MCP protocol wrapper
├── demo_http_wrapper.py # Demo HTTP wrapper
├── config.py # Configuration management
├── generate_claude_config.py # Claude Desktop setup
├── validate_schemas.py # Schema validation
├── validate_config_usage.py # Config usage checker
├── test_config_integration.py # Config testing
└── requirements.txt # Dependencies
Note: Current version simulates downstream integrations. Real API connections require additional development.
{
"query": "High-value customers who haven't been contacted in 6 months",
"use_case": "Re-engagement campaign targeting",
"output": "Segment for email marketing platform"
}
{
"query": "Customers with loans but negative balance trends",
"use_case": "Credit risk monitoring",
"output": "Alert list for risk management team"
}
{
"query": "Young professionals without housing loans",
"use_case": "Mortgage product targeting",
"output": "Prospect list for sales team"
}
{
"query": "Long-term customers with declining engagement",
"use_case": "Churn prevention",
"output": "Priority list for customer success managers"
}
{
"query": "All customers contacted more than regulatory limit",
"use_case": "Compliance monitoring",
"output": "Audit report for regulatory team"
}
The system provides a foundation for integrating with downstream systems:
Note: Current version provides the framework and simulated responses. Real API integrations require additional development work.
Natural Language Query → Segment Creation → Campaign Launch → Performance Tracking
Lead Qualification → Segment Assignment → Automated Outreach → Conversion Tracking
Health Score Monitoring → Risk Segment Identification → Intervention Campaigns → Retention Metrics
git clone https://github.com/tejasayya/SegmentMCP.git
cd SegmentMCP
pip install -r requirements.txt
# Create .env file
OPENAI_API_KEY=your_openai_api_key_here
KAGGLE_USERNAME=your_kaggle_username
KAGGLE_KEY=your_kaggle_key
python generate_claude_config.py
This creates claude_mcp_config_generated.json with correct paths for your system.
What this does:
python validate_schemas.py # Validate data schemas
python validate_config_usage.py # Check config usage
python test_config_integration.py # Test config integration
These validate schemas, check configuration usage, and test integration.
The project includes comprehensive validation and testing infrastructure:
python validate_schemas.py
python validate_config_usage.py
python test_config_integration.py
python test_demo_direct.py # Test demo server directly
python test_http_requests.py # Test HTTP endpoints
python test_mcp_client.py # Test MCP protocol
Note on OpenAI Version: If you encounter OpenAI compatibility issues, you may need to upgrade:
pip install "openai>=2.0.0" # Upgrade from 1.35.15 if needed (quotes stop the shell from treating > as a redirect)
python main.py
python demo_server.py
For Postman/HTTP API testing, choose one approach:
python http_server.py
# Server runs on http://localhost:8001
python http_wrapper.py
# Server runs on http://localhost:8001
python demo_http_wrapper.py
# Server runs on http://localhost:8002
| Use Case | Recommended Server | Why |
|---|---|---|
| Claude Desktop Integration | main.py | Full MCP protocol with AI |
| Development/Testing | http_server.py | Fast HTTP testing with Postman |
| MCP Protocol Validation | http_wrapper.py | Ensures MCP server works correctly |
| No OpenAI API Key | demo_server.py or demo_http_wrapper.py | Works without API costs |
| Production Deployment | main.py | Complete feature set |
curl -X POST "http://localhost:8001/create-segment" \
-H "Content-Type: application/json" \
-d '{"query": "Married customers with age over 30"}'
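The same request can be issued from Python. This is a minimal sketch using only the standard library; it assumes the HTTP server is listening on http://localhost:8001 as noted above, and the helper names `build_request` and `create_segment` are illustrative, not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # assumed: port from the http_server.py note


def build_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /create-segment request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/create-segment",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_segment(query: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `create_segment("Married customers with age over 30")` should return a dictionary shaped like the response documented under POST /create-segment.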
For AI assistant integration, run the MCP server:
python main.py
This project provides multiple ways to run the server to address different development and deployment needs:
main.py
demo_server.py
Direct Integration (http_server.py)
SegmentationMCPServer class
Protocol Wrapper (http_wrapper.py)
http_server.py for fast iteration and Postman testing
http_wrapper.py to ensure MCP server works correctly
Issue: The project initially used openai>=1.30.0,<1.36.0 but users may encounter compatibility issues.
Solution: Upgrade to openai>=2.0.0 if you face initialization errors:
pip install "openai>=2.0.0"
Why: Newer OpenAI versions have different client initialization patterns and better stability.
POST /create-segment
Create a customer segment from natural language description.
Request Body:
{
"query": "Description of desired customer segment in plain English"
}
Response:
{
"status": "success",
"segment_id": "SEG_ABCD1234",
"customer_count": 1500,
"downstream_systems": ["CRM_System", "Email_Marketing_Platform", "Ad_Platform"],
"generated_query": "SELECT * FROM bank_customers WHERE marital = 'married' AND age > 30 LIMIT 1000",
"validation_sample": [
{"age": 35, "job": "management", "marital": "married", "balance": 2143, "housing": "yes"},
{"age": 42, "job": "technician", "marital": "married", "balance": 1506, "housing": "no"}
],
"estimated_rows": 1500,
"processing_steps": {
"intent_parsing": {
"parsed_criteria": {
"conditions": [{"field": "marital", "operator": "=", "value": "married"}, {"field": "age", "operator": ">", "value": 30}],
"logical_operators": ["AND"]
},
"confidence": 0.9,
"ambiguous_terms": [],
"parsing_notes": ["Successfully parsed natural language query"],
"timestamp": "2024-01-15T10:30:01Z",
"processing_time_ms": 1250
},
"data_mapping": {
"business_terms": {"age": "age", "marital": "marital"},
"table_mappings": {"customers": "bank_customers"},
"field_mappings": {"marital": "marital", "age": "age"},
"timestamp": "2024-01-15T10:30:02Z",
"processing_time_ms": 150
},
"query_generation": {
"sql_query": "SELECT * FROM bank_customers WHERE marital = 'married' AND age > 30 LIMIT 1000",
"optimized": true,
"estimated_rows": 1500,
"tables_used": ["bank_customers"],
"optimization_notes": ["Added LIMIT clause for safety"],
"timestamp": "2024-01-15T10:30:03Z",
"processing_time_ms": 800
},
"validation": {
"is_valid": true,
"issues": [],
"warnings": ["Query returns large number of rows: 1500"],
"sample_data": [
{"age": 35, "job": "management", "marital": "married", "balance": 2143},
{"age": 42, "job": "technician", "marital": "married", "balance": 1506}
],
"row_count": 1500,
"timestamp": "2024-01-15T10:30:04Z",
"processing_time_ms": 200
}
}
}
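The `parsed_criteria` and `generated_query` fields in the response above hint at how structured conditions become SQL. The sketch below shows one way such a translation could work, assuming a whitelist of fields and operators and bound parameters instead of inlined values. It is not the project's query_generator.py (which uses GPT-4.1); `criteria_to_sql` and the `ALLOWED_*` sets are hypothetical, and the third table row is toy data added only so the filter has something to exclude.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

ALLOWED_FIELDS = {"age", "job", "marital", "balance", "housing"}  # columns seen in the sample rows
ALLOWED_OPERATORS = {"=", "!=", ">", ">=", "<", "<="}
ALLOWED_JOINERS = {"AND", "OR"}


def criteria_to_sql(criteria: Dict[str, Any], table: str = "bank_customers",
                    limit: int = 1000) -> Tuple[str, List[Any]]:
    """Turn a parsed_criteria dict into (sql, params) with ? placeholders."""
    conditions = criteria["conditions"]
    joiners = criteria.get("logical_operators", [])
    if len(joiners) != max(len(conditions) - 1, 0):
        raise ValueError("expected one logical operator between each pair of conditions")

    clauses: List[str] = []
    params: List[Any] = []
    for cond in conditions:
        if cond["field"] not in ALLOWED_FIELDS or cond["operator"] not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {cond}")
        clauses.append(f"{cond['field']} {cond['operator']} ?")
        params.append(cond["value"])  # values are bound, never interpolated

    where = clauses[0] if clauses else "1 = 1"
    for joiner, clause in zip(joiners, clauses[1:]):
        if joiner not in ALLOWED_JOINERS:
            raise ValueError(f"unsupported logical operator: {joiner}")
        where += f" {joiner} {clause}"
    return f"SELECT * FROM {table} WHERE {where} LIMIT {int(limit)}", params


# Try it against an in-memory table instead of the real bank_deposit.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank_customers "
             "(age INTEGER, job TEXT, marital TEXT, balance INTEGER, housing TEXT)")
conn.executemany("INSERT INTO bank_customers VALUES (?, ?, ?, ?, ?)", [
    (35, "management", "married", 2143, "yes"),
    (42, "technician", "married", 1506, "no"),
    (28, "services", "single", 300, "no"),  # toy row, not from the dataset
])

sql, params = criteria_to_sql({
    "conditions": [
        {"field": "marital", "operator": "=", "value": "married"},
        {"field": "age", "operator": ">", "value": 30},
    ],
    "logical_operators": ["AND"],
})
rows = conn.execute(sql, params).fetchall()
```

Mixed AND/OR chains are emitted without parentheses, so they follow SQL's normal precedence; a fuller version would need explicit grouping.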
GET /segment/{segment_id}
Retrieve information about a created segment.
GET /schema
Get current database schema information.
GET /health
Server health status endpoint.
All configuration values support environment variable overrides:
# Model Configuration
export OPENAI_MODEL="gpt-4.1"
export OPENAI_TEMPERATURE="0.1"
export OPENAI_MAX_TOKENS="1000"
# Agent-Specific Models
export INTENT_PARSER_MODEL="gpt-4.1"
export QUERY_GENERATOR_MODEL="gpt-4.1"
# Performance Settings
export MAX_QUERY_ROWS="1000"
export DEFAULT_QUERY_LIMIT="1000"
export VALIDATION_SAMPLE_SIZE="5"
export MAX_SAFE_ROWS="100000"
export WARNING_ROW_THRESHOLD="50000"
# Timeouts
export INTENT_PARSER_TIMEOUT="15"
export QUERY_GENERATOR_TIMEOUT="20"
export VALIDATION_TIMEOUT="10"
export ACTIVATION_TIMEOUT="25"
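For readers wondering how overrides like these are typically consumed, here is a small sketch of an environment-backed settings object. It is an assumption about the general pattern, not a copy of config.py: the `Settings` class and `load_settings` function are hypothetical, only four of the variables are covered, and the defaults for temperature and timeout simply reuse the values shown in the exports above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_model: str = "gpt-4.1"
    openai_temperature: float = 0.1   # assumed default, taken from the export above
    max_query_rows: int = 1000
    validation_timeout: int = 10      # assumed default, taken from the export above


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read overrides from the environment, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        openai_model=env.get("OPENAI_MODEL", defaults.openai_model),
        openai_temperature=float(env.get("OPENAI_TEMPERATURE", defaults.openai_temperature)),
        max_query_rows=int(env.get("MAX_QUERY_ROWS", defaults.max_query_rows)),
        validation_timeout=int(env.get("VALIDATION_TIMEOUT", defaults.validation_timeout)),
    )
```

Passing a mapping explicitly, as in `load_settings({"MAX_QUERY_ROWS": "250"})`, keeps the loader easy to test without touching the real process environment.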
Each agent loads configuration automatically:
- OPENAI_API_KEY: Required for AI-powered query generation
- OPENAI_MODEL: Model to use (default: gpt-4.1)
- KAGGLE_USERNAME: For dataset access
- KAGGLE_KEY: Kaggle API key
- DATABASE_PATH: Path to local database file
- MAX_QUERY_ROWS: Maximum rows per query (default: 1000)

"Customers over 25 years old"
"Married customers with housing loans"
"High balance customers without personal loans"
"Customers contacted more than 3 times but never converted"
"Young professionals with tertiary education and no defaults"
"Retired customers with high balances who were contacted in May"
"High-value prospects for mortgage products"
"At-risk customers for retention campaigns"
"Premium customers for exclusive offers"
- python validate_schemas.py - Comprehensive schema validation
- python validate_config_usage.py - Configuration usage analysis
- python test_config_integration.py - Integration testing
- python generate_claude_config.py - Setup assistance