Enables natural language search and analysis of arXiv academic papers with AI-powered relevance ranking, full-text extraction, and support for multiple integrations like Claude and LangChain.
A comprehensive Model Context Protocol (MCP) server for searching and analyzing academic papers from arXiv with AI-powered relevance ranking and full-text extraction.
# Clone the repository
git clone https://github.com/borderlessboy/arxiv-research-mcp
cd arxiv-research-mcp
# Install dependencies
pip install -r requirements.txt
# Create environment configuration
# cp .env.example .env # Create .env file with your configuration
# Run the MCP server
python scripts/run_server.py
# Or use the Streamlit dashboard
streamlit run integrations/streamlit_app.py
The project includes a Dockerfile for easy containerized deployment.
# Build the Docker image
docker build -t arxiv-research-mcp .
# Run the container
docker run -p 8090:8090 arxiv-research-mcp
# Build with custom tag
docker build -t arxiv-research-mcp:latest .
# Run with custom port mapping
docker run -p 8080:8090 arxiv-research-mcp
# Run with volume for persistent cache
docker run -p 8090:8090 -v $(pwd)/cache:/app/cache arxiv-research-mcp
# Run with environment variables
docker run -p 8090:8090 \
-e CACHE_ENABLED=true \
-e CACHE_TTL_HOURS=24 \
-e LOG_LEVEL=INFO \
arxiv-research-mcp
The project includes a docker-compose.yml file for easy deployment:
# Start the service
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the service
docker-compose down
Or create a custom docker-compose.yml:
services:
  arxiv-research-mcp:
    build: .
    ports:
      - "8090:8090"
    volumes:
      - ./cache:/app/cache
    environment:
      - CACHE_ENABLED=true
      - CACHE_TTL_HOURS=24
      - LOG_LEVEL=INFO
    restart: unless-stopped
# Build for development with all dependencies
docker build -t arxiv-research-mcp:dev .
# Run with mounted source code for development
docker run -p 8090:8090 \
-v $(pwd)/src:/app/src \
-v $(pwd)/config:/app/config \
-v $(pwd)/cache:/app/cache \
arxiv-research-mcp:dev
# Quick start with Docker
docker build -t arxiv-research-mcp .
docker run -p 8090:8090 arxiv-research-mcp
pip install "arxiv-research-mcp[all]"
# API server only
pip install "arxiv-research-mcp[api]"
# Jupyter integration
pip install "arxiv-research-mcp[jupyter]"
# Dashboard
pip install "arxiv-research-mcp[dashboard]"
# LangChain integration
pip install "arxiv-research-mcp[langchain]"
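import asyncio
from src.server import search_arxiv_papers_tool

async def main():
    # Search for papers (top-level await only works in a notebook or async REPL)
    result = await search_arxiv_papers_tool({
        "query": "transformer models",
        "max_results": 10,
        "years_back": 4,
        "include_full_text": True,
    })
    print(result)

asyncio.run(main())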
from integrations.langchain_tool import ResearchAgent
agent = ResearchAgent()
result = agent.research_topic("quantum machine learning")
import matplotlib.pyplot as plt
from integrations.jupyter_helper import search_papers
# Search and analyze (top-level await works inside Jupyter cells)
helper = await search_papers("machine learning", max_results=20)
# Create visualizations
fig = helper.create_publication_timeline()
plt.show()
streamlit run integrations/streamlit_app.py
Create a .env file with your settings:
# Server Configuration
SERVER_NAME=arxiv-research-server
LOG_LEVEL=INFO
# arXiv API Configuration
ARXIV_REQUEST_TIMEOUT=30
ARXIV_MAX_RETRIES=3
# Caching
CACHE_ENABLED=true
CACHE_TTL_HOURS=24
# Content Processing
MAX_FULL_TEXT_LENGTH=50000
DEFAULT_MAX_RESULTS=10
DEFAULT_YEARS_BACK=4
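As a rough illustration of how these settings might be consumed, the sketch below reads the same variable names from the environment and falls back to the values listed above. The `Settings` dataclass and the `load_settings` helper are hypothetical names chosen for this example; the project's own configuration loader may look different.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values in the .env file above."""
    server_name: str
    log_level: str
    arxiv_request_timeout: int
    arxiv_max_retries: int
    cache_enabled: bool
    cache_ttl_hours: int
    max_full_text_length: int
    default_max_results: int
    default_years_back: int


def _as_bool(value: str) -> bool:
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    return Settings(
        server_name=env.get("SERVER_NAME", "arxiv-research-server"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        arxiv_request_timeout=int(env.get("ARXIV_REQUEST_TIMEOUT", "30")),
        arxiv_max_retries=int(env.get("ARXIV_MAX_RETRIES", "3")),
        cache_enabled=_as_bool(env.get("CACHE_ENABLED", "true")),
        cache_ttl_hours=int(env.get("CACHE_TTL_HOURS", "24")),
        max_full_text_length=int(env.get("MAX_FULL_TEXT_LENGTH", "50000")),
        default_max_results=int(env.get("DEFAULT_MAX_RESULTS", "10")),
        default_years_back=int(env.get("DEFAULT_YEARS_BACK", "4")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.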
search_arxiv_papers: Search for academic papers with relevance ranking.
Parameters:
- query (string): Search query
- max_results (integer, default: 10): Maximum papers to return
- years_back (integer, default: 4): Years to search back
- include_full_text (boolean, default: true): Include full paper text
clear_cache: Clear all cached search results.
get_cache_stats: Get cache statistics and information.
ArxivResearchTool: Search arXiv papers with LangChain integration.
ArxivCacheManagementTool: Manage cache with LangChain integration.
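MCP clients invoke tools through a JSON-RPC `tools/call` request. The sketch below builds such a request for `search_arxiv_papers` using the parameters documented above. The `build_tool_call` helper and the request id are illustrative assumptions; in practice an MCP client library would construct and send this message for you.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "search_arxiv_papers",
    {
        "query": "transformer models",
        "max_results": 10,
        "years_back": 4,
        "include_full_text": True,
    },
)
print(json.dumps(request, indent=2))
```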
The server uses TF-IDF vectorization and cosine similarity to rank papers by relevance to your query.
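To make the ranking idea concrete, here is a minimal, standard-library-only sketch of TF-IDF weighting followed by cosine similarity. It is not the server's code: the tokenizer, the smoothed IDF formula, and the `rank_by_relevance` name are assumptions made for illustration, and the project may well rely on a library vectorizer instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_relevance(query, documents):
    """Return (index, score) pairs sorted by cosine similarity to the query."""
    docs_tokens = [tokenize(d) for d in documents]
    n_docs = len(docs_tokens)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))

    def idf(term):
        # Smoothed IDF so terms absent from the corpus do not divide by zero.
        return math.log((1 + n_docs) / (1 + df.get(term, 0))) + 1.0

    def vectorize(tokens):
        counts = Counter(tokens)
        return {t: c * idf(t) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vectorize(tokenize(query))
    scores = [(i, cosine(q_vec, vectorize(t))) for i, t in enumerate(docs_tokens)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


papers = [
    "Attention is all you need: transformer models for translation",
    "Convolutional networks for image classification",
    "Quantum error correction codes",
]
ranking = rank_by_relevance("transformer models", papers)
```

Documents that share no terms with the query score zero, so they fall to the bottom of the ranking.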
Multiple extraction methods (PyPDF2, pdfplumber) ensure robust text extraction from PDFs.
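One way to combine several extraction backends is a fallback chain: try each extractor in turn and keep the first non-empty result. The sketch below assumes that approach; the function names are hypothetical and the order of backends is a guess, not something the project documents. The two wrappers use the `PdfReader` class from PyPDF2 and `pdfplumber.open`, imported lazily so the module loads even when a backend is missing.

```python
def extract_with_pypdf2(path):
    # Imported lazily so the sketch loads even if PyPDF2 is not installed.
    from PyPDF2 import PdfReader
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_with_pdfplumber(path):
    # Imported lazily for the same reason as above.
    import pdfplumber
    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_text(path, extractors=(extract_with_pypdf2, extract_with_pdfplumber)):
    """Try each extractor in order and return the first non-empty result."""
    errors = []
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # a failed backend should not abort the chain
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text and text.strip():
            return text
        errors.append(f"{extractor.__name__}: empty result")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))
```

Collecting the per-backend errors makes it easier to see why a particular PDF could not be read by any method.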
Intelligent caching reduces API calls and improves response times.
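The caching behaviour can be pictured as a time-to-live (TTL) store keyed by the search parameters. The sketch below is an in-memory illustration only; the `TTLCache` class and its methods are hypothetical, and the real cache (which the Docker examples suggest is persisted under a cache directory) may work differently.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_hours."""

    def __init__(self, ttl_hours=24, clock=time.time):
        self._ttl_seconds = ttl_hours * 3600
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl_seconds:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock(), value)

    def clear(self):
        self._entries.clear()

    def stats(self):
        return {"entries": len(self._entries), "ttl_hours": self._ttl_seconds / 3600}
```

A key such as `(query, max_results, years_back)` would let repeated searches with identical parameters skip the arXiv API until the entry expires.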
Process multiple research topics efficiently with the batch processor.
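The batch processor's interface is not shown here, so the following is only a sketch of the general pattern: run one search coroutine per topic, with a semaphore limiting how many run at once. `process_topics`, `max_concurrent`, and `fake_search` are all hypothetical names; `fake_search` stands in for whatever search coroutine you actually use.

```python
import asyncio


async def process_topics(topics, search, max_concurrent=3):
    """Run `search(topic)` for every topic, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(topic):
        async with semaphore:
            try:
                return topic, await search(topic)
            except Exception as exc:  # keep one failure from sinking the batch
                return topic, exc

    pairs = await asyncio.gather(*(run_one(t) for t in topics))
    return dict(pairs)


async def fake_search(topic):
    # Stand-in for a real search coroutine; returns a placeholder result.
    await asyncio.sleep(0)
    return f"results for {topic}"


results = asyncio.run(
    process_topics(["quantum machine learning", "graph neural networks"], fake_search)
)
```

Capping concurrency keeps a large batch from flooding the upstream API, and returning exceptions as values lets the caller decide how to handle individual failures.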
The project includes a production-ready Dockerfile.
pytest tests/
black src/ tests/
flake8 src/ tests/
mypy src/
python setup.py build
# Build development image
docker build -t arxiv-research-mcp:dev .
# Run with source code mounted for development
docker run -p 8090:8090 \
-v $(pwd)/src:/app/src \
-v $(pwd)/config:/app/config \
-v $(pwd)/cache:/app/cache \
arxiv-research-mcp:dev
# Run tests in Docker
docker run arxiv-research-mcp:dev pytest tests/
arxiv-research-mcp/
├── src/
│ ├── server.py # Main MCP server
│ ├── models/ # Data models
│ ├── services/ # Core services
│ └── utils/ # Utility functions
├── integrations/ # External integrations
├── scripts/ # Utility scripts
├── tests/ # Test suite
└── examples/ # Usage examples
For detailed documentation and guides, see the Docs/ directory.
This project is licensed under the MIT License - see the LICENSE file for details.
Port already in use:
# Use a different port
docker run -p 8080:8090 arxiv-research-mcp
Permission denied:
# Run with proper permissions
sudo docker run -p 8090:8090 arxiv-research-mcp
Build fails:
# Clean build
docker system prune -a
docker build --no-cache -t arxiv-research-mcp .
Container exits immediately:
# Check logs
docker logs <container_id>
# Run interactively
docker run -it arxiv-research-mcp /bin/bash
Run in your terminal:
claude mcp add arxiv-research-mcp-server -- npx