Enables AI assistants to perform comprehensive web research through tiered search, secure URL fetching with markdown conversion, and automated multi-source synthesis pipelines. Provides read-only tools with configurable caching, SSRF protection, and optional LLM-powered summarization for search results and content analysis.
MCP server for web research, academic papers, Twitter/X, YouTube, and file ingestion. Eight tools for AI assistants — all via the MCP stdio protocol. Includes credential vault for institutional access, CAPTCHA detection, and token-efficient output.
| Tool | Description |
|---|---|
| `web_search` | 3-tier search cascade: Brave API → DuckDuckGo → HTML scraper |
| `fetch_url` | Fetch any URL → clean markdown, with SSRF protection and 24h cache |
| `research` | Compound pipeline: query rewrite → search → parallel fetch → summarize → synthesize |
| `youtube_essence` | YouTube video → transcript, summary, key points, chapters, quotes |
| `deep_ingest` | Extract text from files: PDF, DOCX, XLSX, PPTX, audio, video, images |
| `academic_lookup` | Resolve DOI / ArXiv / PubMed → metadata + full text via institutional access |
| `twitter_extract` | Extract tweets and threads from X.com/Twitter |
| `vault_status` | Show loaded credential profiles and dependency status (never exposes secrets) |
All tools are read-only — they fetch and transform content, never modify anything.
```shell
pip install mcp-research
```
Or run directly with uvx (zero-install):
```shell
uvx mcp-research
```
Optional extras:
```shell
pip install 'mcp-research[twitter]'   # yt-dlp for Twitter extraction
pip install 'mcp-research[youtube]'   # yt-dlp + faster-whisper for YouTube
pip install 'mcp-research[academic]'  # PyPDF2 for academic PDFs
pip install 'mcp-research[ingest]'    # PDF, DOCX, XLSX, PPTX, audio support
pip install 'mcp-research[all]'       # everything
```
Check your setup:
```shell
mcp-research doctor
```
Add to your Claude Code MCP config (~/.claude/settings.json or project .mcp.json):
```json
{
  "mcpServers": {
    "research": {
      "command": "uvx",
      "args": ["mcp-research"],
      "env": {
        "BRAVE_API_KEY": "BSA...",
        "OLLAMA_URL": "http://localhost:11434"
      }
    }
  }
}
```
Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "research": {
      "command": "uvx",
      "args": ["mcp-research"],
      "env": {
        "BRAVE_API_KEY": "BSA..."
      }
    }
  }
}
```
All configuration is via environment variables — no config files needed (except the optional vault).
| Variable | Default | Description |
|---|---|---|
| `BRAVE_API_KEY` | (empty) | Brave Search API key. Falls back to DuckDuckGo if unset. |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama endpoint for summarization/synthesis. Set empty to disable. |
| `OLLAMA_MODEL` | `qwen2.5:14b` | Model to use for summarization and synthesis. |
| `MCP_RESEARCH_CACHE_DIR` | `~/.mcp-research/cache/` | URL fetch cache directory. |
| `MCP_RESEARCH_CACHE_TTL` | `24` | Cache TTL in hours. |
| `MCP_RESEARCH_LOG_DIR` | `~/.mcp-research/logs/` | Search log directory (NDJSON). |
| `MCP_RESEARCH_MAX_RESULTS` | `10` | Default max search results. |
| `MCP_RESEARCH_VAULT_FILE` | `~/.mcp-research/vault.yaml` | Credential vault file path. |
| `MCP_RESEARCH_VAULT_HOT_RELOAD` | `true` | Auto-reload vault when file changes. |
| `MCP_RESEARCH_SESSION_TTL` | `1800` | Session idle timeout in seconds. |
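As a rough sketch, env-driven configuration like the table above is typically read with defaults at startup. The function name and dict layout here are illustrative assumptions, not the package's actual internals; only the variable names and defaults come from the table:

```python
import os

def load_config(env=os.environ):
    """Read settings from environment variables, falling back to defaults."""
    return {
        "brave_api_key": env.get("BRAVE_API_KEY", ""),
        "ollama_url": env.get("OLLAMA_URL", "http://localhost:11434"),
        "ollama_model": env.get("OLLAMA_MODEL", "qwen2.5:14b"),
        "cache_ttl_hours": int(env.get("MCP_RESEARCH_CACHE_TTL", "24")),
        "max_results": int(env.get("MCP_RESEARCH_MAX_RESULTS", "10")),
        "session_ttl_s": int(env.get("MCP_RESEARCH_SESSION_TTL", "1800")),
    }

# Passing a dict instead of os.environ makes the loader easy to test.
cfg = load_config({"MCP_RESEARCH_CACHE_TTL": "6"})
```

Unset variables simply fall back to their documented defaults, which is why no config file is needed.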
### web_search

`web_search(query, max_results=5, summarize=False, auto_fetch_top=False)`

Searches the web using a 3-tier cascade for maximum reliability:

1. Brave Search API (requires `BRAVE_API_KEY`)
2. DuckDuckGo
3. HTML scraper

Options:

- `summarize`: Use Ollama to summarize results (requires running Ollama)
- `auto_fetch_top`: Also fetch and return the full content of the top result

### fetch_url

`fetch_url(url, summarize=False, max_chars=15000)`
Fetches a URL and converts it to clean markdown:

- SSRF protection blocks requests to private and internal addresses
- Responses are cached for 24 hours
- Output is truncated to `max_chars`

### research

`research(query, depth="standard", context="")`
Compound research pipeline: query rewrite → search → parallel fetch → summarize → synthesize.
Depth levels:
| Depth | Pages | Synthesis |
|---|---|---|
| `quick` | 2 | No |
| `standard` | 5 | Yes |
| `deep` | 10 | Yes |
All steps gracefully degrade without Ollama — you still get search results and page content.
### youtube_essence

`youtube_essence(url, mode="standard")`
Extracts structured content from YouTube videos:
Modes: `quick` (TL;DR), `standard` (adds chapters), `deep` (adds quotes).
Requires yt-dlp. Optional: faster-whisper for audio-only videos, ffmpeg for media extraction.
### deep_ingest

`deep_ingest(path, include_types="", max_files=200, summarize=False)`
Extracts text from files in a directory or single file:
- Text formats: `.txt`, `.md`, `.json`, `.csv`, source code, etc.
- Office formats: `.docx`, `.xlsx`, `.pptx` (optional dependencies)

Type filter (`include_types`): `text`, `pdf`, `audio`, `video`, `image`, `office`.
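The type filter implies an extension-to-type classification along these lines. The extension sets here are illustrative examples, not the package's actual mapping:

```python
from pathlib import Path

# Hypothetical mapping from file extension to include_types category.
TYPE_MAP = {
    "text":   {".txt", ".md", ".json", ".csv", ".py"},
    "pdf":    {".pdf"},
    "office": {".docx", ".xlsx", ".pptx"},
    "audio":  {".mp3", ".wav"},
    "video":  {".mp4", ".mkv"},
    "image":  {".png", ".jpg"},
}

def classify(path):
    """Return the type category for a path, or None if unrecognized."""
    ext = Path(path).suffix.lower()
    for ftype, exts in TYPE_MAP.items():
        if ext in exts:
            return ftype
    return None

def filter_files(paths, include_types):
    """Keep only files whose category is in the comma-separated filter."""
    wanted = set(include_types.split(",")) if include_types else set(TYPE_MAP)
    return [p for p in paths if classify(p) in wanted]

files = ["notes.md", "paper.pdf", "deck.pptx", "clip.mp4"]
```

An empty filter keeps everything, mirroring the `include_types=""` default.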
### academic_lookup

`academic_lookup(identifier, fetch_fulltext=True)`
Resolves academic papers from multiple identifier types:
- DOI `10.xxxx/...` → Crossref metadata + publisher redirect
- ArXiv `2301.12345` → abstract + PDF

Full text access is available via the credential vault.
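Detecting the identifier type might look like the sketch below. The regexes are assumptions based on common DOI / arXiv / PMID shapes, not the package's actual patterns:

```python
import re

def identify(identifier):
    """Guess whether an identifier is a DOI, arXiv ID, or bare PubMed ID."""
    s = identifier.strip()
    if re.match(r"^10\.\d{4,9}/\S+$", s):
        return "doi"      # e.g. 10.1109/5.771073
    if re.match(r"^\d{4}\.\d{4,5}(v\d+)?$", s):
        return "arxiv"    # e.g. 2301.12345
    if re.match(r"^\d{1,8}$", s):
        return "pubmed"   # bare numeric PMID
    return "unknown"
```

Order matters: the DOI check runs first so that numeric-looking DOIs are never misread as PMIDs.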
### twitter_extract

`twitter_extract(url, include_thread=False)`
Extracts tweets and threads from X.com/Twitter using a strategy cascade:
Returns: text, author, timestamp, metrics (likes, retweets, replies), media URLs.
### vault_status

`vault_status()`
Shows loaded credential profiles, match patterns, and auth types — never exposes secrets. Also checks availability of optional dependencies.
Create ~/.mcp-research/vault.yaml to configure authentication for protected sources:
```yaml
version: 1
profiles:
  # University EZproxy for IEEE
  ieee-university:
    match: "*.ieee.org/**"
    ezproxy:
      base_url: "https://ezproxy.myuniversity.edu/login?url="
      mode: prefix

  # Springer via API key
  springer:
    match: "*.springer.com/**"
    auth:
      type: api_key
      header: "X-ApiKey"
      value: "${SPRINGER_API_KEY}"

  # X.com via browser cookies
  twitter:
    match: "*.x.com/**"
    auth:
      type: cookie_jar
      path: "${HOME}/.mcp-research/cookies/twitter.txt"
```
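Two vault mechanics, `${VAR}` interpolation and glob-style `match` patterns, can be sketched as below. The helper names are illustrative, not the package's actual API:

```python
import fnmatch
import os
import re

def interpolate(value, env=os.environ):
    """Replace ${VAR} placeholders with environment values (empty if unset)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), ""), value)

def profile_matches(pattern, url):
    """Match a vault glob pattern against a URL, ignoring the scheme."""
    bare = re.sub(r"^https?://", "", url)
    return fnmatch.fnmatch(bare, pattern)

key = interpolate("${SPRINGER_API_KEY}", {"SPRINGER_API_KEY": "sk-123"})
hit = profile_matches("*.ieee.org/**", "https://ieeexplore.ieee.org/document/123")
```

Because interpolation happens at load time from the environment, the YAML file itself never needs to contain a secret.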
- `${VAR}` is resolved from environment variables — secrets are never stored in plain text
- Auth types: `bearer`, `basic`, `api_key`, `cookie_jar`, `headers`
- EZproxy modes: `prefix` (prepend base URL) or `suffix` (domain rewriting)

All tools produce compact output by default to avoid wasting AI context window tokens:
| Tool | Default output | Override |
|---|---|---|
| `fetch_url` | ~15K chars (~4K tokens) | `max_chars` parameter |
| `research` | ~500 tokens per source | Prefers summaries over raw content |
| `academic_lookup` | ~10K chars full text | Truncates with notice |
| `deep_ingest` | 15 files, 300 char excerpts | `max_files` parameter |
| `youtube_essence` | 3K char transcript excerpt | Full transcript in result object |
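The truncate-with-notice behavior in the table might be implemented roughly like this; the notice wording is an illustrative assumption:

```python
def truncate(text, max_chars=15000):
    """Cut text at max_chars and append a notice so the reader knows."""
    if len(text) <= max_chars:
        return text
    notice = f"\n\n[truncated: showing {max_chars} of {len(text)} chars]"
    return text[:max_chars] + notice

short = truncate("abc", max_chars=10)
long = truncate("x" * 20, max_chars=10)
```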
- `repr()` redacts all auth values

```shell
mcp-research serve                         # Run MCP stdio server (default)
mcp-research search "query"                # Search the web
mcp-research fetch https://example.com     # Fetch URL to markdown
mcp-research youtube https://youtu.be/...  # Extract YouTube video
mcp-research ingest ./docs/                # Extract text from files
mcp-research academic "10.1109/..."        # Resolve academic paper
mcp-research tweet https://x.com/.../123   # Extract tweet
mcp-research vault                         # Show vault profiles
mcp-research doctor                        # Check dependencies
```
```shell
git clone https://github.com/MABAAM/Maibaamcrawler.git
cd Maibaamcrawler
pip install -e ".[all]"
pytest tests/ -v
python -m mcp_research
```
- Credential vault: `~/.mcp-research/vault.yaml` with env var interpolation, glob URL matching, EZproxy rewriting, hot-reload
- `mcp-research doctor` checks all dependencies and configuration

## License

MIT