An MCP server for intelligent web content extraction from JavaScript-heavy sites using single-file and trafilatura. It enables AI agents to fetch, render, and paginate through clean article content and metadata.
A powerful Model Context Protocol (MCP) server that provides intelligent web content extraction using single-file and trafilatura. Perfect for AI agents that need to access and analyze web content from JavaScript-heavy sites.
GitHub Repository: https://github.com/kwinsch/singlefile-mcp
The single-file CLI is required for this MCP server to work. It uses a real browser engine to capture JavaScript-rendered content accurately. Install it globally with npm:
npm install -g single-file-cli
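The server depends on the single-file CLI, so it can help to confirm that the binary is on your `PATH` before you register the server. The sketch below assumes that `npm install -g single-file-cli` installs an executable named `single-file`. If your installation uses a different name, adjust it. `find_single_file` is a hypothetical helper and is not part of this package.

```python
import shutil
from typing import Optional


def find_single_file(binary: str = "single-file") -> Optional[str]:
    """Return the full path of the CLI if it is on PATH, else None.

    Assumption: "single-file" is the executable name provided by
    the single-file-cli npm package.
    """
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_single_file()
    if path:
        print(f"single-file found at {path}")
    else:
        print("single-file not found; run: npm install -g single-file-cli")
```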
Then add the server to Claude Code:
claude mcp add singlefile-mcp -s user -- uvx singlefile-mcp
This command installs the package from PyPI and runs it automatically. The Brave Search server is launched the same way with npx.
To run from a local checkout instead:
claude mcp add singlefile-mcp -s user -- uvx --from /path/to/single-file_mcp singlefile-mcp
If you previously registered this server under the name single-file-fetcher, remove that entry:
claude mcp remove single-file-fetcher --scope user
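You can also add the Brave Search MCP server. The workflow example below uses it alongside this one:
claude mcp add brave-search -s user -- env BRAVE_API_KEY=YOUR_KEY npx -y @modelcontextprotocol/server-brave-search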
fetch_webpage: fetches a web page and processes it with intelligent content extraction.
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | URL of the webpage to fetch |
| output_content | boolean | true | Whether to return content in the response |
| extract_content | boolean | false | Extract clean text content (recommended) |
| include_metadata | boolean | true | Include page metadata (title, author, etc.) |
| block_images | boolean | false | Block image downloads for faster processing |
| block_scripts | boolean | true | Block JavaScript execution |
| compress_html | boolean | true | Compress HTML output |
| max_tokens | number | 20000 | Maximum tokens in the response (max: 25000) |
| truncate_method | string | "truncate" | How to handle large content: "truncate" or "summary" |
| offset | number | 0 | Character offset to start reading from |
| limit | number | null | Maximum number of characters to return |
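You can mirror the parameter table on the client side to catch invalid arguments before the tool is called. The sketch below is hypothetical, and the `FetchWebpageArgs` class is not part of the package. It encodes the defaults listed above and the documented limits: `max_tokens` may be at most 25000, and `truncate_method` must be `"truncate"` or `"summary"`. Rejecting negative `offset` or `limit` values is an additional assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional

MAX_TOKENS_CEILING = 25000  # documented upper bound for max_tokens
TRUNCATE_METHODS = {"truncate", "summary"}


@dataclass
class FetchWebpageArgs:
    """Hypothetical client-side mirror of the fetch_webpage parameters."""

    url: str
    output_content: bool = True
    extract_content: bool = False
    include_metadata: bool = True
    block_images: bool = False
    block_scripts: bool = True
    compress_html: bool = True
    max_tokens: int = 20000
    truncate_method: str = "truncate"
    offset: int = 0
    limit: Optional[int] = None

    def __post_init__(self) -> None:
        if not self.url:
            raise ValueError("url is required")
        if not 0 < self.max_tokens <= MAX_TOKENS_CEILING:
            raise ValueError(f"max_tokens must be in 1..{MAX_TOKENS_CEILING}")
        if self.truncate_method not in TRUNCATE_METHODS:
            raise ValueError(f"truncate_method must be one of {sorted(TRUNCATE_METHODS)}")
        # Assumption: negative values make no sense for character slicing.
        if self.offset < 0 or (self.limit is not None and self.limit < 0):
            raise ValueError("offset and limit must be non-negative")

    def to_arguments(self) -> dict:
        """Return tool-call arguments, leaving out limit when it is unset."""
        args = asdict(self)
        if args["limit"] is None:
            del args["limit"]
        return args
```

For example, `FetchWebpageArgs(url="https://example.com/article", extract_content=True).to_arguments()` returns every default from the table, with `extract_content` set to `True`.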
Basic content extraction:
fetch_webpage(
url="https://example.com/article",
extract_content=True,
include_metadata=True
)
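The `fetch_webpage(...)` calls in these examples are what an agent sends through its MCP client. Outside an agent, you can make the same call from Python with the official MCP Python SDK (`pip install mcp`). This sketch makes three assumptions: the server is launched with `uvx singlefile-mcp`, as in the installation command; the tool result contains text content; and the helper names `build_arguments` and `fetch_webpage` are hypothetical wrappers. The SDK is imported inside the function, so `build_arguments` runs even without the SDK installed.

```python
import asyncio
from typing import Any, Dict


def build_arguments(url: str, **options: Any) -> Dict[str, Any]:
    """Assemble tool arguments; option names follow the parameter table."""
    return {"url": url, **options}


async def fetch_webpage(url: str, **options: Any) -> str:
    """Call the fetch_webpage tool over stdio and join its text content."""
    # Lazy import: only needed when a real call is made.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    server = StdioServerParameters(command="uvx", args=["singlefile-mcp"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "fetch_webpage", arguments=build_arguments(url, **options)
            )
            # Keep only text items; images or embedded resources are skipped.
            return "\n".join(
                item.text
                for item in result.content
                if getattr(item, "type", None) == "text"
            )
```

Usage: `asyncio.run(fetch_webpage("https://example.com/article", extract_content=True, include_metadata=True))` returns the response text produced by the server.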
Paginated reading of large documents:
# Get overview
fetch_webpage(
url="https://docs.example.com/guide",
extract_content=True,
limit=5000
)
# Continue reading from offset
fetch_webpage(
url="https://docs.example.com/guide",
extract_content=True,
offset=5000,
limit=5000
)
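Conceptually, `offset` and `limit` select a character window of the extracted text. The sketch below models that window with plain string slicing. It also generates the `(offset, limit)` pairs needed to walk a document from start to end. It illustrates the arithmetic under that assumed semantics and is not the server's code; `window` and `paginate` are hypothetical helpers.

```python
from typing import Iterator, Optional, Tuple


def window(text: str, offset: int = 0, limit: Optional[int] = None) -> str:
    """Return the slice an offset/limit pair would select (assumed semantics)."""
    end = len(text) if limit is None else offset + limit
    return text[offset:end]


def paginate(total_chars: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that cover total_chars in page_size steps."""
    for offset in range(0, total_chars, page_size):
        # The final page is shortened to the characters that remain.
        yield offset, min(page_size, total_chars - offset)
```

For a 12000-character document read 5000 characters at a time, `list(paginate(12000, 5000))` gives `[(0, 5000), (5000, 5000), (10000, 2000)]`.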
Raw HTML for complex parsing:
fetch_webpage(
url="https://app.example.com/dashboard",
extract_content=False,
block_scripts=False,
max_tokens=15000
)
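When `extract_content=False`, the tool returns HTML, and the caller has to parse it. As an illustration using only Python's standard library (`html.parser`), the sketch below pulls the `<title>` and every link target out of such a response. The `LinkAndTitleParser` class and `parse_html` function are hypothetical and are not part of this server.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkAndTitleParser(HTMLParser):
    """Collect the document title and every <a href> value."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            # Title text may arrive in several chunks; concatenate them.
            self.title = (self.title or "") + data


def parse_html(html: str) -> Tuple[Optional[str], List[str]]:
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.links
```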
Here's a real-world example combining Brave Search and Single-File MCP:
Step 1: Search for information
# Using Brave Search MCP
brave_web_search(
query="artificial intelligence history timeline",
count=5
)
Step 2: Fetch and analyze Wikipedia article
# Using Single-File MCP to extract content
fetch_webpage(
url="https://en.wikipedia.org/wiki/History_of_artificial_intelligence",
extract_content=True,
include_metadata=True,
limit=5000 # Get first 5000 chars
)
Result:
Successfully fetched webpage: https://en.wikipedia.org/wiki/History_of_artificial_intelligence
## Metadata
**Title:** History of artificial intelligence - Wikipedia
**Description:** The history of artificial intelligence (AI) began in antiquity...
**Site:** wikipedia.org
## Extracted Content (chars 0-5000 of 45000)
*Note: More content available. Use offset=5000 to continue.*
# History of artificial intelligence
The history of artificial intelligence (AI) began in antiquity, with myths,
stories and rumors of artificial beings endowed with intelligence...
[Clean, readable article content follows...]
Step 3: Continue reading with pagination
# Get next section
fetch_webpage(
url="https://en.wikipedia.org/wiki/History_of_artificial_intelligence",
extract_content=True,
offset=5000,
limit=5000
)
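You can repeat Step 3 until the whole article has been read. The sketch below runs that loop with any callable that performs the tool call, such as an MCP client wrapper. It assumes that each response carries the `chars A-B of N` header shown in the result above, and it stops once the end offset reaches the total. `read_all` is a hypothetical helper. The `max_pages` guard is an added safety assumption.

```python
import re
from typing import Callable, List

# Matches the range header, e.g. "(chars 0-5000 of 45000)".
RANGE_RE = re.compile(r"chars (\d+)-(\d+) of (\d+)")


def read_all(
    fetch: Callable[[int, int], str], page_size: int = 5000, max_pages: int = 50
) -> List[str]:
    """Call fetch(offset, limit) until the reported end reaches the total."""
    pages: List[str] = []
    offset = 0
    for _ in range(max_pages):  # guard against a response that never ends
        response = fetch(offset, page_size)
        pages.append(response)
        match = RANGE_RE.search(response)
        if match is None:
            break  # no range header: assume the full content was returned
        end, total = int(match.group(2)), int(match.group(3))
        if end >= total:
            break
        offset = end
    return pages
```

In an agent, `fetch` would wrap `fetch_webpage(url=..., extract_content=True, offset=offset, limit=limit)`.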
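This workflow lets an AI agent find relevant sources, extract their clean content and metadata, and read long articles page by page.
A typical response has the following format:
Successfully fetched webpage: https://example.com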
## Metadata
**Title:** Example Article
**Author:** John Doe
**Date:** 2024-01-15
**Description:** An informative article about...
**Site:** example.com
## Extracted Content (chars 0-5000 of 12000)
*Note: More content available. Use offset=5000 to continue.*
# Article Title
This is the clean, readable content extracted from the webpage...
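When using offset/limit, responses include the character range returned and the total length (for example, chars 1000-6000 of 12000), plus a hint for the next request (for example, Use offset=6000 to continue).
Perfect for reading large technical docs, API references, and guides that span multiple pages.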
Extract clean article content from news sites, blogs, and publications for analysis.
Gather structured data from websites, including metadata and clean text content.
Enable AI agents to browse and understand web content, even from JavaScript-heavy applications.
Handle complex legal documents and government sites that require JavaScript rendering.
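Pages are rendered in a real browser via single-file, so content generated by JavaScript is captured.
Metadata such as title, author, date, description, and site name is extracted automatically along with the clean article text.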
To install from source for development:
git clone https://github.com/kwinsch/singlefile-mcp.git
cd singlefile-mcp
pip install -r requirements.txt
pip install -e .
Then register the local copy with Claude Code:
claude mcp add singlefile-mcp -s user -- uvx --from . singlefile-mcp
MIT License - see LICENSE file for details.