Enables web crawling and content extraction from web pages, supporting multiple output formats like text, markdown, XML, and JSON, with robots.txt compliance and rate limiting.
A Model Context Protocol (MCP) server for web crawling and content extraction from web pages with multiple output formats.
Add this server to your MCP client configuration:
{
  "mcpServers": {
    "open-crawler": {
      "command": "npx",
      "args": ["@elchika-inc/open-crawler-mcp-server"]
    }
  }
}
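For clients that keep this configuration in a JSON file, the entry can be merged programmatically instead of pasted by hand. The sketch below is only an illustration: the helper names `add_server` and `update_config_file` and the file location are hypothetical, and only the `mcpServers` shape, the `open-crawler` key, and the `npx` command come from the snippet above.

```python
import copy
import json
from pathlib import Path

# Entry copied from the configuration snippet above.
OPEN_CRAWLER_ENTRY = {
    "command": "npx",
    "args": ["@elchika-inc/open-crawler-mcp-server"],
}


def add_server(config: dict, name: str = "open-crawler") -> dict:
    """Return a copy of `config` with the server registered under `name`.

    Other entries already present under "mcpServers" are preserved.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers[name] = copy.deepcopy(OPEN_CRAWLER_ENTRY)
    return merged


def update_config_file(path: Path) -> None:
    """Merge the entry into the JSON file at `path` (hypothetical location)."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_server(config), indent=2) + "\n", encoding="utf-8")
```

Where the configuration file lives depends on the MCP client, so check the client's own documentation before pointing `update_config_file` at a path.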
crawl_page
Extracts content from a web page in one of several output formats, checking robots.txt compliance automatically.
Parameters:
- url (required): Target URL to crawl
- selector (optional): CSS selector for specific content extraction
- format (optional): Output format - text, markdown, xml, or json (default: text)
- text_only (optional): Legacy parameter for text-only extraction (deprecated, use format instead)

Output Formats:
- text: Clean, plain text content with whitespace normalized
- markdown: Well-formatted Markdown with headings, links, images, and lists preserved
- xml: Structured XML with separate sections for headings, paragraphs, links, images, and lists
- json: Structured JSON object containing categorized content elements

Examples:
Basic text extraction:
{
  "name": "crawl_page",
  "arguments": {
    "url": "https://example.com",
    "format": "text"
  }
}
Markdown extraction with CSS selector:
{
  "name": "crawl_page",
  "arguments": {
    "url": "https://example.com",
    "selector": "article",
    "format": "markdown"
  }
}
Structured JSON extraction:
{
  "name": "crawl_page",
  "arguments": {
    "url": "https://example.com",
    "format": "json"
  }
}
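The examples above show only the tool name and its arguments. Under MCP, a client sends these inside a JSON-RPC 2.0 `tools/call` request. The following is a minimal sketch of building such a request with some client-side checks. The helper name `build_crawl_request` and the validation rules (including the restriction to http/https URLs) are assumptions made for illustration, not the server's own code.

```python
import json
from typing import Optional
from urllib.parse import urlparse

# Formats listed under "Output Formats" above.
ALLOWED_FORMATS = ("text", "markdown", "xml", "json")


def build_crawl_request(
    url: str,
    fmt: str = "text",
    selector: Optional[str] = None,
    request_id: int = 1,
) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the crawl_page tool."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are worth sending.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")

    arguments = {"url": url, "format": fmt}
    if selector is not None:
        arguments["selector"] = selector  # optional CSS selector

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "crawl_page", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_crawl_request("https://example.com", "markdown", "article")
    print(json.dumps(request, indent=2))
```

Most MCP client libraries build this envelope for you, so the sketch is mainly useful for seeing where the `name` and `arguments` fields end up on the wire.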
check_robots
Checks whether a URL may be crawled according to the site's robots.txt file.
Parameters:
- url (required): URL to check for crawling permission

Example:
{
  "name": "check_robots",
  "arguments": {
    "url": "https://example.com/page"
  }
}
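How the server evaluates robots.txt internally is not documented here. As an illustration of the kind of check involved, the sketch below uses Python's standard `urllib.robotparser` on an inline, made-up robots.txt. The rules, the user-agent string `example-crawler`, and the helper name `is_allowed` are all assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used only for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "example-crawler") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for candidate in ("https://example.com/page", "https://example.com/private/data"):
        print(candidate, is_allowed(SAMPLE_ROBOTS_TXT, candidate))
```

In real use the robots.txt text would be fetched from the target site's root first; the inline string keeps the example self-contained.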
Common error scenarios:
License: MIT
Run in the terminal:
claude mcp add open-crawler-mcp-server -- npx