Converts documents, webpages, and media files into markdown for AI assistants using Microsoft's MarkItDown and Crawl4AI. It enables tools to read PDFs, Office files, and JavaScript-rendered websites with support for OCR and image extraction.
Convert any document, webpage, or media file to markdown. Works as an HTTP API or directly with AI tools via MCP.
CI Coverage Status PyPI version Python 3.10+ License: MIT Docker
md-server converts files, URLs, or raw content into markdown. It automatically detects input types, handles PDFs, Office documents, YouTube videos, images, and web pages with JavaScript rendering, and requires zero configuration to get started.
Two ways to use it: as an HTTP API, or as an MCP server that AI tools call directly.
Under the hood, it uses Microsoft's MarkItDown for document conversion and Crawl4AI for intelligent web scraping.
Prerequisites:
uvx playwright install --with-deps chromium
# Starts server at localhost:8080
uvx md-server
# Convert a file
curl -X POST localhost:8080/convert --data-binary @document.pdf
# Convert a URL
curl -X POST localhost:8080/convert \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Convert HTML text
curl -X POST localhost:8080/convert \
-H "Content-Type: application/json" \
-d '{"text": "<h1>Title</h1><p>Content</p>", "mime_type": "text/html"}'
md-server runs as a local MCP server, giving AI assistants like Claude Desktop, Cursor, Copilot, and OpenCode the ability to read documents and web pages directly.
Prerequisites:
uvx playwright install --with-deps chromium
Add to your MCP configuration:
{
"mcpServers": {
"md-server": {
"command": "uvx",
"args": ["md-server[mcp]", "--mcp-stdio"]
}
}
}
The first run downloads dependencies and may take a minute.
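If the MCP configuration file already lists other servers, the entry has to be merged in rather than pasted over the whole file. The sketch below shows one way to do that in Python. The entry itself is copied from the configuration above; the helper names and the idea of passing in a file path are assumptions for illustration, since the config location differs between clients.

```python
import json
from pathlib import Path

# Entry copied from the MCP configuration shown above.
MD_SERVER_ENTRY = {"command": "uvx", "args": ["md-server[mcp]", "--mcp-stdio"]}


def add_md_server(config: dict) -> dict:
    """Return a copy of an MCP config with the md-server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any existing servers
    servers["md-server"] = MD_SERVER_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Hypothetical helper: rewrite a JSON config file in place."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_md_server(config), indent=2) + "\n")
```

Calling `update_config_file(Path("path/to/your/mcp/config.json"))` with the path your client actually uses would add the entry while leaving other servers untouched.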
Once configured, your AI gets the convert_to_markdown tool:
See MCP Guide for all options and troubleshooting.
For MCP server setup (AI tools), see MCP Server above.
uvx md-server
The Docker image includes browser support for JavaScript rendering.
docker run -p 127.0.0.1:8080:8080 ghcr.io/peteretelej/md-server
POST /convert
Single endpoint that accepts multiple input types and automatically detects what you're sending.
# Binary file upload
curl -X POST localhost:8080/convert --data-binary @document.pdf
# Multipart form upload
curl -X POST localhost:8080/convert -F "[email protected]"
# URL conversion
curl -X POST localhost:8080/convert \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Base64 content
curl -X POST localhost:8080/convert \
-H "Content-Type: application/json" \
-d '{"content": "base64_encoded_file_here", "filename": "report.docx"}'
# Raw text
curl -X POST localhost:8080/convert \
-H "Content-Type: application/json" \
-d '{"text": "# Already Markdown\n\nBut might need cleaning"}'
# Text with specific format (HTML, XML, etc.)
curl -X POST localhost:8080/convert \
-H "Content-Type: application/json" \
-d '{"text": "<h1>HTML Title</h1><p>Convert HTML to markdown</p>", "mime_type": "text/html"}'
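The base64 example above uses a placeholder string for `content`. As a rough sketch of how that body could be produced from real file bytes, the following uses only the Python standard library. The `content` and `filename` field names come from the example; the function name is hypothetical.

```python
import base64
import json


def build_base64_payload(data: bytes, filename: str) -> str:
    """Serialize file bytes as the JSON body used by the base64 example."""
    encoded = base64.b64encode(data).decode("ascii")
    return json.dumps({"content": encoded, "filename": filename})


body = build_base64_payload(b"hello", "report.docx")
print(body)  # {"content": "aGVsbG8=", "filename": "report.docx"}
```

The resulting string can be sent as the request body with `Content-Type: application/json`, as in the curl example.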
{
"success": true,
"markdown": "# Converted Content\n\nYour markdown here...",
"metadata": {
"source_type": "pdf",
"source_size": 102400,
"markdown_size": 8192,
"conversion_time_ms": 245,
"detected_format": "application/pdf"
},
"request_id": "req_550e8400-e29b-41d4-a716-446655440000"
}
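A client usually wants only a few fields from this response. The sketch below parses the JSON shown above into a small dataclass. The field names are taken from the sample; the `Conversion` class and `parse_success` function are illustrative names, and the fallbacks for missing metadata are an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass


@dataclass
class Conversion:
    markdown: str
    source_type: str
    conversion_time_ms: int
    request_id: str


def parse_success(body: str) -> Conversion:
    """Extract the fields shown in the sample response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("response does not report success")
    meta = payload.get("metadata", {})
    return Conversion(
        markdown=payload["markdown"],
        # Fallback values below are assumptions for missing metadata.
        source_type=meta.get("source_type", ""),
        conversion_time_ms=meta.get("conversion_time_ms", 0),
        request_id=payload.get("request_id", ""),
    )
```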
{
"url": "https://example.com",
"options": {
"js_rendering": true, // Use headless browser for JavaScript sites
"extract_images": true, // Extract and link images
"ocr_enabled": true, // OCR for scanned PDFs/images
"preserve_formatting": true // Keep complex formatting
}
}
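The `//` comments in the options example are annotations; strict JSON has no comment syntax, so it is safest to leave them out of a real request body. The sketch below builds an equivalent body in Python. The option names are taken from the example; the function name and the client-side check for unknown option names are assumptions for illustration, not part of md-server.

```python
import json

# Option names as they appear in the example above.
KNOWN_OPTIONS = ("js_rendering", "extract_images", "ocr_enabled", "preserve_formatting")


def build_url_request(url: str, **options: bool) -> str:
    """Build a comment-free JSON body for a URL conversion request."""
    unknown = set(options) - set(KNOWN_OPTIONS)
    if unknown:
        # Client-side guard against typos; an assumption, not server behavior.
        raise ValueError(f"unknown options: {sorted(unknown)}")
    body: dict = {"url": url}
    if options:
        body["options"] = options
    return json.dumps(body)
```

For example, `build_url_request("https://example.com", js_rendering=True)` yields a body that sets only that one option.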
GET /formats
Returns supported formats and capabilities.
curl localhost:8080/formats
GET /health
Health check endpoint.
curl localhost:8080/health
Documents: PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP
Web: HTML, URLs (with JavaScript rendering)
Images: PNG, JPG, JPEG (with OCR)
Audio: MP3, WAV (transcription, requires ffmpeg)
Video: YouTube URLs
Text: TXT, MD, CSV, XML, JSON
Docker includes browser support out of the box.
Local installations use MarkItDown for URL conversion by default. To read pages that require JavaScript (SPAs, dashboards, interactive apps):
uvx playwright install --with-deps chromium
When a browser is available, md-server automatically uses Crawl4AI for these pages.
# Convert HTML from stdin
echo "<h1>Hello</h1>" | curl -X POST localhost:8080/convert \
--data-binary @- \
-H "Content-Type: text/html"
# Chain with other tools
pdftotext document.pdf - | curl -X POST localhost:8080/convert \
--data-binary @-
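The same raw-body upload can be prepared from Python without curl. The sketch below builds the request with the standard library's `urllib.request` and does not send it; the endpoint and the `Content-Type` header mirror the stdin example above, while the function name and `base_url` parameter are hypothetical.

```python
import urllib.request
from typing import Optional


def build_raw_request(
    data: bytes,
    content_type: Optional[str] = None,
    base_url: str = "http://localhost:8080",
) -> urllib.request.Request:
    """Prepare a POST to /convert with the raw bytes as the body."""
    headers = {"Content-Type": content_type} if content_type else {}
    return urllib.request.Request(
        f"{base_url}/convert", data=data, headers=headers, method="POST"
    )


request = build_raw_request(b"<h1>Hello</h1>", "text/html")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the call against a running server.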
# Quote the extra so shells such as zsh do not expand the brackets
pip install "md-server[sdk]"
import asyncio

from md_server.sdk import MDConverter

converter = MDConverter(ocr_enabled=True, js_rendering=True)

# Async: await must run inside a coroutine
async def main():
    result = await converter.convert_file('document.pdf')
    result = await converter.convert_url('https://example.com')
    print(result.markdown)

asyncio.run(main())

# Sync
result = converter.convert_file_sync('document.pdf')
For remote API usage and advanced patterns, see the Python SDK documentation.
Errors include actionable information:
{
"success": false,
"error": {
"code": "UNSUPPORTED_FORMAT",
"message": "File format not supported",
"details": {
"detected_format": "application/x-rar",
"supported_formats": ["pdf", "docx", "html", "..."]
}
},
"request_id": "req_550e8400-e29b-41d4-a716-446655440000"
}
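One way a client might surface this error shape is to turn it into an exception that carries the `code`, `message`, and `details` fields. The sketch below does that for the JSON shown above. The `ConversionError` class and `raise_for_error` function are illustrative names, not part of the md-server SDK.

```python
import json


class ConversionError(Exception):
    """Hypothetical exception wrapping the error object in a failed response."""

    def __init__(self, code: str, message: str, details: dict, request_id: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details
        self.request_id = request_id


def raise_for_error(body: str) -> dict:
    """Return the parsed payload on success, raise ConversionError otherwise."""
    payload = json.loads(body)
    if payload.get("success"):
        return payload
    error = payload.get("error", {})
    raise ConversionError(
        code=error.get("code", "UNKNOWN"),
        message=error.get("message", ""),
        details=error.get("details", {}),
        request_id=payload.get("request_id", ""),
    )
```

Callers can then branch on `exc.code`, for instance to report `exc.details` when the code is `UNSUPPORTED_FORMAT`.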
Full documentation is available in the docs directory:
See CONTRIBUTING.md for development setup, testing, and contribution guidelines.
This project makes use of these excellent tools:
Crawl4AI, microsoft/markitdown, Litestar