An MCP server providing tools to read PDF files.
Production-ready PDF processing server for AI agents
5-10x faster parallel processing • Y-coordinate content ordering • 94%+ test coverage • 103 tests passing
PDF Reader MCP is a production-ready Model Context Protocol server that gives AI agents PDF processing capabilities: it extracts text, images, and metadata from local files and URLs, processing pages in parallel.
The Problem:
// Traditional PDF processing
- Sequential page processing (slow)
- No natural content ordering
- Complex path handling
- Poor error isolation
The Solution:
// PDF Reader MCP
- 5-10x faster parallel processing ⚡
- Y-coordinate based ordering 📐
- Flexible path support (absolute/relative) 🎯
- Per-page error resilience 🛡️
- 94%+ test coverage ✅
Result: Production-ready PDF processing that scales.
Real-world performance from production testing:
| Operation | Ops/sec | Performance | Use Case |
|---|---|---|---|
| Error handling | 12,933 | ⚡⚡⚡⚡⚡ | Validation & safety |
| Extract full text | 5,575 | ⚡⚡⚡⚡ | Document analysis |
| Extract page | 5,329 | ⚡⚡⚡⚡ | Single page ops |
| Multiple pages | 5,242 | ⚡⚡⚡⚡ | Batch processing |
| Metadata only | 4,912 | ⚡⚡⚡ | Quick inspection |

Parallel vs. sequential processing time:

| Document | Sequential | Parallel | Speedup |
|---|---|---|---|
| 10-page PDF | ~2s | ~0.3s | 5-8x faster |
| 50-page PDF | ~10s | ~1s | 10x faster |
| 100+ pages | ~20s | ~2s | ~10x faster (scales with CPU cores) |

Benchmarks vary based on PDF complexity and system resources.
claude mcp add pdf-reader -- npx @sylphx/pdf-reader-mcp
Add to claude_desktop_config.json:
{
"mcpServers": {
"pdf-reader": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"]
}
}
}
Config file location:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
VS Code:
code --add-mcp '{"name":"pdf-reader","command":"npx","args":["@sylphx/pdf-reader-mcp"]}'
npx @sylphx/pdf-reader-mcp
Add to your Windsurf MCP config:
{
"mcpServers": {
"pdf-reader": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"]
}
}
}
Add to Cline's MCP settings:
{
"mcpServers": {
"pdf-reader": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"]
}
}
}
npx, Args: @sylphx/pdf-reader-mcp
npx -y @smithery/cli install @sylphx/pdf-reader-mcp --client claude
# Quick start - zero installation
npx @sylphx/pdf-reader-mcp
# Or install globally
npm install -g @sylphx/pdf-reader-mcp
{
"sources": [{
"path": "documents/report.pdf"
}],
"include_full_text": true,
"include_metadata": true,
"include_page_count": true
}
Result:
{
"sources": [{
"path": "documents/manual.pdf",
"pages": "1-5,10,15-20"
}],
"include_full_text": true
}
// Windows (backslash and forward-slash separators both work)
{
"sources": [{
"path": "C:\\Users\\John\\Documents\\report.pdf"
}],
"include_full_text": true
}
// Unix/Mac
{
"sources": [{
"path": "/home/user/documents/contract.pdf"
}],
"include_full_text": true
}
No more "Absolute paths are not allowed" errors!
{
"sources": [{
"path": "presentation.pdf",
"pages": [1, 2, 3]
}],
"include_images": true,
"include_full_text": true
}
Response includes:
{
"sources": [
{ "path": "C:\\Reports\\Q1.pdf", "pages": "1-10" },
{ "path": "/home/user/Q2.pdf", "pages": "1-10" },
{ "url": "https://example.com/Q3.pdf" }
],
"include_full_text": true
}
⚡ All PDFs processed in parallel automatically!
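To illustrate the idea of running every source concurrently while keeping failures isolated, here is a minimal sketch using `Promise.allSettled`. It is not the server's actual code: `processSources`, `PdfSource`, and `SourceResult` are hypothetical names, and `work` stands in for whatever per-PDF extraction you plug in.

```typescript
// Hypothetical source shape, mirroring the read_pdf `sources` entries.
interface PdfSource {
  path?: string;
  url?: string;
  pages?: string | number[];
}

// Per-source outcome: either a value or an error message, never both.
type SourceResult<T> =
  | { source: PdfSource; ok: true; value: T }
  | { source: PdfSource; ok: false; error: string };

// Start `work` for every source at once. Promise.allSettled waits for all of
// them, so one failing PDF does not reject the whole batch.
async function processSources<T>(
  sources: PdfSource[],
  work: (s: PdfSource) => Promise<T>,
): Promise<SourceResult<T>[]> {
  const settled = await Promise.allSettled(sources.map((s) => work(s)));
  return settled.map((r, i): SourceResult<T> =>
    r.status === "fulfilled"
      ? { source: sources[i], ok: true, value: r.value }
      : { source: sources[i], ok: false, error: String(r.reason) },
  );
}
```

`allSettled` rather than `Promise.all` is the design choice that matters here: with `Promise.all`, the first rejected source would discard the results of every other source in the batch.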
// ✅ Windows
{ "path": "C:\\Users\\John\\Documents\\report.pdf" }
{ "path": "C:/Users/John/Documents/report.pdf" }
// ✅ Unix/Mac
{ "path": "/home/john/documents/report.pdf" }
{ "path": "/Users/john/Documents/report.pdf" }
// ✅ Relative (still works)
{ "path": "documents/report.pdf" }
Other Improvements:
v1.2.0 - Content Ordering
v1.1.0 - Image Extraction & Performance
read_pdf Tool
The single tool that handles all PDF operations.
| Parameter | Type | Description | Default |
|---|---|---|---|
| sources | Array | List of PDF sources to process | Required |
| include_full_text | boolean | Extract full text content | false |
| include_metadata | boolean | Extract PDF metadata | true |
| include_page_count | boolean | Include total page count | true |
| include_images | boolean | Extract embedded images | false |

{
path?: string; // Local file path (absolute or relative)
url?: string; // HTTP/HTTPS URL to PDF
pages?: string | number[]; // Pages to extract: "1-5,10" or [1,2,3]
}
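As a sketch of how the parameter table and source shape fit together, the following types mirror them and fill in the documented defaults. `ReadPdfArgs` and `withDefaults` are hypothetical names, not exports of the package, and the two validation checks (non-empty `sources`, a `path` or `url` on each source) are assumptions about what the server would reject.

```typescript
// Hypothetical TypeScript view of the read_pdf arguments, built from the
// parameter table; the server's own schema may differ.
interface PdfSource {
  path?: string; // Local file path (absolute or relative)
  url?: string; // HTTP/HTTPS URL to PDF
  pages?: string | number[]; // "1-5,10" or [1, 2, 3]
}

interface ReadPdfArgs {
  sources: PdfSource[];
  include_full_text?: boolean;
  include_metadata?: boolean;
  include_page_count?: boolean;
  include_images?: boolean;
}

// Apply the documented defaults so it is clear what a minimal request asks for.
function withDefaults(args: ReadPdfArgs): Required<ReadPdfArgs> {
  // Assumption: an empty sources list is treated as an error.
  if (args.sources.length === 0) {
    throw new Error("sources is required and must not be empty");
  }
  for (const s of args.sources) {
    // Assumption: each source needs a path or a url to be usable.
    if (s.path === undefined && s.url === undefined) {
      throw new Error("each source needs a path or a url");
    }
  }
  return {
    sources: args.sources,
    include_full_text: args.include_full_text ?? false,
    include_metadata: args.include_metadata ?? true,
    include_page_count: args.include_page_count ?? true,
    include_images: args.include_images ?? false,
  };
}
```

For example, `withDefaults({ sources: [{ path: "report.pdf" }] })` asks for metadata and the page count but no full text or images, which matches the "Metadata only" request below.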
Metadata only (fast):
{
"sources": [{ "path": "large.pdf" }],
"include_metadata": true,
"include_page_count": true,
"include_full_text": false
}
From URL:
{
"sources": [{
"url": "https://arxiv.org/pdf/2301.00001.pdf"
}],
"include_full_text": true
}
Page ranges:
{
"sources": [{
"path": "manual.pdf",
"pages": "1-5,10-15,20" // Pages 1,2,3,4,5,10,11,12,13,14,15,20
}]
}
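The page-range comment above expands `"1-5,10-15,20"` into individual pages. A minimal sketch of that expansion follows; `parsePages` is a hypothetical helper, and how the real server treats edge cases (reversed ranges such as `"5-3"`, page `0`, pages past the end of the document) is an assumption here.

```typescript
// Expand a page spec such as "1-5,10-15,20" (or an explicit array) into a
// sorted list of unique 1-based page numbers.
function parsePages(spec: string | number[]): number[] {
  const pages = new Set<number>();
  const parts = Array.isArray(spec) ? spec.map(String) : spec.split(",");
  for (const raw of parts) {
    const part = raw.trim();
    if (part === "") continue; // tolerate "1-3,,5" and trailing commas
    const m = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (!m) throw new Error(`invalid page spec: "${part}"`);
    const start = Number(m[1]);
    const end = m[2] === undefined ? start : Number(m[2]);
    // Assumption: pages are 1-based and ranges must ascend.
    if (start < 1 || end < start) throw new Error(`invalid range: "${part}"`);
    for (let p = start; p <= end; p++) pages.add(p);
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

Using a `Set` means overlapping specs like `"1-5,3-8"` do not extract the same page twice.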
Content is returned in natural reading order based on Y-coordinates:
Document Layout:
┌─────────────────────┐
│ [Title] Y:100 │
│ [Image] Y:150 │
│ [Text] Y:400 │
│ [Photo A] Y:500 │
│ [Photo B] Y:550 │
└─────────────────────┘
Response Order:
[
{ type: "text", text: "Title..." },
{ type: "image", data: "..." },
{ type: "text", text: "..." },
{ type: "image", data: "..." },
{ type: "image", data: "..." }
]
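The ordering in the diagram amounts to a stable sort on each item's vertical position. The sketch below assumes `y` grows downward from the top of the page, as in the diagram. PDF page coordinates normally put the origin at the bottom-left, so a real implementation would likely flip them first (for example `pageHeight - y`). `PageItem` and `inReadingOrder` are hypothetical names.

```typescript
// A positioned content item. `y` is measured from the top of the page,
// matching the layout diagram (smaller y = higher on the page).
type PageItem =
  | { type: "text"; y: number; text: string }
  | { type: "image"; y: number; data: string };

// Sort items top-to-bottom without mutating the input. Array.prototype.sort
// is stable (ES2019+), so items sharing a y keep their original order.
function inReadingOrder(items: PageItem[]): PageItem[] {
  return items.slice().sort((a, b) => a.y - b.y);
}
```

Feeding it the diagram's items in any order (title at 100, image at 150, text at 400, photos at 500 and 550) yields text, image, text, image, image, which is the response order shown above.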
Benefits:
Enable extraction:
{
"sources": [{ "path": "manual.pdf" }],
"include_images": true
}
Response format:
{
"images": [{
"page": 1,
"index": 0,
"width": 1920,
"height": 1080,
"format": "rgb",
"data": "base64-encoded-png..."
}]
}
Supported formats: RGB, RGBA, Grayscale
Auto-detected: JPEG, PNG, and other embedded formats
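On the client side, the `data` field has to be decoded before it can be saved or displayed. This sketch assumes the field holds plain base64 (no `data:` URI prefix) of a PNG, as the response example suggests, and relies on Node.js's `Buffer`. `decodeImage` and `isPng` are hypothetical helpers.

```typescript
// The 8-byte signature every PNG file starts with.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Decode the base64 `data` field of an image entry into raw bytes.
function decodeImage(data: string): Buffer {
  return Buffer.from(data, "base64");
}

// Check the leading bytes before trusting the payload as a PNG.
function isPng(bytes: Uint8Array): boolean {
  return (
    bytes.length >= PNG_SIGNATURE.length &&
    PNG_SIGNATURE.every((b, i) => bytes[i] === b)
  );
}
```

Once `isPng` passes, the bytes can be written out with `writeFileSync` from `node:fs`, for example as `page1-img0.png` using the entry's `page` and `index` fields.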
Absolute paths (v1.3.0+) - Direct file access:
{ "path": "C:\\Users\\John\\file.pdf" }
{ "path": "/home/user/file.pdf" }
Relative paths - Workspace files:
{ "path": "docs/report.pdf" }
{ "path": "./2024/Q1.pdf" }
Configure working directory:
{
"mcpServers": {
"pdf-reader-mcp": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"],
"cwd": "/path/to/documents"
}
}
}
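The path forms above can be told apart by their prefix, and relative paths only make sense together with a working directory. The sketch below classifies a `path` value and resolves relative ones against `cwd` with Node's `path` module. `resolvePdfPath` is hypothetical, and it assumes a POSIX-style `cwd` such as `/path/to/documents`; how the server itself resolves paths is not shown in this README.

```typescript
import * as path from "node:path";

// Classify and resolve a `path` value the way the examples above use it.
function resolvePdfPath(p: string, cwd: string): string {
  // Windows drive-letter paths, with either separator: C:\... or C:/...
  if (/^[A-Za-z]:[\\/]/.test(p)) return path.win32.normalize(p);
  // POSIX absolute paths: /home/..., /Users/...
  if (p.startsWith("/")) return path.posix.normalize(p);
  // Relative paths (docs/report.pdf, ./2024/Q1.pdf) join onto the working directory.
  return path.posix.resolve(cwd, p);
}
```

`path.win32.normalize` rewrites forward slashes to backslashes, which is why `C:/Users/...` and `C:\\Users\\...` end up as the same file.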
Strategy 1: Page ranges
{ "sources": [{ "path": "big.pdf", "pages": "1-20" }] }
Strategy 2: Progressive loading
// Step 1: Get page count
{ "sources": [{ "path": "big.pdf" }], "include_full_text": false }
// Step 2: Extract sections
{ "sources": [{ "path": "big.pdf", "pages": "50-75" }] }
Strategy 3: Parallel batching
{
"sources": [
{ "path": "big.pdf", "pages": "1-50" },
{ "path": "big.pdf", "pages": "51-100" }
]
}
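Strategies 2 and 3 combine naturally: fetch the page count first, then split the document into fixed-size ranges. Here is a sketch of building the Strategy 3 request from a known page count. `pageRangeChunks` and `batchedRequest` are hypothetical helpers, and the chunk size is yours to tune to the client's context limits.

```typescript
// Split a document of `pageCount` pages into "start-end" range strings of at
// most `size` pages each.
function pageRangeChunks(pageCount: number, size: number): string[] {
  if (!Number.isInteger(pageCount) || pageCount < 1) throw new Error("pageCount must be a positive integer");
  if (!Number.isInteger(size) || size < 1) throw new Error("size must be a positive integer");
  const ranges: string[] = [];
  for (let start = 1; start <= pageCount; start += size) {
    const end = Math.min(start + size - 1, pageCount);
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
  }
  return ranges;
}

// Build a Strategy 3 style request: one `sources` entry per chunk of one file.
function batchedRequest(file: string, pageCount: number, size: number) {
  return {
    sources: pageRangeChunks(pageCount, size).map((pages) => ({ path: file, pages })),
  };
}
```

With a page count of 100 and a chunk size of 50, `batchedRequest("big.pdf", 100, 50)` reproduces the two-entry request above.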
Solution: Upgrade to v1.3.0+
npm update @sylphx/pdf-reader-mcp
Restart your MCP client completely.
Causes:
Solutions:
Use absolute path:
{ "path": "C:\\Full\\Path\\file.pdf" }
Or configure cwd:
{
"pdf-reader-mcp": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"],
"cwd": "/path/to/docs"
}
}
Solution:
npm cache clean --force
rm -rf node_modules package-lock.json
npm install @sylphx/pdf-reader-mcp@latest
Restart MCP client completely.
By default, PDF Reader MCP uses stdio transport for local use. You can also run it as an HTTP server for remote access from multiple machines.
# Run as HTTP server on port 8080
MCP_TRANSPORT=http npx @sylphx/pdf-reader-mcp
| Variable | Default | Description |
|---|---|---|
| MCP_TRANSPORT | stdio | Transport type: stdio or http |
| MCP_HTTP_PORT | 8080 | HTTP server port |
| MCP_HTTP_HOST | 0.0.0.0 | HTTP server hostname |
| MCP_API_KEY | - | Optional API key for authentication |

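The environment variables above reduce to a small typed configuration. The sketch below reads them from any environment map (typically `process.env`) and applies the table's defaults. `readTransportConfig` is a hypothetical helper, and its validation rules (allowed transports, port range) are assumptions rather than the server's startup code.

```typescript
interface TransportConfig {
  transport: "stdio" | "http";
  port: number;
  host: string;
  apiKey?: string;
}

// Apply the documented defaults: stdio, port 8080, host 0.0.0.0, no API key.
function readTransportConfig(env: Record<string, string | undefined>): TransportConfig {
  const transport = env.MCP_TRANSPORT ?? "stdio";
  if (transport !== "stdio" && transport !== "http") {
    throw new Error(`MCP_TRANSPORT must be "stdio" or "http", got "${transport}"`);
  }
  const port = Number(env.MCP_HTTP_PORT ?? "8080");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid MCP_HTTP_PORT: "${env.MCP_HTTP_PORT}"`);
  }
  return {
    transport,
    port,
    host: env.MCP_HTTP_HOST ?? "0.0.0.0",
    apiKey: env.MCP_API_KEY || undefined, // unset or empty means no authentication
  };
}
```

Calling `readTransportConfig(process.env)` with nothing set yields the stdio defaults; setting `MCP_TRANSPORT=http` switches to the HTTP listener on port 8080.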
FROM oven/bun:1
WORKDIR /app
RUN bun add @sylphx/pdf-reader-mcp
ENV MCP_TRANSPORT=http
ENV MCP_HTTP_PORT=8080
EXPOSE 8080
CMD ["bun", "node_modules/@sylphx/pdf-reader-mcp/dist/index.js"]
{
"servers": {
"pdf-reader": {
"type": "http",
"url": "https://your-server.com/mcp",
"headers": {
"X-API-Key": "your-api-key"
}
}
}
}
| Endpoint | Method | Description |
|---|---|---|
| /mcp | POST | JSON-RPC endpoint |
| /mcp/health | GET | Health check |

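For a quick manual test of the `/mcp` endpoint, the request body is a JSON-RPC 2.0 `tools/call` message. The sketch only builds the request; `buildReadPdfCall` is a hypothetical helper, and the `X-API-Key` header follows the client config example above. A real MCP client must first send `initialize` (and may need to echo a session header the server returns), so for anything beyond experiments the official MCP SDK is the safer route.

```typescript
interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build a JSON-RPC 2.0 `tools/call` request for read_pdf. `baseUrl` should be
// the server origin, e.g. "https://your-server.com".
function buildReadPdfCall(baseUrl: string, apiKey: string, args: object, id = 1): HttpRequest {
  return {
    url: new URL("/mcp", baseUrl).toString(),
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Streamable HTTP servers may answer with JSON or an SSE stream.
        Accept: "application/json, text/event-stream",
        "X-API-Key": apiKey,
      },
      body: JSON.stringify({
        jsonrpc: "2.0",
        id,
        method: "tools/call",
        params: { name: "read_pdf", arguments: args },
      }),
    },
  };
}
```

Send it with `fetch(req.url, req.init)`; a plain `fetch(new URL("/mcp/health", baseUrl))` is enough to probe the health check.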
| Component | Technology |
|---|---|
| Runtime | Node.js 22+ ESM |
| PDF Engine | PDF.js (Mozilla) |
| Validation | Zod + JSON Schema |
| Protocol | MCP SDK |
| Language | TypeScript (strict) |
| Testing | Vitest (103 tests) |
| Quality | Biome (50x faster) |
| CI/CD | GitHub Actions |
any types, strict mode
Prerequisites:
Setup:
git clone https://github.com/SylphxAI/pdf-reader-mcp.git
cd pdf-reader-mcp
pnpm install && pnpm build
Scripts:
pnpm run build # Build TypeScript
pnpm run test # Run 103 tests
pnpm run test:cov # Coverage (94%+)
pnpm run check # Lint + format
pnpm run check:fix # Auto-fix
pnpm run benchmark # Performance tests
Quality:
Quick Start:
git checkout -b feature/awesome
pnpm test
pnpm run check:fix
Commit Format:
feat(images): add WebP support
fix(paths): handle UNC paths
docs(readme): update examples
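A local commit hook can check subjects against the `type(scope): summary` shape above. In this sketch, `parseCommitSubject` is hypothetical, and the accepted types are an assumption: only `feat`, `fix`, and `docs` appear in the examples, so it accepts any lowercase type and requires a scope.

```typescript
// "type(scope): summary", e.g. "feat(images): add WebP support".
const COMMIT_SUBJECT = /^([a-z]+)\(([a-z0-9-]+)\): (\S.*)$/;

function parseCommitSubject(
  subject: string,
): { type: string; scope: string; summary: string } | null {
  const m = COMMIT_SUBJECT.exec(subject);
  return m ? { type: m[1], scope: m[2], summary: m[3] } : null;
}
```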
See CONTRIBUTING.md
✅ Completed
🚀 Next
Vote at Discussions
Featured on:
Trusted worldwide • Enterprise adoption • Battle-tested
Show Your Support: ⭐ Star • 👀 Watch • 🐛 Report bugs • 💡 Suggest features • 🔀 Contribute
103 Tests • 94%+ Coverage • Production Ready
MIT © Sylphx
Built with:
Special thanks to the open source community ❤️
This project uses the following @sylphx packages:
Add this to claude_desktop_config.json and restart Claude Desktop.
{
"mcpServers": {
"pdf-reader-mcp": {
"command": "npx",
"args": [
"-y",
"@sylphx/pdf-reader-mcp"
]
}
}
}