An MCP server providing tools to read PDF files.
Production-ready PDF processing server for AI agents
5-10x faster parallel processing • Y-coordinate content ordering • 94%+ test coverage • 173 tests passing
PDF Reader MCP is a production-ready Model Context Protocol server that gives AI agents enterprise-grade PDF processing. Extract text, images, and metadata, with parallel page processing for speed and per-page error isolation for reliability.
The Problem:
// Traditional PDF processing
- Sequential page processing (slow)
- No natural content ordering
- Complex path handling
- Poor error isolation
The Solution:
// PDF Reader MCP
- 5-10x faster parallel processing ⚡
- Y-coordinate based ordering 📐
- Flexible path support (absolute/relative) 🎯
- Per-page error resilience 🛡️
- 94%+ test coverage ✅
Result: Production-ready PDF processing that scales.
Real-world performance from production testing:
| Operation | Ops/sec | Performance | Use Case |
|---|---|---|---|
| Error handling | 12,933 | ⚡⚡⚡⚡⚡ | Validation & safety |
| Extract full text | 5,575 | ⚡⚡⚡⚡ | Document analysis |
| Extract page | 5,329 | ⚡⚡⚡⚡ | Single page ops |
| Multiple pages | 5,242 | ⚡⚡⚡⚡ | Batch processing |
| Metadata only | 4,912 | ⚡⚡⚡ | Quick inspection |
| Document | Sequential | Parallel | Speedup |
|---|---|---|---|
| 10-page PDF | ~2s | ~0.3s | 5-8x faster |
| 50-page PDF | ~10s | ~1s | 10x faster |
| 100+ pages | ~20s | ~2s | Linear scaling with CPU cores |
Benchmarks vary based on PDF complexity and system resources.
claude mcp add pdf-reader -- npx @sylphx/pdf-reader-mcp
Add to claude_desktop_config.json:
{
"mcpServers": {
"pdf-reader": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"]
}
}
}
Config file location:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
VS Code:
code --add-mcp '{"name":"pdf-reader","command":"npx","args":["@sylphx/pdf-reader-mcp"]}'
npx @sylphx/pdf-reader-mcp
Add to your Windsurf MCP config:
{
"mcpServers": {
"pdf-reader": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"]
}
}
}
Add to Cline's MCP settings:
{
"mcpServers": {
"pdf-reader": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"]
}
}
}
Command: npx, Args: @sylphx/pdf-reader-mcp
Add the server in Settings → MCP Servers → Add Server with command npx and args @sylphx/pdf-reader-mcp. See Ontheia's compatible MCP servers for the full list.
npx -y @smithery/cli install @sylphx/pdf-reader-mcp --client claude
# Quick start - zero installation
npx @sylphx/pdf-reader-mcp
# Or install globally
npm install -g @sylphx/pdf-reader-mcp
{
"sources": [{
"path": "documents/report.pdf"
}],
"include_full_text": true,
"include_metadata": true,
"include_page_count": true
}
Result:
{
"sources": [{
"path": "documents/manual.pdf",
"pages": "1-5,10,15-20"
}],
"include_full_text": true
}
// Windows - backslash and forward-slash paths both work!
{
"sources": [{
"path": "C:\\Users\\John\\Documents\\report.pdf"
}],
"include_full_text": true
}
// Unix/Mac
{
"sources": [{
"path": "/home/user/documents/contract.pdf"
}],
"include_full_text": true
}
No more "Absolute paths are not allowed" errors!
{
"sources": [{
"path": "presentation.pdf",
"pages": [1, 2, 3]
}],
"include_images": true,
"include_full_text": true
}
Response includes:
{
"sources": [
{ "path": "C:\\Reports\\Q1.pdf", "pages": "1-10" },
{ "path": "/home/user/Q2.pdf", "pages": "1-10" },
{ "url": "https://example.com/Q3.pdf" }
],
"include_full_text": true
}
⚡ All PDFs processed in parallel automatically!
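The pattern behind "processed in parallel" plus per-source error isolation can be sketched with Promise.allSettled: every source starts at once, and a failing PDF produces an error entry instead of aborting the batch. This is a minimal illustration, not the server's code; the SourceResult shape and the processInParallel helper are hypothetical.

```typescript
// Hypothetical per-source result; not the server's actual response type.
type SourceResult<T> =
  | { source: string; success: true; data: T }
  | { source: string; success: false; error: string };

// Start one async job per source concurrently. Promise.allSettled never
// rejects, so one failing source cannot cancel or hide the others.
async function processInParallel<T>(
  sources: string[],
  worker: (source: string) => Promise<T>,
): Promise<SourceResult<T>[]> {
  const settled = await Promise.allSettled(sources.map((s) => worker(s)));
  return settled.map((outcome, i): SourceResult<T> =>
    outcome.status === "fulfilled"
      ? { source: sources[i], success: true, data: outcome.value }
      : { source: sources[i], success: false, error: String(outcome.reason) },
  );
}
```

Promise.all would reject on the first failure and discard the successful results, which is why allSettled fits the "error resilience" goal better.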
// ✅ Windows
{ "path": "C:\\Users\\John\\Documents\\report.pdf" }
{ "path": "C:/Users/John/Documents/report.pdf" }
// ✅ Unix/Mac
{ "path": "/home/john/documents/report.pdf" }
{ "path": "/Users/john/Documents/report.pdf" }
// ✅ Relative (still works)
{ "path": "documents/report.pdf" }
Other Improvements:
v1.2.0 - Content Ordering
v1.1.0 - Image Extraction & Performance
read_pdf Tool
The single tool that handles all PDF operations.
| Parameter | Type | Description | Default |
|---|---|---|---|
| sources | Array | List of PDF sources to process | Required |
| include_full_text | boolean | Extract full text content | false |
| include_metadata | boolean | Extract PDF metadata | true |
| include_page_count | boolean | Include total page count | true |
| include_images | boolean | Extract embedded images | false |
Source object:
{
path?: string; // Local file path (absolute or relative)
url?: string; // HTTP/HTTPS URL to PDF
pages?: string | number[]; // Pages to extract: "1-5,10" or [1,2,3]
}
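A client-side sketch of the parameter table and source object as TypeScript types, with the documented defaults filled in. The names PdfSource, ReadPdfArgs, and normalizeArgs are hypothetical, and the rule that each source names exactly one of path or url (and that sources is non-empty) is an assumption, not a documented server check.

```typescript
// Hypothetical types mirroring the parameter table and source object above.
interface PdfSource {
  path?: string;
  url?: string;
  pages?: string | number[];
}

interface ReadPdfArgs {
  sources: PdfSource[];
  include_full_text?: boolean;
  include_metadata?: boolean;
  include_page_count?: boolean;
  include_images?: boolean;
}

// Apply the documented defaults and reject sources that give neither or
// both of path/url (assumed rule; the server may validate differently).
function normalizeArgs(args: ReadPdfArgs): Required<ReadPdfArgs> {
  if (args.sources.length === 0) {
    throw new Error("sources must contain at least one entry");
  }
  for (let i = 0; i < args.sources.length; i++) {
    const src = args.sources[i];
    const hasPath = typeof src.path === "string";
    const hasUrl = typeof src.url === "string";
    if (hasPath === hasUrl) {
      throw new Error(`sources[${i}]: provide exactly one of "path" or "url"`);
    }
  }
  return {
    sources: args.sources,
    include_full_text: args.include_full_text ?? false,
    include_metadata: args.include_metadata ?? true,
    include_page_count: args.include_page_count ?? true,
    include_images: args.include_images ?? false,
  };
}
```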
Metadata only (fast):
{
"sources": [{ "path": "large.pdf" }],
"include_metadata": true,
"include_page_count": true,
"include_full_text": false
}
From URL:
{
"sources": [{
"url": "https://arxiv.org/pdf/2301.00001.pdf"
}],
"include_full_text": true
}
Page ranges:
{
"sources": [{
"path": "manual.pdf",
"pages": "1-5,10-15,20" // Pages 1,2,3,4,5,10,11,12,13,14,15,20
}]
}
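The page-range syntax can be expanded with a small parser; this sketch reproduces the expansion listed in the comment above. The expandPages helper is hypothetical, and two behaviours are assumptions rather than documented server rules: ranges are inclusive, and pages beyond the document's page count are silently dropped.

```typescript
// Expand "1-5,10-15,20" (or an explicit number array) into a sorted,
// de-duplicated list of 1-based page numbers within [1, pageCount].
function expandPages(spec: string | number[], pageCount: number): number[] {
  const pages = new Set<number>();
  const add = (p: number) => {
    if (Number.isInteger(p) && p >= 1 && p <= pageCount) pages.add(p);
  };
  if (Array.isArray(spec)) {
    spec.forEach((p) => add(p));
  } else {
    for (const part of spec.split(",")) {
      const token = part.trim();
      if (token === "") continue;
      // Either a single page "10" or an inclusive range "10-15".
      const match = /^(\d+)(?:-(\d+))?$/.exec(token);
      if (!match) throw new Error(`Invalid page token: "${token}"`);
      const start = Number(match[1]);
      const end = match[2] !== undefined ? Number(match[2]) : start;
      if (end < start) throw new Error(`Descending range: "${token}"`);
      // Clamp the loop so a huge range on a short PDF stays cheap.
      for (let p = start; p <= Math.min(end, pageCount); p++) add(p);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

For example, expandPages("1-5,10-15,20", 30) yields the twelve pages from the comment in the request above.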
Content is returned in natural reading order based on Y-coordinates:
Document Layout:
┌─────────────────────┐
│ [Title] Y:100 │
│ [Image] Y:150 │
│ [Text] Y:400 │
│ [Photo A] Y:500 │
│ [Photo B] Y:550 │
└─────────────────────┘
Response Order:
[
{ type: "text", text: "Title..." },
{ type: "image", data: "..." },
{ type: "text", text: "..." },
{ type: "image", data: "..." },
{ type: "image", data: "..." }
]
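The ordering shown above amounts to merging text and image items and sorting them by vertical position. A minimal sketch, assuming each extracted item carries a y value that grows downward as in the layout diagram; the ContentItem type and orderByY function are hypothetical. Raw PDF user space puts y = 0 at the bottom of the page, so an implementation working on unconverted PDF coordinates would sort in the opposite direction.

```typescript
// Hypothetical extracted item; y grows downward, as in the diagram above.
type ContentItem =
  | { type: "text"; y: number; text: string }
  | { type: "image"; y: number; data: string };

// Produce a single top-to-bottom stream of text and images.
function orderByY(items: ContentItem[]): ContentItem[] {
  // Copy so the caller's array is untouched; Array.prototype.sort is stable,
  // so items sharing the same y keep their original relative order.
  return [...items].sort((a, b) => a.y - b.y);
}
```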
Benefits:
Enable extraction:
{
"sources": [{ "path": "manual.pdf" }],
"include_images": true
}
Response format:
{
"images": [{
"page": 1,
"index": 0,
"width": 1920,
"height": 1080,
"format": "rgb",
"data": "base64-encoded-png..."
}]
}
Supported formats: RGB, RGBA, Grayscale
Auto-detected: JPEG, PNG, and other embedded formats
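To get a feel for why images come back PNG-encoded, it helps to estimate the raw pixel buffer each format implies and the base64 overhead of the data field. This is a back-of-the-envelope sketch: the format labels other than "rgb" and the 8-bits-per-channel assumption are not taken from the server, and both helpers are hypothetical.

```typescript
// Assumed format labels; only "rgb" appears in the response example above.
type PixelFormat = "grayscale" | "rgb" | "rgba";

const CHANNELS: Record<PixelFormat, number> = { grayscale: 1, rgb: 3, rgba: 4 };

// Bytes of an uncompressed pixel buffer, assuming 8 bits per channel.
function rawPixelBytes(width: number, height: number, format: PixelFormat): number {
  return width * height * CHANNELS[format];
}

// Base64 turns every 3 input bytes into 4 output characters (padded).
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}
```

The 1920×1080 RGB image in the example would be 6,220,800 bytes unencoded; the base64 "data" field then adds roughly a third on top of whatever size the PNG compresses to.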
Absolute paths (v1.3.0+) - Direct file access:
{ "path": "C:\\Users\\John\\file.pdf" }
{ "path": "/home/user/file.pdf" }
Relative paths - Workspace files:
{ "path": "docs/report.pdf" }
{ "path": "./2024/Q1.pdf" }
Configure working directory:
{
"mcpServers": {
"pdf-reader-mcp": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"],
"cwd": "/path/to/documents"
}
}
}
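How a relative path combines with the configured cwd can be illustrated with Node's built-in path module. This sketch assumes the server follows standard path.resolve semantics, which the text does not state; resolvePdfPath and its flavor parameter are hypothetical and exist only so Windows and POSIX behaviour can both be shown on any OS.

```typescript
import * as path from "node:path";

// Absolute inputs are kept (and normalized); relative inputs are joined
// onto cwd. path.win32 also normalizes "/" separators to "\".
function resolvePdfPath(p: string, cwd: string, flavor: "posix" | "win32"): string {
  const impl = flavor === "win32" ? path.win32 : path.posix;
  return impl.resolve(cwd, p);
}
```

With cwd set to /path/to/documents, "docs/report.pdf" becomes /path/to/documents/docs/report.pdf, while "C:/Users/John/Documents/report.pdf" normalizes to its backslash form regardless of cwd.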
Strategy 1: Page ranges
{ "sources": [{ "path": "big.pdf", "pages": "1-20" }] }
Strategy 2: Progressive loading
// Step 1: Get page count
{ "sources": [{ "path": "big.pdf" }], "include_full_text": false }
// Step 2: Extract sections
{ "sources": [{ "path": "big.pdf", "pages": "50-75" }] }
Strategy 3: Parallel batching
{
"sources": [
{ "path": "big.pdf", "pages": "1-50" },
{ "path": "big.pdf", "pages": "51-100" }
]
}
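Strategies 2 and 3 combine naturally: once the page count is known, the page ranges for a batched request can be generated instead of written by hand. A sketch with hypothetical helpers; the batch size is a client-side tuning choice rather than a server setting, and include_full_text is enabled as in the earlier page-range example.

```typescript
// Split pages 1..pageCount into range strings of at most batchSize pages.
function pageBatches(pageCount: number, batchSize: number): string[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const ranges: string[] = [];
  for (let start = 1; start <= pageCount; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pageCount);
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
  }
  return ranges;
}

// One source entry per batch, all pointing at the same file.
function batchedRequest(pdfPath: string, pageCount: number, batchSize: number) {
  return {
    sources: pageBatches(pageCount, batchSize).map((pages) => ({ path: pdfPath, pages })),
    include_full_text: true,
  };
}
```

batchedRequest("big.pdf", 100, 50) reproduces the two-source request above.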
By default, the server can read any local file the host process can access and can fetch any HTTP(S) URL. When running outside a sandbox, restrict it to a specific set of directories and hosts.
Use --allow-dir (repeatable) or the MCP_PDF_ALLOWED_DIRS env var (: or , separated). Once set, all path sources must resolve inside one of the allowed directories — relative paths, absolute paths, and .. traversal are all checked after resolution.
# CLI flags
npx @sylphx/pdf-reader-mcp --allow-dir=/srv/pdfs --allow-dir=/data/reports
# Environment
MCP_PDF_ALLOWED_DIRS="/srv/pdfs:/data/reports" npx @sylphx/pdf-reader-mcp
{
"mcpServers": {
"pdf-reader": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp", "--allow-dir=/srv/pdfs"]
}
}
}
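The "resolve first, then check" rule for --allow-dir can be sketched as follows. This is a POSIX-only illustration under stated assumptions, not the server's implementation: parseAllowedDirs and isAllowed are hypothetical, and symlinks are not followed (a stricter check would also compare real paths). Comparing via path.relative rather than a string prefix avoids treating /srv/pdfs-evil as inside /srv/pdfs.

```typescript
import * as path from "node:path";

// Split MCP_PDF_ALLOWED_DIRS on ":" or "," as described above. On Windows,
// ":" collides with drive letters, so "," is the safer separator there.
function parseAllowedDirs(raw: string | undefined): string[] {
  return (raw ?? "").split(/[:,]/).map((d) => d.trim()).filter((d) => d.length > 0);
}

// True when target, resolved against cwd, lies inside an allowed directory.
// Resolution collapses "..", so traversal attempts are caught.
function isAllowed(target: string, cwd: string, allowed: string[]): boolean {
  if (allowed.length === 0) return true; // unrestricted by default
  const resolved = path.posix.resolve(cwd, target);
  return allowed.some((dir) => {
    const rel = path.posix.relative(path.posix.resolve(dir), resolved);
    return rel === "" || (rel !== ".." && !rel.startsWith("../") && !path.posix.isAbsolute(rel));
  });
}
```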
# Block all URL sources
npx @sylphx/pdf-reader-mcp --no-http
MCP_PDF_ALLOW_HTTP=false npx @sylphx/pdf-reader-mcp
# Allowlist hosts (everything else rejected)
npx @sylphx/pdf-reader-mcp --allow-host=cdn.example.com --allow-host=files.internal
MCP_PDF_ALLOWED_HOSTS="cdn.example.com,files.internal" npx @sylphx/pdf-reader-mcp
| Setting | CLI flag | Environment variable | Default |
|---|---|---|---|
| Filesystem allowlist | --allow-dir=<path> (repeatable) | MCP_PDF_ALLOWED_DIRS (: or , separated) | unrestricted |
| Disable HTTP | --no-http | MCP_PDF_ALLOW_HTTP=false | HTTP enabled |
| HTTP host allowlist | --allow-host=<host> (repeatable) | MCP_PDF_ALLOWED_HOSTS (, separated) | any host |
Denied requests fail fast with an Access denied error before any disk read or network call.
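A fail-fast URL check matching the HTTP settings above can be written with the standard URL class, running before any network call. The HttpPolicy shape, policyFromEnv, and checkUrl are hypothetical; treating non-HTTP(S) schemes as denied is an assumption based on the "HTTP(S) URL" wording.

```typescript
// Hypothetical policy mirroring --no-http and --allow-host.
interface HttpPolicy {
  allowHttp: boolean;     // false with --no-http / MCP_PDF_ALLOW_HTTP=false
  allowedHosts: string[]; // empty means "any host"
}

function policyFromEnv(env: Record<string, string | undefined>): HttpPolicy {
  return {
    allowHttp: env.MCP_PDF_ALLOW_HTTP !== "false",
    allowedHosts: (env.MCP_PDF_ALLOWED_HOSTS ?? "")
      .split(",")
      .map((h) => h.trim().toLowerCase())
      .filter((h) => h.length > 0),
  };
}

// Throws an "Access denied" error before fetching; returns the parsed URL otherwise.
function checkUrl(raw: string, policy: HttpPolicy): URL {
  const url = new URL(raw); // throws TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Access denied: unsupported protocol ${url.protocol}`);
  }
  if (!policy.allowHttp) {
    throw new Error("Access denied: URL sources are disabled");
  }
  const host = url.hostname.toLowerCase();
  if (policy.allowedHosts.length > 0 && !policy.allowedHosts.includes(host)) {
    throw new Error(`Access denied: host ${host} is not allowlisted`);
  }
  return url;
}
```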
Problem: "Absolute paths are not allowed" error
Solution: Upgrade to v1.3.0+
npm update @sylphx/pdf-reader-mcp
Restart your MCP client completely.
Causes:
Solutions:
Use absolute path:
{ "path": "C:\\Full\\Path\\file.pdf" }
Or configure cwd:
{
"pdf-reader-mcp": {
"command": "npx",
"args": ["@sylphx/pdf-reader-mcp"],
"cwd": "/path/to/docs"
}
}
Solution:
npm cache clean --force
rm -rf node_modules package-lock.json
npm install @sylphx/pdf-reader-mcp@latest
Restart MCP client completely.
By default, PDF Reader MCP uses stdio transport for local use. You can also run it as an HTTP server for remote access from multiple machines.
# Run as HTTP server on port 8080
MCP_TRANSPORT=http npx @sylphx/pdf-reader-mcp
| Variable | Default | Description |
|---|---|---|
| MCP_TRANSPORT | stdio | Transport type: stdio or http |
| MCP_HTTP_PORT | 8080 | HTTP server port |
| MCP_HTTP_HOST | 0.0.0.0 | HTTP server hostname |
| MCP_API_KEY | - | Optional API key for authentication |
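Reading these variables with their documented defaults looks roughly like this. The TransportConfig type and readTransportConfig function are hypothetical, and treating empty values as unset is our assumption rather than documented behaviour.

```typescript
interface TransportConfig {
  transport: "stdio" | "http";
  port: number;
  host: string;
  apiKey: string | undefined;
}

function readTransportConfig(env: Record<string, string | undefined>): TransportConfig {
  // "||" so that empty strings fall back to the defaults too (assumption).
  const raw = env.MCP_TRANSPORT || "stdio";
  if (raw !== "stdio" && raw !== "http") {
    throw new Error(`MCP_TRANSPORT must be "stdio" or "http", got "${raw}"`);
  }
  const port = Number(env.MCP_HTTP_PORT || "8080");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_HTTP_PORT: ${env.MCP_HTTP_PORT}`);
  }
  return {
    transport: raw === "http" ? "http" : "stdio",
    port,
    host: env.MCP_HTTP_HOST || "0.0.0.0",
    apiKey: env.MCP_API_KEY || undefined, // no key means no authentication
  };
}
```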
FROM oven/bun:1
WORKDIR /app
RUN bun add @sylphx/pdf-reader-mcp
ENV MCP_TRANSPORT=http
ENV MCP_HTTP_PORT=8080
EXPOSE 8080
CMD ["bun", "node_modules/@sylphx/pdf-reader-mcp/dist/index.js"]
{
"servers": {
"pdf-reader": {
"type": "http",
"url": "https://your-server.com/mcp",
"headers": {
"X-API-Key": "your-api-key"
}
}
}
}
| Endpoint | Method | Description |
|---|---|---|
| /mcp | POST | JSON-RPC endpoint |
| /mcp/health | GET | Health check |
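A sketch of what a read_pdf call to the /mcp endpoint carries: a JSON-RPC 2.0 "tools/call" request plus the X-API-Key header from the client config above. It only builds the request; a real MCP session must first complete the "initialize" handshake (which the official MCP SDK client handles), and that step is deliberately omitted. buildReadPdfCall is hypothetical, and baseUrl is assumed to be the server origin.

```typescript
interface ToolCallRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildReadPdfCall(baseUrl: string, apiKey: string | undefined, args: object, id = 1): ToolCallRequest {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Streamable HTTP servers may reply with plain JSON or an SSE stream.
    Accept: "application/json, text/event-stream",
  };
  if (apiKey) headers["X-API-Key"] = apiKey;
  return {
    url: new URL("/mcp", baseUrl).toString(),
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        jsonrpc: "2.0",
        id,
        method: "tools/call",
        params: { name: "read_pdf", arguments: args },
      }),
    },
  };
}
```

The result can be passed straight to fetch(req.url, req.init) on Node 18+; a liveness probe is simply a GET to /mcp/health.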
| Component | Technology |
|---|---|
| Runtime | Node.js 22+ ESM |
| PDF Engine | PDF.js (Mozilla) |
| Validation | Vex + JSON Schema |
| Protocol | MCP SDK |
| Language | TypeScript (strict) |
| Testing | Bun test (173 tests) |
| Quality | Biome (50x faster) |
| CI/CD | GitHub Actions |
No any types, strict mode
Prerequisites:
Setup:
git clone https://github.com/SylphxAI/pdf-reader-mcp.git
cd pdf-reader-mcp
bun install && bun run build
Scripts:
bun run build # Build with bunup
bun test # Run 173 tests
bun run test:cov # Coverage (94%+)
bun run check # Lint + format
bun run check:fix # Auto-fix
bun run benchmark # Performance tests
Quality:
Quick Start:
git checkout -b feature/awesome
bun test
bun run check:fix
Commit Format:
feat(images): add WebP support
fix(paths): handle UNC paths
docs(readme): update examples
See CONTRIBUTING.md
✅ Completed
🚀 Next
Vote at Discussions
Featured on:
Trusted worldwide • Enterprise adoption • Battle-tested
Show Your Support: ⭐ Star • 👀 Watch • 🐛 Report bugs • 💡 Suggest features • 🔀 Contribute
173 Tests • 94%+ Coverage • Production Ready
MIT © Sylphx
Built with:
Special thanks to the open source community ❤️
This project uses the following @sylphx packages: