PDF Server



Author: OptLTD


Description

Provides tools for extracting text, performing OCR on scanned documents, and retrieving images from PDF files. It enables Model Context Protocol clients to process PDF content page-by-page through specialized extraction tools.

README

A PDF file reading server based on FastMCP.

Supports PDF text extraction, OCR, and image extraction over MCP, with a built-in web debugger for testing.

🚀 Features

read_pdf_text
Extracts normal text from a PDF (page by page).
read_by_ocr
Uses OCR to recognize text from scanned or image-based PDFs.
read_pdf_images
Extracts all images from a specified PDF page (Base64 encoded output).
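Because read_pdf_images returns Base64 text rather than raw bytes, a client has to decode each item before it can save or display an image. The following is a minimal sketch, assuming the tool result can be reduced to a plain list of Base64 strings (the exact response shape is not documented here). The helper name `save_images` and the `.bin` extension are hypothetical.

```python
import base64
from pathlib import Path


def save_images(encoded_images, out_dir, prefix="page_image"):
    """Decode Base64 strings and write them to files; return the written paths.

    Assumption: each item is a bare Base64 string with no data-URI prefix.
    The .bin extension is a placeholder because the image format is unknown.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, encoded in enumerate(encoded_images, start=1):
        # validate=True rejects characters outside the Base64 alphabet
        data = base64.b64decode(encoded, validate=True)
        target = out / f"{prefix}_{index}.bin"
        target.write_bytes(data)
        written.append(target)
    return written
```

If the server also reports an image format alongside the data, that value would be a better choice for the file extension than the placeholder used here.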

📂 Project Structure

mcp-pdf-server/
├── pdf_server.py         # Main server entry point
└── README.md             # Project documentation

⚙️ Installation

Recommended Python version: 3.9+

pip install pymupdf mcp

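Note: The OCR tool needs OCR support in the environment. With PyMuPDF this typically means Tesseract language data that PyMuPDF can locate (for example through the TESSDATA_PREFIX environment variable); otherwise an external OCR library is required.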
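A quick way to confirm the installation is to check that the packages can be found before starting the server. This sketch uses only the standard library. The function name `check_environment` is hypothetical, and the `TESSDATA_PREFIX` check assumes OCR goes through PyMuPDF's Tesseract integration, which this README does not confirm.

```python
import importlib.util
import os


def check_environment():
    """Report which pieces of the setup appear to be available."""
    return {
        # PyMuPDF is importable as "pymupdf" in recent releases and as "fitz" in older ones.
        "pymupdf": any(
            importlib.util.find_spec(name) is not None for name in ("pymupdf", "fitz")
        ),
        "mcp": importlib.util.find_spec("mcp") is not None,
        # Assumption: OCR relies on Tesseract language data located via TESSDATA_PREFIX.
        "tessdata_prefix_set": bool(os.environ.get("TESSDATA_PREFIX")),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```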

🤖 Configuration

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uvx",
      "timeout": 60000,
      "args": [
        "mcp-pdf-reader"
      ]
    }
  }
}
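Client configuration files often already list other servers, so the entry above usually has to be merged in rather than pasted over the whole file. A small sketch of that merge follows. The helper name `add_pdf_reader` and the "other" server in the usage example are hypothetical, and where the configuration file lives depends on the client.

```python
import json


def add_pdf_reader(config):
    """Return a copy of a client config with the pdf-reader entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Values copied from the configuration shown above.
    servers["pdf-reader"] = {
        "command": "uvx",
        "timeout": 60000,
        "args": ["mcp-pdf-reader"],
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "npx", "args": ["other-server"]}}}
    print(json.dumps(add_pdf_reader(existing), indent=2))
```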

🔦 Start the Server

Run the following command:

python pdf_server.py

You should see logs like:

INFO:mcp-pdf-server:Starting MCP PDF Server...
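The shape of that log line matches what Python's standard `logging` module prints with its default format. This sketch shows how such a line can be produced. It is an illustration, not the contents of pdf_server.py, and the function name `announce_start` is hypothetical.

```python
import logging

# basicConfig's default format is "%(levelname)s:%(name)s:%(message)s"
# and its default handler writes to stderr.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("mcp-pdf-server")


def announce_start():
    """Emit the startup message through the named logger."""
    logger.info("Starting MCP PDF Server...")


if __name__ == "__main__":
    announce_start()
```

Logging to stderr matters for servers that speak MCP over stdio, since stdout is then reserved for protocol messages.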

🛠️ API Tool List

| Tool | Description | Input Parameters | Returns |
| --- | --- | --- | --- |
| `read_pdf_text` | Extracts normal text from PDF pages | `file_path`, `start_page`, `end_page` | List of page texts |
| `read_by_ocr` | Recognizes text via OCR | `file_path`, `start_page`, `end_page`, `language`, `dpi` | OCR extracted text |
| `read_pdf_images` | Extracts images from a PDF page | `file_path`, `page_number` | List of images (Base64 encoded) |
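The `start_page` and `end_page` parameters have to be mapped onto the zero-based page indices that PDF libraries generally use. This sketch shows one way to do that, assuming pages are numbered from 1 and `end_page` is inclusive, as the "pages 1 to 5" usage example suggests. The helper name `page_indices` and the choice to raise on out-of-range input are assumptions. The server may clamp the range instead.

```python
def page_indices(start_page, end_page, page_count):
    """Convert a 1-based inclusive page range into 0-based indices.

    Assumption: pages are numbered from 1 and end_page is inclusive;
    out-of-range requests raise ValueError.
    """
    if page_count < 1:
        raise ValueError("document has no pages")
    if start_page < 1 or end_page < start_page:
        raise ValueError(f"invalid page range: {start_page}-{end_page}")
    if end_page > page_count:
        raise ValueError(f"end_page {end_page} exceeds page count {page_count}")
    return list(range(start_page - 1, end_page))
```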

📝 Example Usage

Extract text from pages 1 to 5:

mcp run read_pdf_text --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 5}'

Perform OCR recognition on page 1:

mcp run read_by_ocr --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 1, "language": "eng"}'

Extract all images from page 3:

mcp run read_pdf_images --args '{"file_path": "pdf_resources/example.pdf", "page_number": 3}'
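At the protocol level, each of these invocations becomes a JSON-RPC 2.0 request with the method `tools/call`, carrying the tool name and its arguments. This sketch builds that request body with the standard library. The helper name `tools_call_request` is hypothetical, and a real client would first perform the MCP initialization handshake and send the request over the chosen transport.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 "tools/call" request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tools_call_request(
        1,
        "read_pdf_text",
        {"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 5},
    )
    print(json.dumps(request, indent=2))
```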

📢 Notes

Files must be placed inside the pdf_resources/ directory, or an absolute path must be provided.
OCR functionality requires appropriate OCR support in the environment.
When processing large files, adjust memory and timeout settings as needed, for example the timeout value in the client configuration.
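The path rule in the first note can be checked on the client side before a request is sent. This sketch shows one interpretation: absolute paths are accepted as given, and relative paths must resolve to a location inside pdf_resources/ under a base directory. The helper name `path_is_accepted` and the `base_dir` parameter are hypothetical, and the server's own check may differ. `Path.is_relative_to` requires Python 3.9+, which matches the recommended version.

```python
from pathlib import Path


def path_is_accepted(file_path, base_dir="."):
    """Return True for absolute paths and for relative paths inside pdf_resources/."""
    path = Path(file_path)
    if path.is_absolute():
        return True
    base = Path(base_dir).resolve()
    resource_root = base / "pdf_resources"
    # resolve() collapses ".." segments, so "pdf_resources/../x.pdf" is rejected.
    resolved = (base / path).resolve()
    return resolved.is_relative_to(resource_root)
```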

📜 License

This project is licensed under the MIT License.
For commercial use, please credit the original source.

How to Install

Run in the terminal:

claude mcp add mcp-pdf-server -- npx
