loading…
Search for a command to run...
loading…
Enables AI agents to read documents in Excel, DOCX, PDF, and TXT formats via MCP protocol.
Enables AI agents to read documents in Excel, DOCX, PDF, and TXT formats via MCP protocol.
MCP (Model Context Protocol) Document Reader - A powerful MCP tool for reading documents in multiple formats, enabling AI agents to truly "read" your documents.
User Guide · API Reference · Contributing · Changelog · License
graph TB
A[AI Assistant / User] -->|Call read_document| B[MCP Document Reader]
B -->|Detect file type| C{File Type?}
C -->|.docx| D[DOCX Reader]
C -->|.pdf| E[PDF Reader]
C -->|.xlsx/.xls| F[Excel Reader]
C -->|.txt| G[Text Reader]
D -->|Extract text| H[Return Content]
E -->|Extract text| H
F -->|Extract text| H
G -->|Extract text| H
H -->|Text content| A
style A fill:#e1f5ff
style B fill:#fff4e1
style C fill:#f0f0f0
style D fill:#e8f5e9
style E fill:#e8f5e9
style F fill:#e8f5e9
style G fill:#e8f5e9
style H fill:#fff9c4
| Format | Extensions | MIME Type | Features |
|---|---|---|---|
| Excel | .xlsx, .xls | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Sheet and cell data extraction |
| DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Text and structure extraction |
| application/pdf | Text extraction | ||
| Text | .txt | text/plain | Plain text reading |
pip install mcp-documents-reader
git clone https://github.com/xt765/mcp_documents_reader.git
cd mcp_documents_reader
pip install -e .
This server provides the following tool:
read_documentRead any supported document type with a unified interface.
Arguments:
filename (string, required): Document file path, supports absolute or relative paths.Add the following to your MCP configuration file:
Option 1: Using PyPI (Recommended)
{
"mcpServers": {
"mcp-document-reader": {
"command": "uvx",
"args": [
"mcp-documents-reader"
]
}
}
}
Option 2: Using GitHub repository
{
"mcpServers": {
"mcp-document-reader": {
"command": "uvx",
"args": [
"--from",
"git+https://github.com/xt765/mcp_documents_reader",
"mcp_documents_reader"
]
}
}
}
Option 3: Using Gitee repository (Faster access in China)
{
"mcpServers": {
"mcp-document-reader": {
"command": "uvx",
"args": [
"--from",
"git+https://gitee.com/xt765/mcp_documents_reader",
"mcp_documents_reader"
]
}
}
}
After configuration, AI assistants can directly call the following tool:
# Read a DOCX file
read_document(filename="example.docx")
# Read a PDF file
read_document(filename="example.pdf")
# Read an Excel file
read_document(filename="example.xlsx")
# Read a text file
read_document(filename="example.txt")
from mcp_documents_reader import DocumentReaderFactory
# Using factory (recommended)
reader = DocumentReaderFactory.get_reader("document.pdf")
content = reader.read("/path/to/document.pdf")
# Check if format is supported
if DocumentReaderFactory.is_supported("file.xlsx"):
reader = DocumentReaderFactory.get_reader("file.xlsx")
content = reader.read("/path/to/file.xlsx")
Read any supported document type.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | ✅ | Document file path, supports absolute or relative paths |
mcp >= 1.26.0 - MCP protocol implementationpython-docx >= 1.2.0 - DOCX file readingpypdf >= 6.8.0 - PDF file reading (replaces PyPDF2)openpyxl >= 3.1.5 - Excel file readingpytest >= 8.0.0 - Testing frameworkpytest-asyncio >= 0.24.0 - Async testing supportpytest-cov >= 6.0.0 - Coverage reportingbasedpyright >= 0.28.0 - Type checkingruff >= 0.8.0 - Linting and formattingMIT License
Issues and Pull Requests are welcome!
Run in your terminal:
claude mcp add mcp-documents-reader -- npx Security
Low riskAutomated heuristic from public metadata — not a security guarantee.