Go MCP server for multi-format document access: PDF, TXT, MD, DOCX, CSV, images. 13 tools including OCR, search, table extraction, and URL fetch. Single binary, no runtime.
Install and Go. One command, single binary. Your AI reads any document — PDF, text, Markdown, DOCX, images.
MCP server for multi-format document access — read, search, extract images, OCR, and fetch documents from URLs via the Model Context Protocol. 13 tools, 6 formats, zero configuration.
```bash
go install github.com/drolosoft/go-docs-mcp@latest
# That's it. Single binary, starts in milliseconds.
```
For a deeper look at why an MCP server beats a direct tool, see Why MCP?
Every other document MCP server handles one format — a PDF server for PDFs, a DOCX server for DOCX. You'd need three separate servers to read three formats.
| | Go-Docs MCP | Others |
|---|---|---|
| Single binary, no runtime | Yes | Need Node/Python |
| `go install` one-liner | Yes | npm+deps or pip+venv |
| Multi-format (6 types) | Yes | One format each |
| Full-text search | Yes | Partial or none |
| OCR (scanned PDFs + images) | Yes | Rare |
| Image & table extraction | Yes | Partial |
| Document outline | Yes | Rare |
| Fetch from URL | Yes | Rare |
| Dir-locked, read-only | Yes | Varies |
| Smart caching | Yes | No |
| Fully offline | Yes | Yes |
Go-Docs MCP reads them all from a single binary: fast, secure, and with no language runtime to install.
| Category | Tool | Description |
|---|---|---|
| Discovery | `list_documents` | List all documents with metadata (format, pages, size) |
| Discovery | `list_formats` | List supported formats and dependency status |
| Reading | `read_document` | Full text, specific page, or page ranges from any format |
| Reading | `read_url` | Download from URL and extract text (50MB max) |
| Reading | `get_document_summary` | First 3 pages as a quick overview |
| Search | `search_document` | Case-insensitive full-text search with context |
| Analysis | `get_document_metadata` | Title, author, dates, version, page count |
| Analysis | `get_document_outline` | Table of contents / bookmarks |
| Analysis | `extract_tables` | Tables as structured data |
| Analysis | `extract_images` | Images as base64 (max 10 per call) |
| OCR | `ocr_document` | Force OCR on scanned/image-based PDFs |
| OCR | `read_image` | Extract text from PNG, JPG, TIFF via OCR |
| Export | `convert_to_markdown` | Convert any document to clean Markdown |
Highlights:
| Format | Dependencies | Notes |
|---|---|---|
| PDF | poppler (pdftotext, pdfinfo, pdfimages, pdftoppm) | Full support: text, images, metadata, OCR fallback |
| TXT, MD, CSV | None | Native, zero dependencies |
| DOCX | pandoc (optional) | Word document extraction |
| Images (PNG, JPG, TIFF) | tesseract (optional) | OCR text extraction |
```bash
# macOS
brew install poppler
brew install tesseract   # optional: OCR
brew install pandoc      # optional: DOCX

# Debian/Ubuntu
apt install poppler-utils
apt install tesseract-ocr   # optional: OCR
apt install pandoc          # optional: DOCX

# Fedora/RHEL
dnf install poppler-utils
dnf install tesseract   # optional: OCR
dnf install pandoc      # optional: DOCX
```
Note: TXT, MD, and CSV work out of the box with zero dependencies. Install only what you need.
```bash
# Install with the Go toolchain
go install github.com/drolosoft/go-docs-mcp@latest

# Or build from source
git clone https://github.com/drolosoft/go-docs-mcp.git
cd go-docs-mcp
make build     # produces ./go-docs-mcp
make install   # installs to /usr/local/bin/
```
Go-Docs MCP reads documents from a configured directory. Set DOCS_MCP_DIR to change it:
| Variable | Default | Description |
|---|---|---|
| `DOCS_MCP_DIR` | `~/.docs-mcp/documents/` | Directory containing documents to serve |
| `PDF_MCP_DIR` | (legacy alias) | Backward-compatible alias for `DOCS_MCP_DIR` |
Place your documents in the directory and the server finds them automatically. All supported formats are detected.
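As a rough illustration of the lookup described above, the directory could be resolved like this. This is a minimal sketch, not the server's actual code: the function name `resolveDocsDir` is hypothetical, and the precedence of `DOCS_MCP_DIR` over the legacy `PDF_MCP_DIR` is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveDocsDir picks the documents directory from the environment.
// Assumption: DOCS_MCP_DIR wins over the legacy PDF_MCP_DIR alias, and
// the documented default is used when neither is set.
func resolveDocsDir(getenv func(string) string, home string) string {
	if dir := getenv("DOCS_MCP_DIR"); dir != "" {
		return dir
	}
	if dir := getenv("PDF_MCP_DIR"); dir != "" {
		return dir
	}
	return filepath.Join(home, ".docs-mcp", "documents")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "." // fall back to the working directory if no home is known
	}
	fmt.Println(resolveDocsDir(os.Getenv, home))
}
```

Passing the environment lookup as a function keeps the sketch easy to exercise without touching real environment variables.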
Add to your .claude/settings.json:
```json
{
  "mcpServers": {
    "docs": {
      "command": "go-docs-mcp",
      "env": {
        "DOCS_MCP_DIR": "/path/to/your/documents"
      }
    }
  }
}
```
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
```json
{
  "mcpServers": {
    "docs": {
      "command": "/usr/local/bin/go-docs-mcp",
      "env": {
        "DOCS_MCP_DIR": "/path/to/your/documents"
      }
    }
  }
}
```
The server communicates over stdio using JSON-RPC 2.0:
```bash
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | go-docs-mcp
```
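For clients that build these messages in code, the request line can be assembled as follows. This is a sketch only: `rpcRequest` and `buildRequest` are hypothetical names, and the `tools/call` shape (`name` plus `arguments`) follows the MCP convention rather than anything stated in this README. Whether the server expects an `initialize` exchange first is not covered here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest mirrors the JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params"`
}

// buildRequest serializes one request as a single line of JSON,
// ready to be written to the server's stdin.
func buildRequest(id int, method string, params any) (string, error) {
	if params == nil {
		params = map[string]any{} // send {} rather than null
	}
	b, err := json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	list, err := buildRequest(1, "tools/list", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(list)

	// Hypothetical tool call using the read_document parameters.
	call, err := buildRequest(2, "tools/call", map[string]any{
		"name": "read_document",
		"arguments": map[string]any{
			"filename": "architecture-guide.pdf",
			"pages":    "1-3,10-12",
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(call)
}
```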
`list_documents`

Lists all documents in the configured directory with format detection.

Parameters: None

Example output:

```json
[
  {
    "filename": "architecture-guide.pdf",
    "format": "pdf",
    "title": "architecture-guide",
    "pages": 42,
    "size_bytes": 1048576
  },
  {
    "filename": "notes.md",
    "format": "markdown",
    "title": "notes",
    "size_bytes": 4096
  }
]
```
`list_formats`

Lists all supported document formats and their dependency status.

Parameters: None
`read_document`

Reads the extracted text content of a document. Automatically falls back to OCR if the document is image-based/scanned and pdftotext returns empty text.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | The document filename to read |
| `page` | number | No | Single page number (1-based). Omit for full text. |
| `pages` | string | No | Page ranges, e.g. "1-5", "10", "1-3,7,10-12". Overrides `page`. |

Example input:

```json
{
  "filename": "architecture-guide.pdf",
  "pages": "1-3,10-12"
}
```
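To make the `pages` syntax concrete, here is one way such a string could be expanded into individual page numbers. It is a sketch under assumptions: `parsePageRanges` is a hypothetical helper, and the server's own parsing rules (whitespace, duplicates, upper bounds) are not documented here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePageRanges expands a spec such as "1-3,7,10-12" into a list of
// 1-based page numbers, in the order given.
func parsePageRanges(spec string) ([]int, error) {
	var pages []int
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil || start < 1 {
			return nil, fmt.Errorf("invalid page %q", part)
		}
		end := start // a single page is a range of length one
		if isRange {
			end, err = strconv.Atoi(strings.TrimSpace(hi))
			if err != nil || end < start {
				return nil, fmt.Errorf("invalid range %q", part)
			}
		}
		for p := start; p <= end; p++ {
			pages = append(pages, p)
		}
	}
	return pages, nil
}

func main() {
	pages, err := parsePageRanges("1-3,7,10-12")
	if err != nil {
		panic(err)
	}
	fmt.Println(pages) // [1 2 3 7 10 11 12]
}
```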
`search_document`

Searches within a document for lines matching a query. Returns matches with 2 lines of context and approximate page numbers.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | The document filename to search |
| `query` | string | Yes | Search query (case-insensitive) |

Example output:

```
Found 3 matches for 'microservice' in architecture-guide.pdf:
--- Match 1 (page ~2, line 45) ---
The system is composed of several
> microservice components that communicate
via gRPC and message queues.
```
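The matching behaviour can be pictured with a small line-based search. This is an illustrative sketch, not the server's implementation: `searchLines` and the `match` type are hypothetical, the amount of context is left as a parameter, and page estimation is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// match records one hit and the lines around it.
type match struct {
	Line    int      // 1-based line number of the hit
	Context []string // surrounding lines, hit included
}

// searchLines returns every line containing query (case-insensitive),
// together with up to `context` lines before and after it.
func searchLines(text, query string, context int) []match {
	lines := strings.Split(text, "\n")
	needle := strings.ToLower(query)
	var matches []match
	for i, line := range lines {
		if !strings.Contains(strings.ToLower(line), needle) {
			continue
		}
		start := i - context
		if start < 0 {
			start = 0
		}
		end := i + context + 1
		if end > len(lines) {
			end = len(lines)
		}
		matches = append(matches, match{Line: i + 1, Context: lines[start:end]})
	}
	return matches
}

func main() {
	text := "Overview\nThe system is composed of several\nMicroservice components that communicate\nvia gRPC and message queues.\nEnd"
	for _, m := range searchLines(text, "microservice", 1) {
		fmt.Printf("--- line %d ---\n%s\n", m.Line, strings.Join(m.Context, "\n"))
	}
}
```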
`get_document_summary`

Returns the text from the first 3 pages of a document as a quick summary.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | The document filename to summarize |
`get_document_metadata`

Returns full document metadata.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | The document filename to get metadata for |

Example output:

```json
{
  "title": "Architecture Guide",
  "author": "Jane Doe",
  "subject": "System Design",
  "creator": "LaTeX",
  "producer": "pdfTeX",
  "creation_date": "Thu May 15 10:30:00 2025",
  "modification_date": "Thu May 15 10:30:00 2025",
  "pages": 42,
  "file_size_bytes": 1048576,
  "pdf_version": "1.5"
}
```
`get_document_outline`

Extracts the document outline (table of contents / bookmarks) as a structured list.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | The document filename to extract outline from |
`extract_tables`

Extracts tables from a document as structured data.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | The document filename to extract tables from |
| `page` | number | No | Specific page to extract from. Omit for all pages. |
`extract_images`

Extracts images from a document as base64-encoded data. Returns up to 10 images per call.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | The document filename to extract images from |
| `page` | number | No | Specific page to extract from. Omit for all pages. |

Example output:

```json
[
  {
    "page": 1,
    "index": 0,
    "format": "jpeg",
    "width": 800,
    "height": 600,
    "data_base64": "/9j/4AAQSkZJRg..."
  }
]
```
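A client receiving this output would typically decode `data_base64` back into raw bytes. The sketch below assumes standard padded base64 and mirrors the field names from the example output; `extractedImage` and `decodeImages` are hypothetical names, and the sample payload carries placeholder bytes rather than a real image.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// extractedImage mirrors one entry of the example output above.
type extractedImage struct {
	Page       int    `json:"page"`
	Index      int    `json:"index"`
	Format     string `json:"format"`
	Width      int    `json:"width"`
	Height     int    `json:"height"`
	DataBase64 string `json:"data_base64"`
}

// decodeImages parses the JSON array and returns the raw bytes of each image.
// Assumption: data_base64 uses the standard alphabet with padding.
func decodeImages(payload string) ([][]byte, error) {
	var imgs []extractedImage
	if err := json.Unmarshal([]byte(payload), &imgs); err != nil {
		return nil, err
	}
	out := make([][]byte, 0, len(imgs))
	for _, img := range imgs {
		raw, err := base64.StdEncoding.DecodeString(img.DataBase64)
		if err != nil {
			return nil, fmt.Errorf("page %d image %d: %w", img.Page, img.Index, err)
		}
		out = append(out, raw)
	}
	return out, nil
}

func main() {
	// "aGVsbG8=" is base64 for "hello", standing in for real image data.
	payload := `[{"page":1,"index":0,"format":"jpeg","width":800,"height":600,"data_base64":"aGVsbG8="}]`
	imgs, err := decodeImages(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d image(s), first is %d bytes\n", len(imgs), len(imgs[0]))
}
```

The decoded bytes can then be written to disk with `os.WriteFile`, using the `format` field to choose a file extension.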
`read_url`

Downloads a document from a URL and extracts its text content. Maximum file size: 50MB.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The URL of the document to download and read |
| `pages` | string | No | Page ranges to extract, e.g. "1-5". Omit for full text. |

Example input:

```json
{
  "url": "https://example.com/report.pdf",
  "pages": "1-3"
}
```
`ocr_document`

Forces OCR on a PDF document using tesseract. Useful for scanned/image-based PDFs or when pdftotext returns garbled text. Requires tesseract and pdftoppm.

Note: `read_document` already auto-detects image-based PDFs and falls back to OCR. Use `ocr_document` when you want to force OCR regardless, or need to specify a non-English language.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | The PDF filename to OCR |
| `page` | number | No | Specific page to OCR (1-based). Omit for all pages. |
| `language` | string | No | Tesseract language code (default: eng). Use spa, fra, etc. |

Example input:

```json
{
  "filename": "scanned-contract.pdf",
  "page": 1,
  "language": "spa"
}
```
`read_image`

Extracts text from an image file using OCR. Supports PNG, JPG, and TIFF. Requires tesseract.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | The image filename to read (PNG, JPG, TIFF) |
| `language` | string | No | Tesseract language code (default: eng). |

Example input:

```json
{
  "filename": "receipt.png",
  "language": "eng"
}
```
Only files inside `DOCS_MCP_DIR` are accessible. Paths containing `../` are rejected.

```bash
make build   # Build the binary
make test    # Run tests with race detector
make clean   # Remove build artifacts
```
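The directory lock noted above (only files inside `DOCS_MCP_DIR`, `../` rejected) could be enforced with a check along these lines. This is a sketch, not the project's code: `resolveInside` is a hypothetical helper, and it does not resolve symlinks, which a hardened implementation would also need to consider.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// resolveInside joins filename onto baseDir and rejects any result that
// would land outside baseDir, such as "../secret" or an absolute path.
func resolveInside(baseDir, filename string) (string, error) {
	if filename == "" || filepath.IsAbs(filename) {
		return "", errors.New("filename must be a relative path")
	}
	base := filepath.Clean(baseDir)
	full := filepath.Join(base, filename) // Join also collapses ".." segments
	rel, err := filepath.Rel(base, full)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("path escapes the documents directory")
	}
	return full, nil
}

func main() {
	for _, name := range []string{"notes.md", "../etc/passwd"} {
		path, err := resolveInside("/srv/docs", name)
		if err != nil {
			fmt.Printf("%s: rejected (%v)\n", name, err)
			continue
		}
		fmt.Printf("%s: %s\n", name, path)
	}
}
```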
```
go-docs-mcp/
  main.go          # MCP server setup, 13 tool registrations
  internal/
    pdf/
      reader.go    # Document extraction, caching, search, metadata, images, OCR
  Makefile         # Build targets
  go.mod           # Module definition
```
MIT - Copyright 2026 Drolosoft
Drolosoft — Tools we wish existed