Extracts text and metadata from various file formats (PDF, DOCX, images with OCR) using Apache Tika, enabling AI assistants to understand file contents.
This project provides a Model Context Protocol (MCP) server for extracting content and metadata from files using Apache Tika.
The Tika MCP server allows AI assistants to extract text and metadata from various file formats (PDF, DOCX, images with OCR, etc.) using Apache Tika. This enables AI assistants to understand and work with the content of files that users upload.
pip install -r requirements.txt
python -m app.register_mcp_server

The Tika MCP server provides a single tool:

extract_file
Extracts content and metadata from a file using Apache Tika.
Parameters:
file_path: Path to the file to extract content from
tika_url: URL of the running Tika server (default: http://localhost:9998)

Returns:
metadata: Dictionary of metadata extracted from the file
content: Array of content blocks extracted from the file

Several test scripts are provided to verify the functionality:
app/test_tika_simple.py: Tests the Tika client directly
app/test_simple_mcp.py: Tests the MCP server using the JSON-RPC protocol

app/: Main application code
  simple_mcp_server.py: MCP server implementation
  tika_client.py: Client for Apache Tika
  model.py: Data models and business logic
  register_mcp_server.py: Script to register the MCP server
examples/: Example files for testing
requirements.txt: Python dependencies

Get a venv using either:
uv venv
or
python3 -m venv .venv
Activate the virtual environment and install dependencies:
source .venv/bin/activate
pip install -r requirements.txt
Start the Apache Tika server (if not already running):
docker run -d -p 9998:9998 apache/tika
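Once the container is running, Tika Server accepts documents over HTTP on port 9998. The following is a minimal sketch of talking to it directly from Python, which can help confirm the server works before the MCP layer is involved. It is not the project's tika_client.py, whose interface this README does not show; the function names build_tika_request and extract_text are hypothetical, and the sketch assumes the standard Tika Server PUT /tika endpoint for plain-text extraction.

```python
import urllib.request

# Default Tika Server address, as stated in this README.
TIKA_URL = "http://localhost:9998"


def build_tika_request(
    data: bytes,
    endpoint: str = "tika",
    accept: str = "text/plain",
    tika_url: str = TIKA_URL,
) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the raw file bytes.

    endpoint="tika" asks for extracted text; endpoint="meta" with
    accept="application/json" asks for metadata instead.
    """
    url = f"{tika_url.rstrip('/')}/{endpoint}"
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": accept}
    )


def extract_text(path: str, tika_url: str = TIKA_URL) -> str:
    """Send a local file to Tika Server and return the extracted text."""
    with open(path, "rb") as fh:
        request = build_tika_request(fh.read(), tika_url=tika_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling extract_text with the path of any file under examples/ should return its text if the container from the previous step is reachable; a connection error usually means the Tika server is not running or is bound to a different port.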
Register and run the MCP server:
python -m app.register_mcp_server
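With the server registered, a client invokes extract_file through a JSON-RPC message. The sketch below builds such a message, assuming the server follows the standard MCP tools/call request shape; the README does not state the transport (stdio or HTTP), so only the payload is shown. The helper name build_extract_file_call and the sample path examples/sample.pdf are hypothetical.

```python
import json


def build_extract_file_call(
    file_path: str,
    tika_url: str = "http://localhost:9998",
    request_id: int = 1,
) -> str:
    """Return a JSON-RPC 2.0 request string that calls the extract_file tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # standard MCP method for invoking a tool
        "params": {
            "name": "extract_file",
            "arguments": {
                "file_path": file_path,
                "tika_url": tika_url,  # default taken from this README
            },
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Hypothetical file name; substitute any file from examples/.
    print(build_extract_file_call("examples/sample.pdf"))
```

According to the tool description above, a successful response should carry a metadata dictionary and an array of content blocks; the exact envelope around them depends on the server implementation.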
License: MIT
Run in the terminal:
claude mcp add tika-mcp-server -- npx