Structure any document, query it like a database. Open-source extraction engine that turns any document into typed, schema-defined records, queryable in natural language from Claude, ChatGPT, Gemini, or any MCP client.
Your documents are a dark database.
Open-source document intelligence engine — schema-driven extraction, NL query, MCP server, Python and TypeScript SDKs. Self-hostable under MIT.

RAG is built for retrieval — find me chunks similar to this query. It breaks on homogeneous collections like invoices, contracts, or receipts where every document looks alike and the question is an aggregation, not a search.

Sifter's approach: extract structured fields once (client, date, total), store them as typed records, query with real filters and aggregations. The answer is exact and reproducible — because it's a database query, not a similarity search.
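To make the contrast with similarity search concrete, here is a minimal sketch in plain Python of what "typed records plus real filters and aggregations" means. It does not use the Sifter SDK: the `InvoiceRecord` class, the `parse_record` and `total_by_client` helpers, and the sample rows are all hypothetical, chosen only to mirror the client, date, and total fields named above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InvoiceRecord:
    """Hypothetical typed record with the three fields named in the text."""
    client: str
    date: date
    total_amount: float


def parse_record(raw: dict) -> InvoiceRecord:
    """Coerce one extracted dict into a typed record (ISO date, float total)."""
    return InvoiceRecord(
        client=raw["client"],
        date=date.fromisoformat(raw["date"]),
        total_amount=float(raw["total_amount"]),
    )


def total_by_client(records, start: date, end: date) -> dict:
    """Sum total_amount per client for invoices dated within [start, end]."""
    totals = {}
    for r in records:
        if start <= r.date <= end:
            totals[r.client] = totals.get(r.client, 0.0) + r.total_amount
    return totals


# Illustrative rows only, not real extraction output.
raw_rows = [
    {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0},
    {"client": "Acme Corp", "date": "2024-02-03", "total_amount": 250.0},
    {"client": "Globex", "date": "2024-01-20", "total_amount": 900.0},
    {"client": "Globex", "date": "2024-05-01", "total_amount": 400.0},
]
records = [parse_record(r) for r in raw_rows]
q1_totals = total_by_client(records, date(2024, 1, 1), date(2024, 3, 31))
print(q1_totals)  # {'Acme Corp': 1750.0, 'Globex': 900.0}
```

The same input always yields the same totals, which is the reproducibility property the paragraph above describes. A chunk-retrieval approach would have to find every relevant invoice and add the numbers in free text.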
git clone https://github.com/sifter-ai/sifter
cd sifter/code
cp server/.env.example server/.env.local # set SIFTER_DEFAULT_API_KEY (required)
docker compose up -d
Open http://localhost:3000 — create a sift, upload documents, query results.
pip install sifter-ai
from sifter import Sifter
s = Sifter(api_key="sk-...")
sift = s.create_sift("Invoices", "client name, date, total amount")
sift.upload("./invoices/")
sift.wait()
for record in sift.records():
    print(record["extracted_data"])
    # {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0}
npm install @sifter-ai/sdk
import { Sifter } from "@sifter-ai/sdk";
const client = new Sifter({ apiKey: "sk-..." });
const sift = await client.createSift("Invoices", "client, date, total amount");
await sift.upload("./invoices/");
await sift.wait();
const records = await sift.records();
console.log(records);
{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": ["sifter-mcp", "--base-url", "http://localhost:8000"],
      "env": { "SIFTER_API_KEY": "sk-dev" }
    }
  }
}
Then ask:
"What's the total unpaid across all invoices from last quarter?"
"Show me all contracts expiring in the next 90 days."
"Which candidates have Python and more than 5 years experience?"
Sifter answers with structured data — exact counts, sums, filtered rows. Not a text blob.
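As an illustration of what "filtered rows" could look like for the contracts question, the sketch below applies a 90-day window to records shaped like the `extracted_data` dicts shown earlier. It is plain Python, not Sifter's query engine: the `expiring_within` helper, the `party` and `expiry_date` field names, and the sample records are assumptions made for this example.

```python
from datetime import date, timedelta


def expiring_within(records, today: date, days: int = 90,
                    field: str = "expiry_date") -> list:
    """Return extracted_data dicts whose expiry falls in [today, today + days].

    `field` is a hypothetical schema field holding an ISO-8601 date string.
    """
    cutoff = today + timedelta(days=days)
    hits = []
    for rec in records:
        data = rec["extracted_data"]
        expiry = date.fromisoformat(data[field])
        if today <= expiry <= cutoff:
            hits.append(data)
    # ISO date strings sort chronologically, so a string sort is enough here.
    return sorted(hits, key=lambda d: d[field])


# Illustrative records only.
contracts = [
    {"extracted_data": {"party": "Umbrella", "expiry_date": "2024-03-31"}},
    {"extracted_data": {"party": "Initech", "expiry_date": "2024-02-15"}},
    {"extracted_data": {"party": "Hooli", "expiry_date": "2024-06-30"}},
    {"extracted_data": {"party": "Vandelay", "expiry_date": "2023-12-01"}},
]
hits = expiring_within(contracts, today=date(2024, 1, 1))
for row in hits:
    print(row["party"], row["expiry_date"])
```

Contracts that have already expired or that expire after the cutoff are excluded, so the result is an exact row set, not a ranked list of similar passages.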
Want a remote MCP URL without running a local server? → Sifter Cloud
Sifter includes a built-in dashboard — no Metabase, no Grafana, no SQL required.
Describe what you want to see in plain language:
sift = client.sifts.get("invoices")
sift.create_dashboard("Show total invoiced and unpaid by vendor, monthly trend")
Produces KPI tiles, breakdowns, and time-series — updated automatically on every extraction.
sifter extract, sifter records, and sifter sifts are available for terminal workflows and CI.
Sifter Cloud is the managed version: no Mongo, no ops, a remote MCP endpoint, and Google Drive and email ingress. Free tier available.
Full documentation at docs.sifter.run — quickstart, SDK reference, MCP guide, cookbook, self-hosting.
MIT — see LICENSE.
Created by Bruno Fortunato.