MCP server for local semantic search over web content, enabling AI agents to ingest, index, and query pages with hybrid retrieval and token budget control.
Fast, local semantic search over web content for AI agents. Sifts the net for signal — uses ~90% fewer tokens than raw web_fetch.
When an AI agent researches the web, the usual flow is: search → fetch 10 pages → drown in 100k+ tokens of irrelevant prose. nesift sits between the web and the agent: it ingests pages on the fly, indexes them with hybrid BM25 + embeddings, deduplicates redundant content across sources, and returns only the chunks that fit your token budget.
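As a rough illustration of the last two steps (deduplication and fitting a token budget), here is a minimal Python sketch. It is not nesift's actual code: the function names, the four-characters-per-token estimate, and the exact-match fingerprint are all assumptions made for the example, and the real tool may use a proper tokenizer and fuzzier duplicate detection.

```python
import hashlib
import re


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def fingerprint(text: str) -> str:
    """Hash of the lowercased, whitespace-normalized text (exact duplicates only)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def select_chunks(ranked_chunks: list[str], budget: int) -> list[str]:
    """Walk chunks in rank order, skip duplicates, and keep what fits the budget."""
    seen: set[str] = set()
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        key = fingerprint(chunk)
        if key in seen:
            continue  # same content already seen from another source
        seen.add(key)
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            continue  # too large; a smaller chunk further down may still fit
        selected.append(chunk)
        used += cost
    return selected


if __name__ == "__main__":
    ranked = [
        "RAG grounds answers in retrieved text.",
        "rag  grounds answers in retrieved text.",  # duplicate after normalization
        "x" * 400,                                  # about 100 tokens, over budget
        "short",
    ]
    print(select_chunks(ranked, budget=20))
```

The greedy pass keeps the ranking order intact, so the agent still sees the most relevant chunk first.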
pip install -e ., no database, no daemon.
Index data lives in /tmp and is per-session by default.
BM25 and potion-retrieval-32M embeddings fused via RRF.
--budget N trims results to N tokens.
nesift search "..." does search + filter + fetch + index + answer in one command.

git clone [email protected]:scottgl9/nesift.git
cd nesift
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
Requires Python 3.11+.
# Index a page and ask about it
nesift add https://en.wikipedia.org/wiki/Retrieval-augmented_generation
nesift query "what is RAG used for" --budget 1500
nesift answer "how does RAG reduce hallucinations"
# Pre-fetch scoring — rank snippets before downloading
nesift score "vector database" "Pinecone is a vector DB" "How to bake bread"
# One-shot SearXNG search + ingest + answer
NESIFT_SEARXNG_URL=http://127.0.0.1:8888 \
nesift search "retry logic in distributed systems" --top 5 --budget 2000
nesift list
nesift clear
See docs/cli.md for every command and flag.
URL → trafilatura extract → heading-aware chunker → triage summary
→ BM25 index + potion-retrieval-32M embeddings (CPU)
→ query: RRF fusion + dedup + budget trim → ranked chunks or synthesized answer
See docs/architecture.md.
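The RRF fusion step in the pipeline above can be sketched in a few lines of Python. This is the textbook reciprocal rank fusion formula, not code taken from nesift: the function name `rrf_fuse`, the chunk ids, and the constant `k = 60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25_ranking = ["c1", "c2", "c3"]   # hypothetical lexical ranking
    dense_ranking = ["c2", "c3", "c1"]  # hypothetical embedding ranking
    for chunk_id, score in rrf_fuse([bm25_ranking, dense_ranking]):
        print(chunk_id, round(score, 5))
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is one reason it is a popular choice for hybrid retrieval.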
pip install "nesift[mcp]"
nesift-mcp # stdio MCP server
Tools exposed: score_snippets, add_page, add_batch, query, answer, list_pages, clear, search. See docs/mcp.md.
nesift add https://arxiv.org/pdf/2005.11401.pdf
Content type is auto-detected; .pdf URLs (or any response with the PDF signature) route through pypdf.
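One way such detection could work is sketched below, assuming a check on the URL path followed by a look for the `%PDF-` header that PDF files begin with. The function name `looks_like_pdf` and the 1024-byte search window are assumptions for illustration; nesift's actual detection logic may differ.

```python
from urllib.parse import urlparse

PDF_MAGIC = b"%PDF-"  # PDF files begin with this header


def looks_like_pdf(url: str, body: bytes) -> bool:
    """Return True if the URL path ends in .pdf or the body carries the PDF header."""
    path = urlparse(url).path.lower()
    if path.endswith(".pdf"):
        return True
    # Assumption: tolerate a little leading junk, as lenient PDF readers do.
    return PDF_MAGIC in body[:1024]


if __name__ == "__main__":
    print(looks_like_pdf("https://arxiv.org/pdf/2005.11401.pdf", b""))
    print(looks_like_pdf("https://example.org/download?id=7", b"%PDF-1.7\n"))
    print(looks_like_pdf("https://example.org/page", b"<html></html>"))
```

Checking the body as well as the extension covers download links that serve a PDF without `.pdf` in the URL.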
nesift add https://es.wikipedia.org/wiki/... --lang
--lang swaps in potion-multilingual-128M (101 languages).
GPL-2.0-only — see LICENSE.