Multi-tier fallback chain for fetching web content as clean markdown. Handles tweets, YouTube, arXiv, PDFs, and regular pages with 9 fallback strategies.
Give your AI the ability to read the web. One command, no API keys required.
Without it, your AI hits a URL and gets a 403, a wall, or a wall of raw HTML. With intercept, it almost always gets the content — clean markdown, ready to use.
Handles tweets, YouTube videos (with transcripts when available), arXiv papers, PDFs, Wikipedia articles, and GitHub repos. If the first strategy fails, it tries up to 14 more before giving up.
Works with any MCP client: Claude Code, Claude Desktop, Codex, Cursor, Windsurf, Cline, and more.
claude mcp add intercept -s user -- npx -y intercept-mcp
codex mcp add intercept -- npx -y intercept-mcp
Settings → MCP → Add Server:
{
"mcpServers": {
"intercept": {
"command": "npx",
"args": ["-y", "intercept-mcp"]
}
}
}
Settings → MCP → Add Server → same JSON config as above.
Add to your claude_desktop_config.json:
{
"mcpServers": {
"intercept": {
"command": "npx",
"args": ["-y", "intercept-mcp"]
}
}
}
Any client that supports stdio MCP servers can run npx -y intercept-mcp.
No API keys needed for the fetch tool.
URLs are processed in four stages:
Known URL patterns are routed to dedicated handlers before the fallback pipeline:
| Pattern | Handler | What you get |
|---|---|---|
| `twitter.com/*/status/*`, `x.com/*/status/*` | Twitter/X | Tweet text, author, media, engagement stats (via third-party APIs) |
| `youtube.com/watch?v=*`, `youtu.be/*` | YouTube | Title, channel, duration, views, description, transcript (when captions available) |
| `arxiv.org/abs/*`, `arxiv.org/pdf/*` | arXiv | Paper metadata, authors, abstract, categories |
| `*.pdf` | PDF | Extracted text (text-layer PDFs only) |
| `*.wikipedia.org/wiki/*` | Wikipedia | Clean article content via Wikimedia REST API |
| `github.com/{owner}/{repo}` | GitHub | Raw README.md content |
| `github.com/{o}/{r}/blob/{ref}/{path}` | GitHub | Raw file content, code-fenced by language |
| `github.com/{o}/{r}/issues/{n}`, `/pull/{n}` | GitHub | Issue/PR title, state, body, diff stats, comments (via GitHub API) |
| `github.com/{o}/{r}/releases/tag/{t}`, `/releases/latest` | GitHub | Release notes (via GitHub API) |
The GitHub API endpoints work unauthenticated (60 requests/hour). Set GITHUB_TOKEN to raise the limit.
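The routing step described above can be pictured as an ordered list of patterns checked before the fallback pipeline. This is a minimal sketch, not intercept-mcp's actual code: the regular expressions and the handler names (`"twitter"`, `"github_issue"`, and so on) are assumptions that loosely mirror a subset of the table.

```python
import re
from typing import Optional

# Hypothetical routing table: (pattern, handler name). Order matters, so
# arxiv.org/pdf/... is claimed by the arXiv handler before the generic PDF one.
ROUTES = [
    (re.compile(r"^https?://(www\.)?(twitter|x)\.com/[^/]+/status/\d+"), "twitter"),
    (re.compile(r"^https?://(www\.)?youtube\.com/watch\?.*\bv=|^https?://youtu\.be/"), "youtube"),
    (re.compile(r"^https?://arxiv\.org/(abs|pdf)/"), "arxiv"),
    (re.compile(r"^https?://[a-z\-]+\.wikipedia\.org/wiki/"), "wikipedia"),
    (re.compile(r"^https?://github\.com/[^/]+/[^/]+/(issues|pull)/\d+"), "github_issue"),
    (re.compile(r"^https?://[^?#]+\.pdf($|[?#])", re.IGNORECASE), "pdf"),
]


def route(url: str) -> Optional[str]:
    """Return the first matching handler name, or None to use the fallback pipeline."""
    for pattern, handler in ROUTES:
        if pattern.search(url):
            return handler
    return None
```

A URL that returns `None` here would simply continue into the multi-tier pipeline.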
Before hitting any fetcher, every request checks agentsweb.org — a global shared markdown cache for AI agents backed by a 9-source parallel fetch pipeline with JS/SPA rendering (React, Vue, Angular via Cloudflare Browser Run). If another agent already fetched this URL, you get the result in under 50ms.
Every successful fetch contributes back automatically. Entries gain trust through a self-healing consensus model: when independent instances fetch the same URL and confirm the same content, confidence increases.
Opt out entirely with INTERCEPT_SHARED_CACHE=false, or use read-only mode (consume but never contribute) with INTERCEPT_CACHE_READ_ONLY=true.
agentsweb.org also exposes standalone endpoints for direct use:
- `/web?q=`: search the web
- `/research?q=`: search + fetch + cache in one call
- `/fetch?url=`: fetch on demand, auto-cached

See agentsweb.org/docs for full API documentation.
If no handler matches (or the handler returns nothing), the URL enters the multi-tier pipeline:
| Tier | Fetcher | Strategy |
|---|---|---|
| 0 | agentsweb.org | Global shared markdown cache — instant if another agent already fetched this URL |
| 1 | Cloudflare Browser Run | JS/SPA rendering + markdown extraction — also powers agentsweb.org (optional, needs API token) |
| 1 | Jina Reader | Clean markdown extraction service |
| 2 | Wayback Machine | Archived version from archive.org |
| 2 | Arquivo.pt | Portuguese web archive (broad international coverage) |
| 2 | Common Crawl | Petabyte web archive read from Common Crawl's index + S3 — not subject to the origin's rate limits, bot detection, or paywall |
| 2 | Codetabs | CORS proxy |
| 3 | Markdown endpoint | Asks the site for a native markdown version (<path>.md + Accept: text/markdown) |
| 3 | archive.ph | Archived snapshots via timemap API + stealth TLS fetch |
| 3 | Raw fetch | Direct GET with browser headers + Turndown markdown conversion |
| 3 | Stealth fetch | Browser TLS fingerprint impersonation via got-scraping (opt-in, see below) |
| 3 | FlareSolverr | Real-browser challenge solver for Cloudflare/DDoS-Guard (opt-in, needs a FlareSolverr instance) |
| 3 | Web unlocker | Commercial unlocker API — residential rotation + rendering + CAPTCHA (opt-in, BYO key, paid per request) |
| 4 | RSS, CrossRef, Semantic Scholar, HN, Reddit | Metadata / discussion fallbacks |
| 5 | OG Meta | Open Graph tags (guaranteed fallback) |
Tier 2 fetchers run in parallel. When multiple succeed, the highest quality result wins. All other tiers run sequentially.
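The scheduling rule above (tier 2 in parallel with the best result winning, every other tier in order) can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the `Result` shape, the 0..1 `quality` score, and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Result:
    source: str
    markdown: str
    quality: float  # hypothetical score; higher is better


Fetcher = Callable[[str], Optional[Result]]


def _safe(fetcher: Fetcher, url: str) -> Optional[Result]:
    """Treat an exception from a fetcher the same as 'no result'."""
    try:
        return fetcher(url)
    except Exception:
        return None


def run_parallel(url: str, fetchers: List[Fetcher]) -> Optional[Result]:
    """Run every fetcher concurrently and keep the highest-quality success."""
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        results = list(pool.map(lambda f: _safe(f, url), fetchers))
    return max((r for r in results if r is not None),
               key=lambda r: r.quality, default=None)


def run_sequential(url: str, fetchers: List[Fetcher]) -> Optional[Result]:
    """Try fetchers in order and stop at the first success."""
    for fetcher in fetchers:
        result = _safe(fetcher, url)
        if result is not None:
            return result
    return None


def fetch(url: str, tiers: Dict[int, List[Fetcher]], max_tier: int = 5) -> Optional[Result]:
    """Walk tiers in ascending order; only tier 2 is run in parallel."""
    for tier in sorted(tiers):
        if tier > max_tier:
            break
        runner = run_parallel if tier == 2 else run_sequential
        result = runner(url, tiers[tier])
        if result is not None:
            return result
    return None
```

Waiting for all tier 2 fetchers costs some latency compared with taking the first responder, but it lets the best archive copy win instead of the fastest one.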
All fetchers return proper Markdown (headings, links, bold, tables, code blocks) via Turndown — not plain text.
Results are cached in-memory with TTL (60 min for successes, 5 min for failures). Max 250 entries with LRU eviction. Failed URLs are cached to prevent re-attempting known-dead URLs. All three knobs are configurable via INTERCEPT_CACHE_TTL_MS, INTERCEPT_CACHE_FAILURE_TTL_MS, and INTERCEPT_CACHE_SIZE.
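To make the caching rules concrete, here is a small sketch of an LRU cache with separate TTLs for successes and failures. The class and method names are hypothetical; only the defaults (60 min, 5 min, 250 entries) come from the text above, and the injectable clock exists purely to make the behavior easy to test.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TTLCache:
    """LRU cache where failures expire sooner than successes (illustrative only)."""

    def __init__(self, max_entries: int = 250, ok_ttl: float = 3600.0,
                 fail_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._data: "OrderedDict[str, Tuple[float, bool, Any]]" = OrderedDict()
        self._max = max_entries
        self._ok_ttl = ok_ttl
        self._fail_ttl = fail_ttl
        self._clock = clock

    def set(self, url: str, value: Any, ok: bool = True) -> None:
        ttl = self._ok_ttl if ok else self._fail_ttl
        self._data[url] = (self._clock() + ttl, ok, value)
        self._data.move_to_end(url)           # newest entry is most recently used
        while len(self._data) > self._max:
            self._data.popitem(last=False)    # evict the least recently used entry

    def get(self, url: str) -> Optional[Tuple[bool, Any]]:
        entry = self._data.get(url)
        if entry is None:
            return None
        expires_at, ok, value = entry
        if self._clock() >= expires_at:
            del self._data[url]               # expired: drop it and report a miss
            return None
        self._data.move_to_end(url)           # a hit refreshes recency
        return ok, value
```

Caching a failure as `(False, ...)` is what lets a known-dead URL be answered without re-running the whole chain until the shorter TTL lapses.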
`fetch`
Fetch a URL and return its content as clean markdown.

- `url` (string, required): URL to fetch
- `maxTier` (number, optional, 1-5): Stop at this tier for speed-sensitive cases
- `maxLength` (number, optional, default 50000): Maximum characters to return
- `startIndex` (number, optional, default 0): Character offset for paginating long content
- `noCache` (boolean, optional): Skip session and shared caches and fetch live

Long pages are truncated at `maxLength` with a notice telling the agent which `startIndex` continues the content. Structured output reports `source`, `quality`, `contentLength`, `truncated`, `nextStartIndex`, and `cacheAgeSeconds` so agents can branch on them programmatically.
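The truncation and continuation behavior can be illustrated with a plain slicing helper. This is a sketch, not the tool's code: the function name is hypothetical, and treating `contentLength` as the length of the full page (rather than of the returned chunk) is an assumption.

```python
from typing import Any, Dict


def paginate(content: str, start_index: int = 0, max_length: int = 50000) -> Dict[str, Any]:
    """Return one chunk plus the fields an agent needs to request the next one."""
    chunk = content[start_index:start_index + max_length]
    end = start_index + len(chunk)
    truncated = end < len(content)
    return {
        "text": chunk,
        "contentLength": len(content),   # assumption: length of the whole page
        "truncated": truncated,
        "nextStartIndex": end if truncated else None,
    }


def read_all(content: str, max_length: int) -> str:
    """Follow nextStartIndex until the page is exhausted, as an agent might."""
    parts, index = [], 0
    while index is not None:
        page = paginate(content, index, max_length)
        parts.append(page["text"])
        index = page["nextStartIndex"]
    return "".join(parts)
```

An agent would pass the reported `nextStartIndex` back as `startIndex` on the following call until `truncated` is false.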
Direct image URLs (.png, .jpg, .gif, .webp, up to 5 MB) are returned as an MCP image block instead of text, so the agent's own vision model can read charts, diagrams, screenshots, and scanned documents. The structured output reports source: "image", mimeType, and bytes.
`fetch_batch`
Fetch up to 10 URLs in parallel, each through the same handler/fallback chain.

- `urls` (string[], required, 1-10): URLs to fetch
- `maxTier`, `noCache`: as in `fetch`
- `maxLength` (number, optional, default 20000): Per-URL character budget

`research`
Search the web and fetch the top results in one call. This replaces a search followed by several fetches.

- `query` (string, required): Search query
- `count` (number, optional, 1-5, default 3): Results to fetch
- `maxLength` (number, optional, default 20000): Per-result character budget
- `site` (string, optional): Restrict to a domain
- `freshness` (string, optional): `day`, `week`, `month`, or `year`

`search`
Search the web and return results.

- `query` (string, required): Search query
- `count` (number, optional, 1-20, default 5): Number of results
- `site` (string, optional): Restrict results to a domain
- `freshness` (string, optional): `day`, `week`, `month`, or `year`
- `page` (number, optional, 1-10): Results page for pagination

Uses Brave Search API if `BRAVE_API_KEY` is set, then SearXNG if `SEARXNG_URL` is set, then DuckDuckGo as an unreliable last resort. `freshness` and `page` are ignored by the DuckDuckGo fallback.
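The backend order for `search` can be sketched as a small selection function. The function names, the backend labels, and the parameter dictionary shape are hypothetical; only the ordering and the fact that DuckDuckGo ignores `freshness` and `page` come from the description above.

```python
from typing import Any, Dict, List, Mapping, Optional


def pick_search_backends(env: Mapping[str, str]) -> List[str]:
    """Return backends in the order they would be tried."""
    order = []
    if env.get("BRAVE_API_KEY"):
        order.append("brave")
    if env.get("SEARXNG_URL"):
        order.append("searxng")
    order.append("duckduckgo")  # always available, but the least reliable
    return order


def build_params(backend: str, query: str, count: int = 5,
                 freshness: Optional[str] = None,
                 page: Optional[int] = None) -> Dict[str, Any]:
    """Drop the options that the DuckDuckGo fallback does not honor."""
    params: Dict[str, Any] = {"query": query, "count": count}
    if backend != "duckduckgo":
        if freshness is not None:
            params["freshness"] = freshness
        if page is not None:
            params["page"] = page
    return params
```

In practice `env` would be `os.environ`; a plain mapping is used here so the ordering is easy to check.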
`extract`
Extract specific values from a page as JSON instead of markdown prose, for when you need particular data, not the whole page. Honors per-domain auth and proxies.

- `url` (string, required): The URL to extract from
- `selectors` (object, optional): Map of field name → CSS selector. Each value is either a selector string (returns the first match's text) or `{ selector, attr?, all? }`, where `attr` extracts an attribute (e.g. `href`) and `all: true` returns every match as an array.
- `tables` (boolean, optional): Convert every HTML table to an array of row objects (defaults to true when no selectors are given).

{
"url": "https://shop.example.com/item",
"selectors": {
"title": "h1",
"price": ".price",
"images": { "selector": "img.gallery", "attr": "src", "all": true }
}
}
Returns the extracted fields and/or tables as structured output.
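Because `selectors` accepts two shapes per field, it can help to see them reduced to one. The helper below is a hypothetical sketch of that normalization and of the `tables` default described above; it does not perform any HTML parsing and is not taken from the project.

```python
from typing import Any, Dict, Mapping, Optional, Union

SelectorSpec = Union[str, Mapping[str, Any]]


def normalize_selector(spec: SelectorSpec) -> Dict[str, Any]:
    """Expand the shorthand string form into the full object form."""
    if isinstance(spec, str):
        return {"selector": spec, "attr": None, "all": False}
    return {
        "selector": spec["selector"],
        "attr": spec.get("attr"),
        "all": bool(spec.get("all", False)),
    }


def normalize_extract_args(url: str,
                           selectors: Optional[Mapping[str, SelectorSpec]] = None,
                           tables: Optional[bool] = None) -> Dict[str, Any]:
    """Apply the documented default: tables is true only when no selectors are given."""
    normalized = {name: normalize_selector(spec) for name, spec in (selectors or {}).items()}
    if tables is None:
        tables = not normalized
    return {"url": url, "selectors": normalized, "tables": tables}
```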
`intercept://session/recent`
Markdown list of URLs fetched and cached in this session, most recent first. Re-fetching any of them is instant.
`research-topic`
Search for a topic and fetch the top results for a multi-source summary.

- `topic` (string): The topic to research
- `depth` (string, default "3"): Number of top results to fetch

`extract-article`
Fetch a URL and extract the key points from the content.

- `url` (string): The URL to fetch and summarize

| Variable | Required | Description |
|---|---|---|
| `BRAVE_API_KEY` | No | Brave Search API key for search |
| `SEARXNG_URL` | No | Self-hosted SearXNG instance URL (recommended) |
| `GITHUB_TOKEN` | No | GitHub token raising API rate limits for the issue/PR/release handler |
| `INTERCEPT_AUTH` | No | JSON map of domain → headers/cookies, to fetch content you're logged in to (see Per-domain authentication) |
| `CF_API_TOKEN` | No | Cloudflare API token with "Browser Rendering - Edit" permission |
| `CF_ACCOUNT_ID` | No | Cloudflare account ID (required if `CF_API_TOKEN` is set) |
| `USE_STEALTH_FETCH` | No | Set to `true` to enable stealth fetcher (see warning below) |
| `FLARESOLVERR_URL` | No | URL of a FlareSolverr instance (e.g. `http://localhost:8191`) to solve Cloudflare/DDoS-Guard challenges |
| `WEB_UNLOCKER_URL` | No | GET template (with a `{url}` placeholder and your API key) for a commercial web-unlocker like ScrapingBee/ScraperAPI/ZenRows; the paid last resort for the hardest sites |
| `INTERCEPT_SHARED_CACHE` | No | Set to `false` to disable the agentsweb.org shared cache |
| `INTERCEPT_CACHE_READ_ONLY` | No | Set to `true` to consume but never contribute to the shared cache |
| `INTERCEPT_CACHE_TTL_MS` | No | In-memory cache TTL for successful fetches in ms (default 3600000 = 60 min) |
| `INTERCEPT_CACHE_FAILURE_TTL_MS` | No | In-memory cache TTL for failed fetches in ms (default 300000 = 5 min) |
| `INTERCEPT_CACHE_SIZE` | No | Max in-memory cache entries (default 250) |
| `HTTPS_PROXY` / `HTTP_PROXY` | No | Standard proxy passthrough: routes all outbound fetches (including stealth) through the proxy. Honors `NO_PROXY`. |
| `INTERCEPT_PROXIES` | No | Comma/space-separated list of HTTP(S) proxies to rotate across, with automatic retry through the next proxy on a blocked response. Takes precedence over `HTTPS_PROXY`. |
Search: Has a DuckDuckGo fallback but it's rate-limited and unreliable. For production use, self-host SearXNG and set SEARXNG_URL (see below), or get a Brave Search API key.
Fetch: Works without any keys. Set CF_API_TOKEN + CF_ACCOUNT_ID to enable Cloudflare Browser Run (formerly Browser Rendering) for JavaScript-heavy pages (SPAs, React sites).
Use at your own risk. When enabled, this adds a fetcher that impersonates real browser TLS fingerprints (Chrome/Firefox cipher suites, HTTP/2 settings, header ordering) using got-scraping. This can bypass bot detection and CAPTCHA triggers on sites that would otherwise block automated requests.
This fetcher runs at tier 3 after the regular raw fetch. If the raw fetch gets blocked (CAPTCHA, Cloudflare challenge, 403), the stealth fetcher retries with browser impersonation.
This may violate the terms of service of some websites. The authors of intercept-mcp take no responsibility for how this feature is used. It is disabled by default and must be explicitly opted into.
The stealth fetcher impersonates a browser's TLS fingerprint, but it can't execute a JavaScript challenge — so sites protected by a Cloudflare "Checking your browser" / DDoS-Guard interstitial still block it. FlareSolverr runs a real headless browser that solves the challenge and returns the page HTML.
Run it (Docker):
docker run -d -p 8191:8191 ghcr.io/flaresolverr/flaresolverr:latest
Then set FLARESOLVERR_URL=http://localhost:8191. It runs at tier 3 as a last resort after the raw and stealth fetchers, and only when this variable is set. Solving a challenge can take 30–60s, so it's the slowest fetcher — but it recovers pages nothing else can.
For the hardest targets (sites that need residential IP rotation, real-browser rendering, and CAPTCHA handling together), a commercial unlocker is the pragmatic answer. intercept-mcp supports any unlocker that exposes a "GET this URL, return the HTML" endpoint, via a URL template that contains your API key and a {url} placeholder for the target:
# ScrapingBee
WEB_UNLOCKER_URL='https://app.scrapingbee.com/api/v1/?api_key=KEY&render_js=true&url={url}'
# ScraperAPI
WEB_UNLOCKER_URL='https://api.scraperapi.com/?api_key=KEY&render=true&url={url}'
# ZenRows
WEB_UNLOCKER_URL='https://api.zenrows.com/v1/?apikey=KEY&js_render=true&url={url}'
intercept substitutes the (URL-encoded) target for {url} and converts the returned HTML (or JSON wrapping it) to markdown. It runs at tier 3 as a paid last resort after the free fetchers, only when this variable is set — and your credentials in the template are only ever sent to the unlocker, never to the target. Bright Data's proxy-based Web Unlocker is just an authenticated proxy, so use HTTPS_PROXY / INTERCEPT_PROXIES for that instead. This bills per request.
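The substitution step is worth seeing, because the target has to be fully percent-encoded or its own query string would leak into the unlocker's parameters. A minimal sketch, assuming a hypothetical helper name and a made-up `api.example.com` endpoint:

```python
from urllib.parse import quote


def build_unlocker_url(template: str, target: str) -> str:
    """Fill the {url} placeholder with the percent-encoded target URL."""
    if "{url}" not in template:
        raise ValueError("template must contain a {url} placeholder")
    # safe="" also encodes '/', ':', '?', '&' and '=', so the target stays
    # a single query-parameter value inside the unlocker URL.
    return template.replace("{url}", quote(target, safe=""))


# Hypothetical template; real ones follow the ScrapingBee/ScraperAPI/ZenRows shapes above.
print(build_unlocker_url(
    "https://api.example.com/?api_key=KEY&url={url}",
    "https://example.com/a?b=1&c=2",
))
```

Using `str.replace` rather than `str.format` avoids tripping over any other braces that might appear in a template.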
If raw fetches start getting flagged, the most effective fix is usually a clean outbound IP — not a fancier fingerprint. intercept-mcp honors the standard HTTPS_PROXY / HTTP_PROXY / NO_PROXY env vars, so you can route all outbound traffic through whatever proxy you already have:
HTTPS_PROXY=http://user:pass@proxy.example.com:8080 npx intercept-mcp
This works with any HTTP(S) proxy — a self-hosted Squid, a Tailscale exit node, a $5 VPS running 3proxy, or commercial residential proxies (Bright Data, Oxylabs, etc.). The stealth fetcher and got-scraping calls also pick this up automatically.
A single proxy still presents a single IP, which can itself get flagged under load. Set INTERCEPT_PROXIES to a comma- or space-separated list and intercept-mcp round-robins across them, automatically retrying through the next proxy when a request comes back blocked (HTTP 403, 429, 451, 503) or errors:
INTERCEPT_PROXIES="http://user:pass@p1.example.com:8080,http://user:pass@p2.example.com:8080,http://p3.example.com:8080" npx intercept-mcp
Requests spread across the list, and a blocked response is retried through a different egress (up to 3 attempts) before giving up — so a handful of cheap proxies, or a rotating residential endpoint listed multiple times, behave like a pool. INTERCEPT_PROXIES takes precedence over HTTPS_PROXY, applies per request (so the stealth and archive.ph got-scraping calls rotate too), and accepts HTTP(S) proxies. Invalid entries are ignored.
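The rotation behavior described above can be sketched as a round-robin iterator with a bounded retry loop. Everything here is illustrative: the class name, the `send` callback signature (returning a status code and body), and the choice to treat `OSError` as the retryable error type are assumptions, while the blocked status codes and the three-attempt limit come from the text.

```python
import itertools
import re
from typing import Callable, List, Tuple

BLOCKED_STATUSES = {403, 429, 451, 503}

Send = Callable[[str, str], Tuple[int, str]]  # (url, proxy) -> (status, body)


def parse_proxies(raw: str) -> List[str]:
    """Split on commas/whitespace and silently drop entries that are not HTTP(S) URLs."""
    entries = [p for p in re.split(r"[,\s]+", raw.strip()) if p]
    return [p for p in entries if p.startswith(("http://", "https://"))]


class ProxyRotator:
    def __init__(self, proxies: List[str], max_attempts: int = 3) -> None:
        if not proxies:
            raise ValueError("at least one valid proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._max_attempts = max_attempts

    def fetch(self, url: str, send: Send) -> Tuple[int, str]:
        """Retry through the next proxy when a response is blocked or the request errors."""
        last_problem: object = None
        for _ in range(self._max_attempts):
            proxy = next(self._cycle)
            try:
                status, body = send(url, proxy)
            except OSError as exc:
                last_problem = exc
                continue
            if status not in BLOCKED_STATUSES:
                return status, body
            last_problem = status
        raise RuntimeError(f"all {self._max_attempts} attempts failed: {last_problem!r}")
```

Because the cycle is shared across calls, consecutive requests start from different proxies, which is what spreads load across the pool.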
Most of the web is behind a login. INTERCEPT_AUTH lets you attach your own headers or cookies to requests for a specific origin, so the fetch tools can read content you're legitimately signed in to — a paid subscription, a private dashboard, an intranet, an authenticated API.
It's a JSON object mapping a domain to a header map. A domain also matches its subdomains:
INTERCEPT_AUTH='{
"nytimes.com": { "Cookie": "nyt-s=...; nyt-a=..." },
"api.acme.com": { "Authorization": "Bearer eyJ..." }
}' npx intercept-mcp
To get a cookie: open the site logged-in, open DevTools → Network, copy the Cookie request header from any request to that domain.
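The "a domain also matches its subdomains" rule is easy to get subtly wrong with a bare suffix check, so here is a hedged sketch of one safe way to do it. The function name is hypothetical, and preferring the longest matching domain when several entries apply is an assumption, not documented behavior.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlsplit


def headers_for(url: str, auth_json: Optional[str]) -> Dict[str, str]:
    """Return the headers configured for the URL's host, or an empty dict."""
    auth = json.loads(auth_json) if auth_json else {}
    host = (urlsplit(url).hostname or "").lower()
    best_domain, best_headers = "", {}
    for domain, headers in auth.items():
        d = domain.lower()
        # Exact match, or a subdomain match on a label boundary, so that
        # "notnytimes.com" never matches an entry for "nytimes.com".
        if host == d or host.endswith("." + d):
            if len(d) > len(best_domain):  # assumption: most specific entry wins
                best_domain, best_headers = d, headers
    return dict(best_headers)
```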
When a URL matches an INTERCEPT_AUTH entry, intercept does not read from or write to the public agentsweb.org cache for that URL, so your private/paid content is never published, and you always get your authenticated view rather than a stranger's anonymous copy. (The in-process session cache still applies.)
For reliable search, self-host SearXNG with Docker. A config is included in the repo:
git clone https://github.com/bighippoman/intercept-mcp.git
cd intercept-mcp/searxng && docker compose up -d
Then set SEARXNG_URL=http://localhost:8888. No rate limits, no CAPTCHAs, aggregates Google + Bing + DuckDuckGo + Wikipedia + Brave.
Or use any existing SearXNG instance — just set SEARXNG_URL to its URL.
Incoming URLs are automatically cleaned:
ref, format, page, offset, limit)
Agents pass URLs taken from untrusted web content, so the fetch tools refuse anything pointing at local or internal infrastructure: loopback and private IPv4/IPv6 ranges, link-local addresses (including the 169.254.169.254 cloud metadata endpoint), CGNAT, multicast/reserved ranges, and local hostnames (localhost, *.local, *.internal, *.home.arpa). Literal IPs are checked, including alternate notations (decimal, hex) normalized by the URL parser; DNS is not resolved, so public hostnames pointing at private IPs are not caught.
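A check of this kind can be sketched with Python's standard `ipaddress` module. This is an illustration of the idea, not the project's code: the function name is hypothetical, treating a URL with no host as blocked is an assumption, and unlike the behavior described above this sketch does not normalize decimal or hex IP notations. Like the original, it does not resolve DNS.

```python
import ipaddress
from urllib.parse import urlsplit

CGNAT = ipaddress.ip_network("100.64.0.0/10")
LOCAL_SUFFIXES = (".local", ".internal", ".home.arpa")


def is_blocked(url: str) -> bool:
    """Return True when the URL's host is a local name or a non-public literal IP."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host:
        return True  # assumption: a URL without a host is refused
    if host == "localhost" or host.endswith(LOCAL_SUFFIXES):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a regular hostname; DNS is deliberately not resolved here
    return (
        ip.is_loopback
        or ip.is_private
        or ip.is_link_local      # covers 169.254.169.254
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
        or (ip.version == 4 and ip in CGNAT)
    )
```

The explicit CGNAT test is needed because `is_private` does not cover 100.64.0.0/10.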
Each fetcher result is scored for quality. Automatic fail on: