The MCP for the Web Speed Agent SDK that enables post-auth agents.
Local browser automation + Web Speed API integration for authenticated web extraction.
Point an AI agent at any website — including ones that require login — and get back clean, structured data. Credentials stay on your machine. Only extracted HTML goes to the server.
```bash
pip install web-speed-agent
playwright install chromium
```
Want to use this with Claude, Gemini, or other AI clients?
Check out the MCP Server Installation Guide — it's the easiest way to let AI agents log in and extract data through natural language.
```
Your machine                          Web Speed server
─────────────────────────────────     ──────────────────────────
Playwright browser (local)
  ↓ navigates, logs in, clicks
  ↓ gets page HTML
  ↓ (no passwords sent)
agent.extract(html)  ────────→        Advanced extraction engine
                     ←────────        Structured JSON
```
Credentials never leave your machine. The server only sees HTML.
```python
import asyncio

from web_speed_agent import Agent


async def main():
    agent = Agent(api_key="wsp_...")  # or set WEBSPEED_API_KEY env var

    # Public pages — no browser needed
    result = await agent.map("https://techcrunch.com/some-article/")
    print(result["article"]["sections"])

    # Authenticated pages — browser runs locally
    agent.store_credential("mysite", "user@example.com", "mypassword")
    async with agent.browser(session_name="mysite") as browser:
        page = await browser.new_page()
        await page.goto("https://mysite.com/login")
        username, password = agent.get_credential("mysite")
        await page.fill('[name="email"]', username)
        await page.fill('[name="password"]', password)
        await page.click('button[type="submit"]')
        await page.wait_for_load_state("networkidle")

        # Now on a logged-in page — extract it
        html = await page.content()
        result = await agent.extract(html, page_type="listing")
        print(result["listing"]["items"])


asyncio.run(main())
```
Get an API key at getwebspeed.io.
Requirements: Python 3.10+, a Web Speed API key
```bash
pip install web-speed-agent
playwright install chromium
export WEBSPEED_API_KEY="wsp_..."
```
The main class. Manages credentials, browser sessions, and API calls.
```python
from web_speed_agent import Agent

# API key from argument
agent = Agent(api_key="wsp_...")

# API key from environment variable (recommended)
# export WEBSPEED_API_KEY="wsp_..."
agent = Agent()

# Use as async context manager (auto-closes HTTP client)
async with Agent() as agent:
    ...
```
No browser needed for pages that don't require login:
```python
# Fetch + extract in one call
result = await agent.map("https://example.com/article")

# With JavaScript rendering (for heavy SPAs)
result = await agent.map("https://example.com/spa", js=True)
```
Use a local browser session. The browser runs on your machine:
```python
async with agent.browser(session_name="mysite") as browser:
    page = await browser.new_page()
    await page.goto("https://mysite.com/dashboard")
    html = await page.content()
    result = await agent.extract(html)
```
The session_name persists cookies to ~/.webspeed/sessions/<name>/ so subsequent runs skip the login step.
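Because sessions persist to a known location, a script can check locally whether a saved session already exists before deciding to run the login flow. A minimal sketch, assuming the directory layout described above; `session_exists` is a hypothetical helper, not part of the SDK:

```python
from pathlib import Path


def session_exists(session_name: str, config_dir: str = "~/.webspeed") -> bool:
    """Return True if a persisted browser session exists for this name."""
    # Follows the documented layout: ~/.webspeed/sessions/<name>/storage.json
    storage = Path(config_dir).expanduser() / "sessions" / session_name / "storage.json"
    return storage.is_file()


# Usage: only run the login steps when no saved session is found
if not session_exists("mysite"):
    print("No saved session for 'mysite', run the login flow first")
```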
Credentials are stored in your system keychain (macOS Keychain, Windows Credential Manager, Linux secret-tool). They are never sent to Web Speed servers.
```python
# Store once
agent.store_credential("mysite", "user@example.com", "mypassword")

# Retrieve anywhere
username, password = agent.get_credential("mysite")

# Remove
agent.delete_credential("mysite")
```
The server returns page-type-aware structured data:
```python
# Article
result = await agent.extract(html, page_type="article")
# result["page_type"]             → "article"
# result["title"]                 → "Article Title"
# result["author"]                → "Jane Smith"
# result["published_date"]        → "2026-05-06"
# result["article"]["sections"]   → [{"heading": "...", "paragraphs": [...]}]
# result["article"]["links"]      → [{"text": "...", "url": "..."}]

# Product
result = await agent.extract(html, page_type="product")
# result["product"]["name"]         → "Wireless Headphones"
# result["product"]["price"]        → "$99.99"
# result["product"]["availability"] → "In Stock"
# result["product"]["rating"]       → "4.5"
# result["product"]["specs"]        → {"Battery": "30h", ...}

# Listing (search results, category pages)
result = await agent.extract(html, page_type="listing")
# result["listing"]["items"] → [{"title": "...", "url": "...", "price": "..."}]

# Auto-detect (default)
result = await agent.extract(html)
# result["page_type"] → "article" | "product" | "listing" | "other"
```
All results include engine: "advanced" — 60–85% more token-efficient than raw HTML.
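Since every result carries a `page_type` field, callers can branch on it after auto-detection. A hedged sketch of such a dispatcher; the `summarize` helper is ours, and the field names follow the result shapes shown above:

```python
def summarize(result: dict) -> str:
    """Produce a one-line summary based on the detected page type."""
    page_type = result.get("page_type", "other")
    if page_type == "article":
        return f"Article: {result.get('title', '?')}"
    if page_type == "product":
        product = result.get("product", {})
        return f"Product: {product.get('name', '?')} at {product.get('price', '?')}"
    if page_type == "listing":
        items = result.get("listing", {}).get("items", [])
        return f"Listing with {len(items)} items"
    return "Unrecognized page"


# Stand-in for: result = await agent.extract(html)
demo = {"page_type": "product", "product": {"name": "Headphones", "price": "$99.99"}}
print(summarize(demo))  # Product: Headphones at $99.99
```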
```python
import asyncio

from web_speed_agent import Agent


async def check_price(url: str, site_name: str) -> str:
    async with Agent() as agent:
        agent.store_credential(site_name, "user@example.com", "password", overwrite=True)

        async with agent.browser(session_name=site_name) as browser:
            page = await browser.new_page()

            # Login
            await page.goto(f"https://{site_name}.com/login")
            user, pwd = agent.get_credential(site_name)
            await page.fill('[name="email"]', user)
            await page.fill('[name="password"]', pwd)
            await page.click('button[type="submit"]')
            await page.wait_for_load_state("networkidle")

            # Check product
            await page.goto(url)
            await page.wait_for_load_state("networkidle")
            html = await page.content()
            result = await agent.extract(html, page_type="product")
            return result.get("product", {}).get("price", "unknown")


price = asyncio.run(check_price("https://example.com/product/123", "example"))
print(f"Current price: {price}")
```
```python
import asyncio

from web_speed_agent import Agent


async def get_dashboard_data():
    async with Agent() as agent:
        async with agent.browser(session_name="analytics") as browser:
            page = await browser.new_page()

            # Login (first run only — session persists after)
            creds = agent.get_credential("analytics")
            if not creds:
                agent.store_credential("analytics", "user@company.com", "password")
                creds = agent.get_credential("analytics")
            await page.goto("https://analytics.company.com/login")
            await page.fill('[name="email"]', creds[0])
            await page.fill('[name="password"]', creds[1])
            await page.click('button[type="submit"]')
            await page.wait_for_load_state("networkidle")

            # Navigate to dashboard
            await page.goto("https://analytics.company.com/dashboard")
            await page.wait_for_selector(".metrics-table", timeout=10000)
            html = await page.content()
            result = await agent.extract(html)
            return result


asyncio.run(get_dashboard_data())
```
```python
import asyncio

from web_speed_agent import Agent


async def scrape_inbox():
    async with Agent() as agent:
        async with agent.browser(session_name="webmail") as browser:
            page = await browser.new_page()

            # Login
            await page.goto("https://mail.example.com/login")
            user, pwd = agent.get_credential("webmail")
            await page.fill('[name="username"]', user)
            await page.fill('[name="password"]', pwd)
            await page.click('[type="submit"]')
            await page.wait_for_load_state("networkidle")

            # Scrape multiple pages
            emails = []
            for page_num in range(1, 4):
                await page.goto(f"https://mail.example.com/inbox?page={page_num}")
                await page.wait_for_load_state("networkidle")
                html = await page.content()
                result = await agent.extract(html, page_type="listing")
                emails.extend(result.get("listing", {}).get("items", []))
            return emails


asyncio.run(scrape_inbox())
```
The included MCP server lets Claude Desktop, Gemini CLI, and any MCP-compatible agent use the SDK directly. The agent can log in, navigate, click, and extract — all through natural language.
Start the MCP server:
```bash
WEBSPEED_API_KEY="wsp_..." python3 agent_mcp_server.py
```
Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
```json
{
  "mcpServers": {
    "web-speed-agent": {
      "command": "python3",
      "args": ["/path/to/agent_mcp_server.py"],
      "env": {
        "WEBSPEED_API_KEY": "wsp_..."
      }
    }
  }
}
```
Add to Gemini CLI (~/.gemini/settings.json):
```json
{
  "mcpServers": {
    "web-speed-agent": {
      "command": "python3.11",
      "args": ["/path/to/agent_mcp_server.py"],
      "env": {
        "WEBSPEED_API_KEY": "wsp_...",
        "PYTHONPATH": "/path/to/web-speed-agent"
      }
    }
  }
}
```
Then tell the agent:
"Store my credentials for united — username [email protected], password mypassword"
"Log into united.com and find me the cheapest flight from SFO to JFK next Friday"
Available MCP tools:
| Tool | Description |
|---|---|
| `store_credential` | Save login to system keychain |
| `login` | Open browser + sign in |
| `navigate` | Go to a URL in the active session |
| `extract_page` | Get structured data from current page |
| `click` | Click a button or link |
| `fill_field` | Type into a form field |
| `submit_form` | Submit a form |
| `close_browser` | End the browser session |
| `account_info` | Check API credit balance |
Agent

```python
Agent(
    api_key: str | None = None,
    server_url: str | None = None,
    config_dir: str = "~/.webspeed",
    headless: bool = True,
)
```
| Parameter | Description |
|---|---|
| `api_key` | Web Speed API key. Falls back to `WEBSPEED_API_KEY` env var. |
| `server_url` | Override API server URL. Default: `https://api.getwebspeed.io`. |
| `config_dir` | Directory for config, sessions, and logs. Default: `~/.webspeed`. |
| `headless` | Run browser headlessly. Default: `True`. |
agent.browser()

```python
agent.browser(
    session_name: str | None = None,
    headless: bool | None = None,
    proxy: str | None = None,
) -> ManagedBrowser
```
Returns an async context manager. Inside the block, call .new_page() to get a Playwright Page.
| Parameter | Description |
|---|---|
| `session_name` | Persist cookies to `~/.webspeed/sessions/<name>/`. `None` = no persistence. |
| `headless` | Override the instance `headless` setting for this session. |
| `proxy` | Proxy URL, e.g. `"socks5://localhost:1080"`. |
Session names must be alphanumeric + hyphens/underscores, max 64 chars.
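That rule can be expressed as a regular expression. A sketch of an equivalent local pre-check, assuming only the allowlist and length cap stated above; this is illustrative, not the SDK's own validator:

```python
import re

# Documented rule: alphanumeric plus hyphens/underscores, max 64 chars
_SESSION_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def is_valid_session_name(name: str) -> bool:
    """Check a session name against the documented allowlist and length cap."""
    return bool(_SESSION_NAME_RE.match(name))


print(is_valid_session_name("mysite-prod"))  # True
print(is_valid_session_name("bad/name"))     # False: path separators rejected
```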
agent.extract()

```python
await agent.extract(
    html: str,
    page_type: str = "auto",
) -> dict
```
Sends HTML to the Web Speed API. Costs 1 credit.
| Parameter | Description |
|---|---|
| `html` | Raw HTML string (e.g. from `page.content()`). |
| `page_type` | `"article"`, `"product"`, `"listing"`, or `"auto"`. |
agent.map()

```python
await agent.map(
    url: str,
    js: bool = False,
) -> dict
```
Fetches and extracts a public URL via the server. No local browser needed. Costs 1 credit.
| Parameter | Description |
|---|---|
| `url` | Page URL. Must be `http://` or `https://`. |
| `js` | Render JavaScript before extracting. |
agent.account()

```python
await agent.account() -> dict
```
Returns: credits, tier, status, lifetime (total/hits/misses).
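Since each `extract()`/`map()` call costs 1 credit, a batch job might check the balance first. A hedged sketch based only on the documented return fields; the dict below is a stand-in for a real `await agent.account()` response, and `enough_credits` is our helper, not SDK API:

```python
def enough_credits(account: dict, needed: int) -> bool:
    """Compare remaining credits against the number of planned calls."""
    return account.get("credits", 0) >= needed


# Stand-in for: account = await agent.account()
account = {"credits": 42, "tier": "pro", "status": "active"}
if not enough_credits(account, needed=100):
    print("Not enough credits for this batch, top up first")
```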
agent.store_credential()

```python
agent.store_credential(
    site: str,
    username: str,
    password: str,
    overwrite: bool = False,
) -> None
```
Saves to system keychain. Raises CredentialError if credential exists and overwrite=False.
agent.get_credential()

```python
agent.get_credential(site: str) -> tuple[str, str] | None
```
Returns (username, password) or None if not found.
agent.delete_credential()

```python
agent.delete_credential(site: str) -> None
```
Removes credential from keychain.
```python
from web_speed_agent import (
    WebSpeedError,                # Base exception
    AuthenticationError,          # Invalid/missing API key
    InsufficientCreditsError,     # No credits remaining
    APIError,                     # API returned 4xx/5xx
    RateLimitError,               # 429 Too Many Requests
    CredentialError,              # Keychain error
    BrowserError,                 # Playwright error
    NetworkError,                 # Timeout or DNS failure
    PlaywrightNotInstalledError,  # Run: playwright install chromium
)
```
```python
from web_speed_agent import Agent, InsufficientCreditsError, NetworkError

try:
    result = await agent.extract(html)
except InsufficientCreditsError:
    print("Out of credits — top up at getwebspeed.io")
except NetworkError as e:
    print(f"Connection failed: {e}")
```
| Variable | Description |
|---|---|
| `WEBSPEED_API_KEY` | API key (recommended over config file) |
| `WEBSPEED_SERVER_URL` | Override server URL (must be `https://`) |
~/.webspeed/config.yaml — created automatically on first run. Permissions set to 0o600 (owner-only).
```yaml
api:
  server_url: https://api.getwebspeed.io
  timeout: 30
browser:
  headless: true
```
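The owner-only `0o600` permission the SDK applies can be reproduced with `os.open`. A sketch of writing a file that way, purely illustrative since the SDK creates its config for you on first run:

```python
import os
import stat
import tempfile


def write_private(path: str, text: str) -> None:
    """Create (or truncate) a file readable and writable only by its owner."""
    # Pass the mode at creation time so the file never exists world-readable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(text)


path = os.path.join(tempfile.mkdtemp(), "config.yaml")
write_private(path, "api:\n  timeout: 30\n")
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # 0o600 on POSIX systems
```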
Persisted browser sessions are stored in ~/.webspeed/sessions/<name>/storage.json, with permissions set to 0o600 (owner-only).

When you call agent.extract(html), the page HTML is sent to the Web Speed API for processing. Everything else stays local.
| Data | Where it goes |
|---|---|
| Login credentials | Never leave your machine (system keychain only) |
| Browser cookies / session | Never leave your machine (local Playwright) |
| Page HTML | Sent over HTTPS to Web Speed API for extraction |
| Extracted JSON | Returned to you |
Before any HTML is transmitted, the SDK automatically scrubs it locally:
- `<script>` and `<style>` blocks removed
- Form fields with sensitive names (`csrf`, `token`, `nonce`, `session`, etc.) have their values blanked
- `<meta>` content attributes cleared

Visible content — text, links, tables, headings, product data — is untouched.
```python
# Default: scrubbing is on
result = await agent.extract(html)

# Turn off only if the page has no sensitive data
result = await agent.extract(html, scrub=False)

# Or scrub manually and inspect before sending
from web_speed_agent import scrub

clean_html = scrub(raw_html)
print(clean_html)  # inspect what will be sent
result = await agent.extract(clean_html, scrub=False)
```
"sdk-extract"), timestamp, and detected page type — no content0o600 permissions (owner-only read/write)0o700 permissionsverify=True on all HTTP calls, cannot be disabledserver_url must start with https://, plain HTTP rejected[a-zA-Z0-9_-] allowlistGNU General Public License v3.0 — see LICENSE.
Web Speed API usage is subject to the Web Speed Terms of Service.
Add this to claude_desktop_config.json and restart Claude Desktop.
```json
{
  "mcpServers": {
    "web-speed-agent": {
      "command": "npx",
      "args": []
    }
  }
}
```