A semantic browser runtime for AI agents that replaces raw HTML with structured data and dynamic, page-specific tools. It features built-in site memory and automated bot detection bypass to enable efficient, self-healing web automation.
A real, visible-cursor browser for AI agents. Real Chromium. Real humanlike physics. Real audit trail. Agents drive it the way a human would: drag, click, type, scroll - while every action is recorded, replayable, and verified.
Works with every major LLM: Anthropic Claude, OpenAI GPT, Google Gemini, Groq, Together, Fireworks, DeepInfra, Mistral, Cohere, xAI Grok, OpenRouter, Perplexity, Ollama (local), Ollama Cloud (hosted), vLLM, LM Studio, llama.cpp - or anything OpenAI-compatible. Zero lock-in.
┌──────────────────────────────────────────────────────────────────┐
│                        Agent (your code)                         │
│    "submit the payment form" ──┐                                 │
└────────────────────────────────┼─────────────────────────────────┘
                                 │ HTTP POST /sessions/:id/plan
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                       AgentBrowser runtime                       │
│  Planner ──▶ findAndClick (DOM ▶ vision-LLM) ──▶ cursor          │
│     │              │                               │             │
│     │              ▼                               ▼             │
│     │     verifier (DOM diff)              Bezier trajectory     │
│     │              │                       + CDP raw events      │
│     ▼              ▼                               │             │
│  action   action.completed event ──▶ recorder (JSONL)            │
│  memory            │                                             │
│  (skip LLM         ▼                                             │
│   on visit  WebSocket / SSE                                      │
│   #2+)      to operator UI                                       │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
             Real visible cursor moves on real Chromium
Every existing browser-automation tool was built for humans first and retrofitted for agents. They speak DOM operations. They produce 8000-token HTML dumps. They re-learn each site every run. They have no audit trail. They get blocked by every cookie banner.
AgentBrowser inverts this. The cursor is real and visible. All input goes through CDP raw mouse events. The API speaks in goals, not selectors. Failed actions auto-recover. Every action gets verified. The system learns each site permanently and shares knowledge across domains.
| Other tools | AgentBrowser |
|---|---|
| 8000 tokens of HTML | 50 tokens of structured meaning |
| agent guesses #submit-btn-v2 | { goal: "submit the form" } |
| no replay, no audit | JSONL trace, deterministic replay |
| re-learns every visit | action memory, 7x faster on visit 3 |
| blocked by every cookie wall | auto-dismiss + force-removal |
| no captcha story | 2Captcha / hCaptcha / Turnstile |
| no fingerprint defenses | per-context WebGL/canvas/audio noise |
| single integration: library | library + MCP + HTTP + WS + SSE + replay |
| locked to one LLM vendor | 17 providers, one env var swap (Claude/GPT/Gemini/Ollama/vLLM/Groq/Together/Fireworks/...) |
git clone https://github.com/AshtonVaughan/agentbrowser
cd agentbrowser
npm install && npx playwright install chromium && npm run build
# Pick ANY LLM provider:
ANTHROPIC_API_KEY=sk-ant-... npm run http # Claude (default)
# or
OPENAI_API_KEY=sk-... npm run http # GPT
# or
GOOGLE_API_KEY=... npm run http # Gemini
# or
GROQ_API_KEY=gsk_... npm run http # Groq (Llama on LPUs)
# or run fully local with Ollama:
ollama serve &
AGENTBROWSER_LLM_PROVIDER=ollama OLLAMA_MODEL=llama3.2-vision npm run http
# or use Ollama Cloud:
OLLAMA_CLOUD_API_KEY=... npm run http
In another terminal:
# Create a session
curl -X POST localhost:3100/api/v1/sessions
# → { "session_id": "abc123..." }
# Plan + execute a goal end-to-end
curl -X POST localhost:3100/api/v1/sessions/abc123/plan \
-H 'content-type: application/json' \
-d '{"goal":"go to news.ycombinator.com and click the top story"}'
# → { success: true, steps: [...], duration_ms: 4200 }
# Watch it live
open ui/operator/index.html
The agent's cursor moves across the screen with humanlike motion. Every cursor.move, click, and page change streams to the operator UI in real time, and every action is recorded to ~/.agentbrowser/traces/ for replay.
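To make the "humanlike motion" idea concrete, here is a minimal sketch of a seeded cubic Bezier path generator. It is not the project's trajectory code: the names `generateTrajectory` and `mulberry32`, the 25% control-point spread, and the default of 60 steps are all assumptions chosen for illustration, and jitter and overshoot are left out.

```typescript
// Hypothetical sketch, not the project's trajectory.ts.
type Point = { x: number; y: number };

// Small deterministic PRNG (mulberry32): the same seed yields the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Evaluate a cubic Bezier curve at parameter t in [0, 1].
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Build a curved path from `from` to `to` using randomly offset control points.
function generateTrajectory(from: Point, to: Point, steps = 60, seed = 1): Point[] {
  const rand = mulberry32(seed);
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const spread = Math.hypot(dx, dy) * 0.25; // assumed amount of curvature
  const offset = () => (rand() - 0.5) * 2 * spread;
  const c1 = { x: from.x + dx / 3 + offset(), y: from.y + dy / 3 + offset() };
  const c2 = { x: from.x + (2 * dx) / 3 + offset(), y: from.y + (2 * dy) / 3 + offset() };
  const path: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    path.push(cubicBezier(from, c1, c2, to, i / steps));
  }
  return path;
}
```

Each point of such a path would then be dispatched as a separate mouse-move event. Seeding the generator is what would allow a recorded session to reproduce the same curve on replay.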
Input.dispatchMouseEvent (not Playwright locators)

| Layer | What it does |
|---|---|
| cursor.click(x, y) | Direct viewport click via CDP |
| cursor.clickBySelector(sel) | Bbox-resolve, scroll-into-view, humanlike click. Stale-element auto-recovery via accessible-name lookup. |
| cursor.clickByText(text) | Text disambiguation across visually similar elements |
| cursor.clickByRole(role, {name}) | ARIA-driven targeting |
| findAndClick({goal, ...}) | DOM selector → text → role → vision-LLM, every step verified |
| executor.executeAction(name) | Action-memory fast path → fallback to find-and-click |
| planner.planAndExecute(goal) | LLM goal decomposition → multi-step run with retry budget |
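The findAndClick row describes a fallback chain (DOM selector → text → role → vision-LLM) in which every step is verified. A rough sketch of that control flow follows. The `Strategy` type, the `verify` callback, and the function name `findAndClickSketch` are hypothetical; the real signature is not shown in this README.

```typescript
// Hypothetical types: the real findAndClick signature is not shown in this README.
type Strategy = {
  name: string;
  // Attempts the click; resolves to true if something was clicked.
  attempt: (goal: string) => Promise<boolean>;
};

type FindResult = { success: boolean; strategy?: string; tried: string[] };

// Try each strategy in order and accept the first one whose click is confirmed by `verify`.
async function findAndClickSketch(
  goal: string,
  strategies: Strategy[],
  verify: () => Promise<boolean>,
): Promise<FindResult> {
  const tried: string[] = [];
  for (const s of strategies) {
    tried.push(s.name);
    try {
      if ((await s.attempt(goal)) && (await verify())) {
        return { success: true, strategy: s.name, tried };
      }
    } catch {
      // A thrown error (for example a stale selector) just means "try the next strategy".
    }
  }
  return { success: false, tried };
}
```

Ordering the strategies from cheapest to most expensive means the vision-LLM call is only paid for when the DOM-based lookups have all failed or failed verification.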
- extractElementBoxes(page) returns rich element catalog: id / role / tag / accessible name / value / bbox / selector / disabled
- bboxScreenshot(sessionId) returns a viewport PNG with numbered cyan boxes drawn on every interactive target + the element list
- VisionLLM.decide(goal, screenshot, elements) sends to Claude Sonnet, parses {element_id, action, rationale}
- cursor.clickByBox(bbox) clicks vision-derived coordinates with the visible cursor
- cursor.clickBySelector falls back to getByText(originalText) on selector failure

- ActionMemory - SHA-1 page signature × goal hash → selector × success/fail counters. Visit #2 to a known page costs zero LLM calls.
- recallByGoal(goal, excludeDomain) returns winning selectors from OTHER domains for the same logical goal. The system learned "submit payment → button#pay-btn" on stripe.com; it tries the same selector on paddle.com as a hypothesis.
- decay(unusedSinceMs) halves stale entry counts so the library stays healthy as sites change.

- ~/.agentbrowser/traces/<session-id>.jsonl
- ReplayEngine reads a trace, dispatches events to a fresh session at configurable speed
- compactTrace() collapses 60-event cursor.move trajectories into 1, merges consecutive cursor.type events, drops micro-waits

- $email, $password)
- SkillLibrary.parameterize(events, slots) replaces literal values with tokens (longest-first to avoid partial-match bugs)
- ~/.agentbrowser/skills/

- TwoCaptchaSolver for hCaptcha + reCAPTCHA v2 + Cloudflare Turnstile via 2Captcha API
- DETECT_CAPTCHA_SCRIPT finds sitekeys for all three types
- solveCaptchaIfPresent(page, solver) chain: detect → solve → inject token → fire change/input events → invoke data-callback
- CaptchaSolver interface (drop in AntiCaptcha, CapMonster, etc.)

- applyFingerprintShield(context) per-context init script

- engine.newTab(sessionId, url?) opens a tab in the same context (shares cookies/auth)
- HumanCursor.
- switchTab / closeTab / listTabs.
- /sessions/:id/tabs, POST /tabs/:tab/switch

- LLMProvider interface (complete() + completeWithImage()); analyzer + vision-LLM + planner all use this abstraction, never SDKs

| Provider | Set | Notes |
|---|---|---|
| Anthropic Claude | ANTHROPIC_API_KEY | default if nothing else set |
| OpenAI | OPENAI_API_KEY | gpt-4o-mini default |
| Google Gemini | GOOGLE_API_KEY or GEMINI_API_KEY | gemini-2.5-flash, vision native |
| Groq | GROQ_API_KEY | super-fast Llama/Mixtral on LPUs |
| Together AI | TOGETHER_API_KEY | open-source models |
| Fireworks | FIREWORKS_API_KEY | open-source models |
| DeepInfra | DEEPINFRA_API_KEY | open-source models |
| Mistral | MISTRAL_API_KEY | mistral-large-latest |
| Cohere | COHERE_API_KEY | command-r-plus via /compatibility |
| xAI Grok | XAI_API_KEY | grok-2 with vision |
| OpenRouter | OPENROUTER_API_KEY | 300+ models behind one API |
| Perplexity | PERPLEXITY_API_KEY | online-search models |
| Azure OpenAI | AZURE_OPENAI_API_KEY + AZURE_OPENAI_BASE_URL | enterprise tenant |
| Ollama (local) | OLLAMA_BASE_URL (default localhost:11434) | llama3.2, llama3.2-vision, qwen2.5vl, etc |
| Ollama Cloud | OLLAMA_CLOUD_API_KEY | hosted Ollama with turbo models |
| vLLM | VLLM_BASE_URL (default localhost:8000) | self-hosted production inference |
| LM Studio | LMSTUDIO_BASE_URL (default localhost:1234) | desktop GUI |
| llama.cpp server | LLAMACPP_BASE_URL (default localhost:8080) | tiny self-hosted |
| Anything OpenAI-compatible | presets.openaiCompatible(url) | drop in your URL |
// Pick a provider explicitly (any of the 17):
import { AgentBrowserHttpServer, presets } from 'agentbrowser';
const server = new AgentBrowserHttpServer({
llm_provider: presets.ollamaCloud(), // or .openai() or .groq() etc
// ...
});
Or just set AGENTBROWSER_LLM_PROVIDER=ollama and the server auto-wires. Override the model with <PROVIDER>_MODEL=<model-id>.
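As an illustration of how environment-based provider selection could work, here is a small sketch. It is not the project's autoDetectProvider: the function name `detectProvider`, the provider ids, the subset of keys, and the precedence order between keys are all assumptions. Only the env var names, the Anthropic default, and the `<PROVIDER>_MODEL` override come from the text above.

```typescript
// Hypothetical sketch; the project's autoDetectProvider may order or name things differently.
type Env = Record<string, string | undefined>;
type Choice = { provider: string; model?: string };

// Env var -> provider id for a few rows of the provider table. The order is an assumption.
const KEY_TO_PROVIDER: Array<[string, string]> = [
  ["OPENAI_API_KEY", "openai"],
  ["GOOGLE_API_KEY", "google"],
  ["GEMINI_API_KEY", "google"],
  ["GROQ_API_KEY", "groq"],
  ["OLLAMA_CLOUD_API_KEY", "ollama_cloud"],
];

function detectProvider(env: Env): Choice {
  // 1. An explicit AGENTBROWSER_LLM_PROVIDER wins.
  let provider = env["AGENTBROWSER_LLM_PROVIDER"];
  // 2. Otherwise pick the first provider whose key is present.
  if (!provider) {
    provider = KEY_TO_PROVIDER.find(([key]) => env[key])?.[1];
  }
  // 3. Fall back to Anthropic, which the table lists as the default.
  provider = provider ?? "anthropic";
  // <PROVIDER>_MODEL overrides the model, e.g. OLLAMA_MODEL=llama3.2-vision.
  const model = env[`${provider.toUpperCase()}_MODEL`];
  return model ? { provider, model } : { provider };
}
```

In a real server the env argument would typically be `process.env`. Passing it in explicitly keeps the selection logic easy to test.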
- extensions/chrome/ in dev mode (chrome://extensions → Load unpacked)
- POST /api/v1/agents/<your-id>/cmd drives YOUR Chrome with YOUR cookies and login state
- chrome.debugger for real CDP mouse events + chrome.scripting for vision/extract
- extensions/chrome/README.md for the full security model

- ui/operator/index.html - live screenshot + cursor trail overlay + event timeline + quick actions panel
- ui/recorder/index.html - real-time WebSocket event stream + multi-lane timeline canvas + replay scrubber + JSONL export
- ui/memory/index.html - paginated action memory browser per domain + decay control + JSON export
- docs/pricing.html - 4-tier pricing page wired to /api/v1/billing/checkout for self-serve Stripe payments
- docs/skills.html - skills marketplace landing with 8 curated skills (login-stripe, login-google, amazon-add-to-cart, github-create-issue, etc.)

- findCaptionTracks(page) detects HTML5 <track> + YouTube playerCaptionsTracklistRenderer + custom player markup
- parseVTT(text) / parseJSON3(json) convert standard caption formats to typed TranscriptSegment[]
- transcribeFromCaptions(page) one-shot: detect → fetch → parse
- transcriptToText(segments) concatenate for LLM consumption

- /api/v1/openapi.json
- /metrics + readiness probe + dashboard summary endpoint

┌─────────────────────────────────────────────────────────────────┐
│ Operator UI (browser-based dashboard) │
│ - live screenshot stream - cursor trail viz │
│ - action log + reasoning - manual takeover │
│ Recorder UI │
│ - timeline scrubber - replay export │
└─────────────────────────────────────────────────────────────────┘
▲
│ WebSocket events + SSE frames + REST
┌─────────────────────────┴───────────────────────────────────────┐
│ AgentBrowser HTTP Control Plane │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ REST + WS + SSE │ │ Bearer auth + │ │ License + │ │
│ │ Fastify │ │ per-key rate │ │ quota system │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
▲
│
┌─────────────────────────┴───────────────────────────────────────┐
│ AgentBrowser Core Runtime │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Planner │ │ findAndClick │ │ Recorder + │ │
│ │ (goal → steps) │ │ hybrid action │ │ Replay engine │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Verifier │ │ Modal │ │ Action memory │ │
│ │ (diff snapshots)│ │ interrupter │ │ (skip LLM) │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Vision pipeline │ │ HumanCursor │ │ Site memory │ │
│ │ bbox + annotate │ │ Bezier + CDP │ │ WAL SQLite │ │
│ │ + VisionLLM │ │ trail + ripple │ │ + selectors │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Anti-fingerprint│ │ Captcha solver │ │ LLM provider │ │
│ │ (canvas/WebGL) │ │ + auto-inject │ │ (Anthropic / │ │
│ │ │ │ │ │ OpenAI / etc)│ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Browser Engine - Playwright + stealth + multi-tab │ │
│ │ ↓ │ │
│ │ Chromium (the real browser, with the visible cursor) │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
import { AgentBrowser } from 'agentbrowser';
const browser = new AgentBrowser({
anthropic_api_key: process.env.ANTHROPIC_API_KEY,
headless: false, // watch the cursor move
stealth: true,
});
await browser.launch();
const state = await browser.navigate('https://news.ycombinator.com');
console.log(state.page_type); // 'listing'
console.log(state.available_actions); // [{ name: 'navigate_to_new', ... }, ...]
await browser.action('navigate_to_new');
const data = await browser.extract({
top_story: 'title of the top story',
points: 'upvote count',
author: 'submitter username',
});
await browser.close();
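curl -X POST http://localhost:3100/api/v1/sessions
# set ID to the session_id returned above; JSON bodies need an explicit content-type
curl -X POST http://localhost:3100/api/v1/sessions/$ID/navigate -H 'content-type: application/json' -d '{"url":"https://example.com"}'
curl -X POST http://localhost:3100/api/v1/sessions/$ID/find_and_click -H 'content-type: application/json' -d '{"goal":"submit the form"}'
curl http://localhost:3100/api/v1/sessions/$ID/screenshot/bbox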
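The same HTTP flow can be driven from TypeScript. The sketch below assumes the `/api/v1/sessions` and `/plan` endpoints and the `session_id` field shown in the quick start, and assumes API keys are sent as a standard `Authorization: Bearer` header. The helper names `postJson` and `runGoal` and the `FetchLike` type are hypothetical. The fetch function is passed in so the code does not depend on a particular runtime.

```typescript
// Minimal shape of the fetch function this sketch needs (global fetch in Node 18+ fits).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// POST a JSON body and return the parsed JSON response.
async function postJson(fetchImpl: FetchLike, url: string, body: unknown, apiKey?: string): Promise<any> {
  const headers: Record<string, string> = { "content-type": "application/json" };
  // Bearer header format is an assumption based on the "bearer-authenticated" note.
  if (apiKey) headers["authorization"] = `Bearer ${apiKey}`;
  const res = await fetchImpl(url, { method: "POST", headers, body: JSON.stringify(body) });
  if (!res.ok) throw new Error(`POST ${url} failed with HTTP ${res.status}`);
  return res.json();
}

// Create a session, then hand the runtime a goal to plan and execute.
async function runGoal(fetchImpl: FetchLike, baseUrl: string, goal: string, apiKey?: string): Promise<any> {
  const session = await postJson(fetchImpl, `${baseUrl}/api/v1/sessions`, {}, apiKey);
  return postJson(fetchImpl, `${baseUrl}/api/v1/sessions/${session.session_id}/plan`, { goal }, apiKey);
}
```

With a server running locally, a call such as `runGoal(fetch, "http://localhost:3100", "click the top story")` would exercise the same two requests as the curl commands in the quick start.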
from agentbrowser import AgentBrowserClient

client = AgentBrowserClient("http://localhost:3100", api_key="...")
with client.create_session() as s:
    s.navigate("https://example.com")
    s.click(selector="a")
    png, elements = s.annotated_screenshot()
Add to ~/.claude.json:
{
"mcpServers": {
"agentbrowser": {
"command": "node",
"args": ["/path/to/agentbrowser/dist/server/mcp.js"],
"env": { "ANTHROPIC_API_KEY": "sk-..." }
}
}
}
src/
├── engine/ Browser lifecycle, sessions, tabs
│ ├── browser.ts Playwright wrapper, cursor wiring, popup handling
│ └── tabs.ts Multi-tab manager
├── input/ Cursor + physics + fingerprint shield
│ ├── cursor.ts HumanCursor: SVG overlay, CDP input, click/drag/type
│ ├── trajectory.ts Bezier path generator with jitter + overshoot
│ └── fingerprint.ts WebGL/canvas/audio anti-fingerprint patches
├── vision/ Vision pipeline
│ ├── bbox.ts Element catalog extraction
│ ├── annotate.ts Numbered-box screenshot annotator
│ ├── diff.ts Snapshot diff (added/removed/moved/textChanged)
│ └── llm.ts Claude vision integration
├── runtime/ Action execution + autonomy
│ ├── executor.ts Action runner with action-memory hot path
│ ├── find.ts Hybrid DOM-then-vision findAndClick
│ ├── verifier.ts Action verification via snapshot diff
│ ├── modal-interrupter.ts Cookie/consent/popup detection
│ ├── planner.ts LLM goal decomposition + multi-step execution
│ ├── plan-audit.ts Per-step screenshot + element capture
│ ├── recorder.ts JSONL action stream
│ ├── replay.ts Deterministic trace replay
│ ├── compactor.ts Trace compaction (collapse cursor.move runs)
│ ├── events.ts In-process event broker (pub/sub)
│ ├── captcha.ts 2Captcha API integration
│ └── captcha-solver.ts Detect → solve → inject pipeline
├── memory/ Persistent storage
│ ├── store.ts Site memory (page model cache + selector library)
│ └── action-memory.ts Per-(page,goal) selector cache + cross-domain transfer
├── llm/ Provider abstraction
│ ├── provider.ts LLMProvider interface
│ ├── anthropic.ts Anthropic SDK wrapper
│ ├── openai.ts OpenAI-compatible (works with vLLM/Ollama too)
│ └── index.ts autoDetectProvider
├── semantic/ Page analysis
│ └── analyzer.ts Page-to-SemanticPageModel via LLM
├── skills/ Skill library
│ ├── skills.ts Parameterize/save/load/run
│ └── package.ts .skill.json export/import format
├── server/ Production server surfaces
│ ├── http.ts Fastify + REST + WS + SSE + auth + rate limit
│ ├── openapi.ts OpenAPI 3.1 spec
│ ├── license.ts Tier system + quota tracking
│ └── mcp.ts MCP server for Claude Code et al.
├── bin/
│ └── http.ts HTTP server entry point
├── util/
│ └── coords.ts Viewport ↔ document ↔ screenshot coord conversions
├── types.ts Shared types (SemanticPageModel, ActionDefinition, ...)
└── index.ts Public API barrel
tests/ Vitest suite (119 tests, 21 files)
clients/
├── python/ Python SDK (zero deps, stdlib urllib)
└── typescript/ TypeScript SDK (browser + Node compatible)
ui/
├── operator/ Live dashboard (single HTML file)
└── recorder/ Real-time trace timeline (single HTML file)
docs/ Public docs site (single HTML file)
examples/ Working agent demos
.agentbrowser-meta/ Internal build planning + iteration log
Dockerfile Multi-stage slim image (~700MB with Chromium)
docker-compose.yml Local stack with persistent volume
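The memory/action-memory.ts entry above is described elsewhere in this README as "SHA-1 page signature × goal hash → selector × success/fail counters", with a decay step that halves stale counts. A minimal in-memory sketch of that idea follows. The class name `ActionMemorySketch`, the goal normalization, the "more successes than failures" recall rule, and the explicit `nowMs` parameter are assumptions. The real module persists its data and may differ in every detail.

```typescript
import { createHash } from "node:crypto";

type Entry = { selector: string; success: number; fail: number; lastUsedMs: number };

const sha1 = (s: string): string => createHash("sha1").update(s).digest("hex");

// Hypothetical in-memory stand-in for a per-(page, goal) selector cache.
class ActionMemorySketch {
  private entries = new Map<string, Entry>();

  // Key = SHA-1(page signature) + ":" + SHA-1(normalized goal).
  // What goes into the page signature is an assumption left to the caller.
  private key(pageSignature: string, goal: string): string {
    return `${sha1(pageSignature)}:${sha1(goal.trim().toLowerCase())}`;
  }

  // Record the outcome of using `selector` for `goal` on this page.
  record(pageSignature: string, goal: string, selector: string, ok: boolean, nowMs: number): void {
    const k = this.key(pageSignature, goal);
    const prev = this.entries.get(k);
    const e = prev && prev.selector === selector ? prev : { selector, success: 0, fail: 0, lastUsedMs: nowMs };
    if (ok) e.success++;
    else e.fail++;
    e.lastUsedMs = nowMs;
    this.entries.set(k, e);
  }

  // Return a remembered selector only if it has succeeded more often than it failed.
  recall(pageSignature: string, goal: string): string | undefined {
    const e = this.entries.get(this.key(pageSignature, goal));
    return e && e.success > e.fail ? e.selector : undefined;
  }

  // Halve the counters of entries not used within `unusedSinceMs`.
  decay(unusedSinceMs: number, nowMs: number): void {
    this.entries.forEach((e) => {
      if (nowMs - e.lastUsedMs > unusedSinceMs) {
        e.success = Math.floor(e.success / 2);
        e.fail = Math.floor(e.fail / 2);
      }
    });
  }
}
```

A hit in `recall` is what would let a repeat visit skip the LLM call. Decay gradually removes confidence in selectors that have not been exercised recently, so a redesigned page eventually falls back to the slower path.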
22 HTTP endpoints, all bearer-authenticated when API keys are configured. Full OpenAPI 3.1 at /api/v1/openapi.json.
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness (no auth) |
| POST | /api/v1/sessions | Create session |
| DELETE | /api/v1/sessions/:id | Destroy session |
| POST | /api/v1/sessions/:id/navigate | Go to URL |
| POST | /api/v1/sessions/:id/cursor/{move,click,drag,scroll,type,press} | Cursor primitives |
| POST | /api/v1/sessions/:id/find_and_click | Hybrid DOM → vision-LLM action |
| POST | /api/v1/sessions/:id/plan | LLM goal decomposition + multi-step execute |
| POST | /api/v1/sessions/:id/extract | Schema-driven LLM extraction |
| POST | /api/v1/sessions/:id/fill | Fill a named form |
| GET | /api/v1/sessions/:id/screenshot | Viewport PNG |
| GET | /api/v1/sessions/:id/screenshot/bbox | Annotated PNG + element list |
| GET | /api/v1/sessions/:id/screenshot/stream | SSE PNG frames |
| WS | /api/v1/sessions/:id/screenshot/ws | Binary PNG frames |
| GET | /api/v1/sessions/:id/state | Current SemanticPageModel |
| GET | /api/v1/sessions/:id/elements | ElementBox[] |
| GET/POST/DELETE | /api/v1/sessions/:id/tabs | Multi-tab control |
| POST | /api/v1/sessions/:id/solve_captcha | Detect + solve + inject |
| WS | /api/v1/sessions/:id/events | Live event stream |
| POST | /api/v1/sessions/:id/back | History back |
| POST | /api/v1/sessions/:id/forward | History forward |
| POST | /api/v1/sessions/:id/reload | Reload current page |
| POST | /api/v1/sessions/:id/dialog | Accept/dismiss next native alert/confirm/prompt |
| GET/POST/DELETE | /api/v1/sessions/:id/cookies | Read/write/clear browser cookies |
| GET/POST/DELETE | /api/v1/sessions/:id/storage | Read/write/clear localStorage or sessionStorage (?kind=local\|session) |
| POST | /api/v1/sessions/:id/upload | Set files on a file input ({selector, paths}) |
| POST | /api/v1/sessions/:id/print | Render current page to PDF (returns application/pdf) |
| POST | /api/v1/sessions/:id/route/block | Block all requests matching a glob pattern |
| POST | /api/v1/sessions/:id/route/headers | Inject headers into requests matching a pattern |
| POST | /api/v1/sessions/:id/route/mock | Mock response body for matching requests |
| DELETE | /api/v1/sessions/:id/route | Remove all route handlers (passthrough) |
| POST | /api/v1/sessions/:id/geolocation | Override reported coords (or null to clear) |
| POST | /api/v1/sessions/:id/viewport | Resize viewport mid-session |
| POST | /api/v1/sessions/:id/headers | Set extra HTTP headers for all requests |
| POST | /api/v1/sessions/:id/har/start | Start HAR network capture |
| GET | /api/v1/sessions/:id/har/peek | Get current entries without stopping |
| POST | /api/v1/sessions/:id/har/stop | Stop and return all captured entries |
| POST | /api/v1/sessions/:id/console/start\|peek\|stop | Capture console messages + uncaught errors |
| POST | /api/v1/sessions/:id/throttle/network | Throttle network (downloadThroughput/uploadThroughput/latencyMs/offline) |
| POST | /api/v1/sessions/:id/throttle/cpu | CPU slowdown multiplier (1=native, 4=4x slower) |
| POST | /api/v1/sessions/:id/locale | Override navigator.language + Accept-Language |
| POST | /api/v1/sessions/:id/timezone | Override page timezone (e.g. "Asia/Tokyo") |
| POST/DELETE | /api/v1/sessions/:id/permissions | Grant/clear browser permissions (clipboard, notifications, etc.) |
| POST | /api/v1/sessions/:id/record/start | Begin in-memory recording for skill creation |
| GET | /api/v1/sessions/:id/record/peek | Live event count while recording |
| POST | /api/v1/sessions/:id/record/stop | Stop and (optionally) save events as a skill ({name, slots, description}) |
| POST | /api/v1/sessions/:id/har/replay | Re-execute HAR entries and compare statuses |
| GET | /api/v1/sessions/:id/service-workers | List active service workers |
| GET/POST | /api/v1/sessions/:id/snapshot | Export full session state (cookies + storage + IDB); POST /snapshot/restore to import |
| POST/GET | /api/v1/sessions/:id/downloads/start\|stop | Auto-capture all downloads to a session-tagged dir |
| GET/POST | /api/v1/sessions/:id/clipboard | Read/write the page's clipboard via navigator.clipboard |
| POST | /api/v1/sessions/:id/wait/selector\|text\|network-idle\|function | Smart waiters with timeouts |
| GET | /api/v1/sessions/:id/markdown | Extract clean RAG-friendly markdown from current page |
| GET | /api/v1/skills/:name/versions | List archived versions of a skill |
| POST | /api/v1/skills/:name/rollback | Restore a previous version ({version: N}) |
| GET | /api/v1/sessions/:id/activity | Idle time in ms for a session |
| POST | /api/v1/sessions/:id/touch | Reset idle counter |
| GET | /api/v1/sessions/expired | List sessions past idle timeout |
| POST | /api/v1/snapshot/diff | Diff two snapshots ({a, b}) returns added/removed/changed |
| POST | /api/v1/sessions/:id/click_by_description | Vision-only click ("the blue Submit button") |
| POST | /api/v1/pool/warmup | Pre-create N empty contexts for sub-100ms session creation |
| GET | /api/v1/pool/status | Warm pool size + oldest entry age |
| POST | /api/v1/pool/drain | Close all warm contexts |
| POST | /api/v1/sessions/:id/copilot/install | Inject highlight overlay for "AI is here" hints |
| POST | /api/v1/sessions/:id/copilot/highlight | Highlight a bbox with optional label |
| POST | /api/v1/pool/auto_refill/start\|stop | Background job that keeps pool topped up |
| POST | /api/v1/action_memory/predict | Predict next action from memory ({url, elements, goal?}) |
| GET | /api/v1/skills/marketplace | Static skill catalog (slug, tags, quality, runs, success_rate) for GitHub Pages hosting |
| GET | /api/v1/vision/cache/stats | Vision LLM cache hits/misses + persistent_size if SQLite-backed |
| POST | /api/v1/vision/cache/clear | Empty in-memory + on-disk vision cache |
| GET | /api/v1/sessions/:id/har/export | Captured HAR exported as standard HAR 1.2 (Chrome DevTools-importable) |
| POST | /api/v1/vision/cache/prune | Evict oldest entries beyond cache_max_disk_entries + VACUUM |
| POST | /api/v1/traces/diff | Diff two trace event arrays (regression testing) |
| POST | /api/v1/sessions/:id/shortcut | Execute named keyboard shortcut (newTab/copy/find/etc) |
| GET | /api/v1/shortcuts | List available named shortcuts |
| GET | /api/v1/dump | Single-call snapshot of all server state |
| POST/GET/DELETE | /api/v1/plans[/:slug] | Save / list / load / delete reusable Plan blueprints |
| POST | /api/v1/sessions/:id/shortcut/chain | Execute multiple named shortcuts in sequence |
| GET | /api/v1/metrics/summary | Per-histogram p50/p95/p99/mean/count percentiles |
| GET/POST/DELETE | /api/v1/rate_limits | Per-domain RPS limit (token-bucket throttle on navigate) |
| POST | /api/v1/sessions/:id/plans/:slug/run | Load + execute a saved Plan blueprint |
| POST | /api/v1/skills/:name/to_plan | Convert a recorded skill into a Plan ({save?, slug?}) |
| GET | /api/v1/action_memory/search?pattern=... | Search action memory by selector substring |
| POST | /api/v1/parallel/extract | Spawn N parallel sessions, extract markdown from each URL |
| POST | /api/v1/plans/compose | Chain N saved plans into a super-plan |
| POST | /api/v1/sessions/:id/form/autofill | Auto-fill form inputs by name/label/placeholder match |
| POST | /api/v1/skills/diff | Compare two skills' events ({a, b}) |
| GET/POST/DELETE | /api/v1/webhooks[/:id] | Subscribe to events, POST to external URL (HMAC-signed when secret set) |
| POST | /api/v1/webhooks/:id/test | Fire a test delivery to verify connectivity |
| POST | /api/v1/batch/csv | Process a CSV: navigate per row, extract markdown, return enriched results |
| GET | /api/v1/webhooks/queue | Pending webhook retry count |
| POST/GET | /api/v1/skills/:name/cost | Record / read per-skill LLM cost (token usage × rates) |
| GET | /api/v1/skills/cost/leaderboard | Top-N most expensive skills by total cost |
| POST | /api/v1/skills/validate | Validate a skill's structure before save ({skill}) |
| POST | /api/v1/skills/:name/auto_tag | LLM-suggest 3-5 tags from skill description + events |
| POST | /api/v1/traces/render | Render TraceEvent[] as a self-contained HTML timeline page |
| POST | /api/v1/skills/:name/auto_describe | LLM-write a one-line description from skill events |
| GET | /api/v1/skills/:name/suggested_selectors | Cross-skill: selectors that worked for the same goal on other domains |
| GET | /api/v1/action_memory/export.csv | Download action memory as CSV (?domain=&limit=) |
| POST | /api/v1/action_memory/query | Composite filter+sort query (domain, goal substr, selector substr, min_success_rate, min_runs, sort_by) |
| GET | /api/v1/sessions/:id/a11y | Accessibility audit (missing alt/label, heading skips, empty links, missing lang) |
| GET | /api/v1/skills/:name/bundle | Export skill as .agbpkg (skill + plan + stats + readme) |
| POST | /api/v1/skills/bundle/import | Import an .agbpkg bundle |
| GET | /api/v1/analytics/domain/:domain | Per-domain action memory analytics + top selectors |
| GET/POST/DELETE | /api/v1/schedules[/:id] | Recurring skill execution ({skill_name, spec: "every 5m", bindings}) |
| POST | /api/v1/skills/recommend | Recommend skills matching a goal text ({goal, limit?, min_score?}) |
| POST | /api/v1/plan_templates/:name/to_skill | Convert a built-in plan template into a runnable skill |
| POST/GET | /api/v1/sessions/:id/network/start\|peek\|stop | Per-session network bytes tracking |
| GET | /api/v1/sessions/:id/contrast | WCAG color contrast audit |
| GET | /api/v1/fingerprints | List browser fingerprint presets (mac-chrome, iphone-15-pro, tokyo-iphone, etc.) |
| POST | /api/v1/sessions/:id/fingerprint | Apply a preset ({preset_id}) - viewport + UA + locale + timezone |
| GET | /api/v1/sessions/:id/memory | Per-session heap/rss/external delta from session creation |
| POST | /api/v1/sessions/:id/memory/snapshot | Re-baseline the session memory snapshot |
| GET | /api/v1/health/full | Detailed system health: process, engine, scheduler, billing, auth posture |
| GET | /api/v1/sessions/:id/cpu | Per-session CPU delta (user/system/wall ms + cpu_percent) |
| POST | /api/v1/sessions/:id/cpu/snapshot | Re-baseline the session CPU snapshot |
| POST | /api/v1/action_memory/distill | Top selectors across distinct domains (ship as starter packs) |
| POST | /api/v1/action_memory/import_patterns | Seed memory with distilled patterns from another deployment |
| POST | /api/v1/sessions/:id/skills/auto_run | Match goal to best skill and execute (no LLM round-trip) |
| GET/POST/DELETE | /api/v1/skills/ab[/:key] | Register weighted A/B routes between skill versions |
| POST | /api/v1/sessions/:id/skills/ab/:key/run | Run an A/B-routed skill (weighted variant pick) |
| WS | /api/v1/skills/events/ws | Live skill outcome firehose (?skill=<name> filter) |
| GET/DELETE | /api/v1/skills/ab/:key/stats | Aggregated success/failure stats per variant |
| POST | /api/v1/skills/ab/:key/promote | Auto-promote winning variant (z-test gated) |
| GET/DELETE | /api/v1/skills/percentiles[?skill=] | Per-skill p50/p95/p99 latency histograms |
| GET/POST/DELETE | /api/v1/skills | Skill library CRUD |
| GET | /api/v1/skills/:name/export | Download .skill.json |
| POST | /api/v1/skills/import | Import .skill.json |
| POST | /api/v1/sessions/:id/skills/:name/run | Replay skill with bindings |
| GET | /api/v1/action_memory/stats | Memory stats |
| GET | /api/v1/action_memory/by_domain/:domain | What does the system know about :domain |
| GET | /api/v1/action_memory/selectors/:domain | Top selectors per domain with success/fail stats |
| POST | /api/v1/action_memory/recall_by_goal | Cross-domain selector hypotheses |
| POST | /api/v1/action_memory/similar | TF-IDF / embedding action memory search |
| POST | /api/v1/action_memory/similar_embedded | Force embedding-only search |
| POST | /api/v1/action_memory/decay | Halve stale entry counts |
| POST | /api/v1/page_similarity | Score similarity between two page snapshots |
| POST | /api/v1/skills/discover | Suggest skills relevant for the current page |
| POST | /api/v1/skills/compose | Run multiple skills in sequence |
| GET | /api/v1/skills/stats | Per-skill success/fail history (filter by ?skill= for per-domain rows) |
| GET | /api/v1/skills/leaderboard | Top-N performing skills by success_count |
| GET | /api/v1/skills/hot | High-confidence skills (success_rate ≥ 0.9, ≥ 10 runs by default) |
| POST | /api/v1/skills/prune | Remove skills below a success threshold (dry-run by default) |
| DELETE | /api/v1/skills/:name | Delete a skill by name |
| GET | /api/v1/plan_templates | List built-in plan templates |
| GET | /api/v1/diagnose | Remote health diagnostic (mirrors agb-doctor) |
| POST | /api/v1/traces/compact | Compact a trace file |
| GET | /api/v1/openapi.json | Full OpenAPI 3.1 spec |
| GET | /api/v1/dashboard | Aggregated stats: site memory + action memory + skills + provider |
| GET | /api/v1/billing/pricing | Tier list with prices + limits (powers pricing page) |
| POST | /api/v1/billing/checkout | Stripe Checkout Session for {tier, email} |
| POST | /api/v1/billing/webhook | Stripe webhook receiver (signature-verified) |
| GET | /metrics | Prometheus scrape (no auth) |
| GET | /ready | Readiness probe (503 until engine launched) |
| GET | /api/v1/agents | List currently-connected Chrome extension agents |
| GET | /api/v1/agents/:id/poll | Long-poll for next command (used by Chrome extension) |
| POST | /api/v1/agents/:id/cmd | Send a command to a connected Chrome extension |
| POST | /api/v1/agents/:id/result | Extension posts back command results |
| DELETE | /api/v1/agents/:id | Drop agent state |
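The webhooks row says deliveries are "HMAC-signed when secret set". The README does not state the algorithm, the encoding, or the header that carries the signature, so the following receiver-side check is only a sketch under assumptions: HMAC-SHA256 over the raw request body, hex-encoded. The function names `sign` and `verifyWebhook` are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the hex HMAC of the raw request body.
// SHA-256 and hex encoding are assumptions, not confirmed by this README.
function sign(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Compare the received signature against the expected one in constant time.
function verifyWebhook(rawBody: string, receivedSignature: string, secret: string): boolean {
  const expected = Buffer.from(sign(rawBody, secret), "hex");
  const received = Buffer.from(receivedSignature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Whatever the actual scheme is, the signature has to be computed over the exact bytes received, before any JSON parsing, because re-serializing the body can change whitespace or key order and invalidate the signature.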
docker compose up -d
# AgentBrowser on http://localhost:3100
# Memory + traces persist in named volume
AGENTBROWSER_API_KEYS=key1,key2 \
ANTHROPIC_API_KEY=sk-ant-... \
node dist/bin/http.js
Rate limit defaults to 600 req/min/key. Override with AGENTBROWSER_RATE_LIMIT_PER_MINUTE.
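One common way to enforce a per-key limit like this is a token bucket. The README only mentions token buckets for the per-domain navigate throttle, so treating the per-key limit the same way is an assumption, and the class name `PerKeyRateLimiter` is hypothetical. The default of 600 per minute mirrors the stated default.

```typescript
// Hypothetical sketch: one bucket per API key, refilled continuously over time.
class PerKeyRateLimiter {
  private buckets = new Map<string, { tokens: number; lastMs: number }>();
  private perMinute: number;

  constructor(perMinute = 600) {
    this.perMinute = perMinute;
  }

  // Returns true if the request is allowed, false if the key is over its limit.
  allow(apiKey: string, nowMs: number): boolean {
    const b = this.buckets.get(apiKey) ?? { tokens: this.perMinute, lastMs: nowMs };
    // Refill in proportion to elapsed time, capped at the bucket size.
    const refill = ((nowMs - b.lastMs) / 60000) * this.perMinute;
    b.tokens = Math.min(this.perMinute, b.tokens + refill);
    b.lastMs = nowMs;
    this.buckets.set(apiKey, b);
    if (b.tokens < 1) return false;
    b.tokens -= 1;
    return true;
  }
}
```

A bucket of this shape allows short bursts up to the full limit while holding the long-run average at the configured rate. A fixed one-minute window would be a simpler alternative with sharper edges at window boundaries.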
- SessionEvent to the in-process broker
- /api/v1/sessions/:id/events (with 200-event replay buffer for reconnect)
- ~/.agentbrowser/traces/ are the audit log
- compactTrace() keeps long traces small without losing fidelity

| Capability | Status |
|---|---|
| Type check (tsc) | clean |
| Test suite | 119 / 119 passing across 21 files |
| Build (npm run build) | clean |
| Real Chromium navigation | verified |
| Bot detection bypass | passes Cloudflare interstitial, OneTrust, Cookiebot, Funding Choices, Reddit GDPR, Stack Overflow signup wall |
| Cursor click → DOM event | verified end-to-end |
| Action verifier | verified on URL change + element add/remove/text change |
| Vision pipeline | annotated PNG verified by magic-byte + element list |
| HTTP API | 13 integration tests against full Fastify stack |
| Replay determinism | trajectory generator deterministic per seed |
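The trace compaction mentioned above (collapsing runs of cursor.move events, merging consecutive cursor.type events, dropping micro-waits) can be sketched in a few lines. The event shape, the function name `compactTraceSketch`, and the 50 ms micro-wait threshold are assumptions made for illustration. The real TraceEvent type is not shown in this README.

```typescript
// Hypothetical event shape; the real TraceEvent type is not shown in this README.
type TraceEvent =
  | { type: "cursor.move"; x: number; y: number }
  | { type: "cursor.type"; text: string }
  | { type: "cursor.click"; x: number; y: number }
  | { type: "wait"; ms: number };

function compactTraceSketch(events: TraceEvent[], minWaitMs = 50): TraceEvent[] {
  const out: TraceEvent[] = [];
  for (const ev of events) {
    // Drop micro-waits entirely (threshold is an assumed value).
    if (ev.type === "wait" && ev.ms < minWaitMs) continue;
    const last = out[out.length - 1];
    if (ev.type === "cursor.move" && last !== undefined && last.type === "cursor.move") {
      // Keep only the final position of a run of moves.
      out[out.length - 1] = ev;
    } else if (ev.type === "cursor.type" && last !== undefined && last.type === "cursor.type") {
      // Merge consecutive typing events into one.
      out[out.length - 1] = { type: "cursor.type", text: last.text + ev.text };
    } else {
      out.push(ev);
    }
  }
  return out;
}
```

In this sketch a collapsed move keeps only its end point, on the assumption that a replay engine can regenerate the path between points from a seeded trajectory generator.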
MIT. Commercial use encouraged.
For acquisition or partnership inquiries: [email protected].
Add this to claude_desktop_config.json and restart Claude Desktop.
{
"mcpServers": {
"agentbrowser": {
"command": "npx",
"args": []
}
}
}