Sael vs MCP: we benchmarked the protocol — −89% round-trips
Numbers, not adjectives. We put the Sael reference server head-to-head with a spec-correct MCP server — same tools, same dataset, same transport. Here is what the protocol shape actually costs.
We kept telling people Sael is faster than MCP. Then we got tired of saying it without a number, so we built a benchmark.
The setup (why it is fair)
The trick with protocol benchmarks is removing everything that is not the protocol. So:
- Same tools, same deterministic dataset on both servers.
- Same transport — WebSocket for both. No HTTP/2-vs-WS skew in the byte counts.
- The Sael side is the real reference Go server, not a mock.
- The MCP side is a spec-correct JSON-RPC 2.0 server: initialize → tools/list → tools/call, standard result envelope.
- Counters exclude the connection handshake; we measure the marginal cost of the task.
We count round-trips, raw bytes on the wire, and time-to-first-byte. All structural — they do not depend on the machine.
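To make "raw bytes on the wire" concrete, here is a minimal Python sketch of how one JSON-RPC 2.0 `tools/call` exchange could be sized. The tool name `fetch_rows`, its arguments, and the result text are hypothetical, and the helper `frame_bytes` is ours; it counts JSON payload bytes only and ignores WebSocket framing overhead. It is an illustration, not the benchmark harness.

```python
import json

def frame_bytes(message: dict) -> int:
    """Size of one message serialized as compact UTF-8 JSON."""
    return len(json.dumps(message, separators=(",", ":")).encode("utf-8"))

# Hypothetical tool call; only the envelope shape matters here.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "fetch_rows", "arguments": {"limit": 100}},
}
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "row data here"}],
        "isError": False,
    },
}

# One request plus its response is one round-trip.
round_trips = 1
wire_bytes = frame_bytes(request) + frame_bytes(response)
print(f"{round_trips} round-trip, {wire_bytes} payload bytes")
```

Summing these two counters over every exchange in a task gives the kind of totals reported in the tables below.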
Result 1 — multi-step chains
A realistic agent pattern: fetch → transform → transform → … where each step depends on the previous one. MCP has no server-side composition, so every dependent step is a separate round-trip that drags the intermediate payload back through the client.
| Metric (8-step chain) | MCP | Sael |
|---|---|---|
| Round-trips | 9 | 1 |
| Bytes on the wire | 139 KB | 8 KB |
That is −89% round-trips and −94% bytes. Sael describes the whole chain in one pipeline frame; the server runs it and returns only the final result.
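The difference in shape can be sketched with a toy in-process model. Everything in it is hypothetical (the `ToyServer` class, its `call` and `pipeline` methods, the tool names), and it is not the Sael wire format. It only illustrates why dependent steps cost one round-trip each when the client orchestrates them; the counts it produces are for this toy chain, not the benchmark.

```python
from typing import Any, Callable

class ToyServer:
    """In-process stand-in for a tool server that counts client round-trips."""

    def __init__(self, tools: dict[str, Callable[[Any], Any]]):
        self.tools = tools
        self.round_trips = 0

    def call(self, name: str, arg: Any) -> Any:
        # One request/response pair per tool call.
        self.round_trips += 1
        return self.tools[name](arg)

    def pipeline(self, names: list[str], arg: Any) -> Any:
        # One request describes the whole chain; intermediates stay server-side.
        self.round_trips += 1
        for name in names:
            arg = self.tools[name](arg)
        return arg

TOOLS = {
    "fetch": lambda n: list(range(n)),
    "double": lambda xs: [x * 2 for x in xs],
    "keep_multiples_of_20": lambda xs: [x for x in xs if x % 20 == 0],
    "total": lambda xs: sum(xs),
}
CHAIN = ["fetch", "double", "keep_multiples_of_20", "total"]

def run_client_side(server: ToyServer, chain: list[str], arg: Any) -> Any:
    for name in chain:
        # Each intermediate result travels back to the client before the next step.
        arg = server.call(name, arg)
    return arg

client_side, server_side = ToyServer(TOOLS), ToyServer(TOOLS)
result_a = run_client_side(client_side, CHAIN, 50)
result_b = server_side.pipeline(CHAIN, 50)
print(result_a, client_side.round_trips)  # 200 4
print(result_b, server_side.round_trips)  # 200 1
```

Both paths compute the same answer; only the number of times data crosses the client boundary changes.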
Result 2 — parallel fan-out
Six independent enrichment branches over the same data, 50 ms each. The MCP client issues them one call at a time; Sael runs them concurrently inside one request.
| Metric | MCP | Sael |
|---|---|---|
| Round-trips | 7 | 1 |
| Wall-clock | 312 ms | 53 ms |
With Sael, wall-clock tracks the slowest branch rather than the sum of all six (~6× faster), and the branches never touch the client.
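The "slowest branch, not the sum" effect is easy to reproduce in isolation. This is a minimal sketch using Python's `asyncio`, assuming each branch is nothing more than a 50 ms wait; `enrich`, `run_sequential`, and `run_concurrent` are illustrative names, not part of either server.

```python
import asyncio
import time

BRANCH_DELAY_S = 0.05  # 50 ms per branch, as in the scenario above
BRANCHES = 6

async def enrich(branch_id: int) -> int:
    # Stand-in for one enrichment branch; the sleep models its work.
    await asyncio.sleep(BRANCH_DELAY_S)
    return branch_id

async def run_sequential() -> float:
    start = time.perf_counter()
    for i in range(BRANCHES):
        await enrich(i)  # next branch starts only after this one finishes
    return time.perf_counter() - start

async def run_concurrent() -> float:
    start = time.perf_counter()
    # All branches are in flight at once; total time is about one delay.
    await asyncio.gather(*(enrich(i) for i in range(BRANCHES)))
    return time.perf_counter() - start

sequential_s = asyncio.run(run_sequential())
concurrent_s = asyncio.run(run_concurrent())
print(f"sequential {sequential_s * 1000:.0f} ms, concurrent {concurrent_s * 1000:.0f} ms")
```

The exact timings vary by machine, but the sequential run should land near six delays and the concurrent run near one.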
Result 3 — delta streams
A dashboard of 30 metrics, 60 updates. The naive approach re-sends the whole state on every tick. Sael streams a snapshot once, then sends only JSON Merge Patch deltas (RFC 7386).
| Metric | Full resend | Delta stream |
|---|---|---|
| Bandwidth | 28.1 KB | 3.6 KB |
−87% bandwidth. The cost tracks the change, not the size of the state — exactly what real-time dashboards need.
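For readers unfamiliar with merge patches, here is a minimal Python sketch of the idea: the apply step follows the MergePatch algorithm described in RFC 7386 (and its successor, RFC 7396), while `diff` is our own helper for producing a patch between two states. The metric names and values are made up, and the sketch assumes the state never contains `null` values, since a merge patch uses `null` to mean "delete this key".

```python
import json
from typing import Any

def merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON Merge Patch to target and return the new value."""
    if not isinstance(patch, dict):
        return patch  # non-object patches replace the target wholesale
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this key"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

def diff(old: dict, new: dict) -> dict:
    """Build a merge patch that turns old into new (assumes no null values)."""
    patch: dict = {}
    for key in old.keys() - new.keys():
        patch[key] = None
    for key, value in new.items():
        if key not in old:
            patch[key] = value
        elif isinstance(value, dict) and isinstance(old[key], dict):
            sub = diff(old[key], value)
            if sub:
                patch[key] = sub
        elif old[key] != value:
            patch[key] = value
    return patch

# Hypothetical dashboard: 30 metrics, one of which changes on this tick.
state = {f"metric_{i}": 0 for i in range(30)}
next_state = {**state, "metric_7": 42}

delta = diff(state, next_state)
full_bytes = len(json.dumps(next_state).encode("utf-8"))
delta_bytes = len(json.dumps(delta).encode("utf-8"))
print(delta, full_bytes, delta_bytes)
```

The patch carries only the one changed key, which is why the cost follows the size of the change rather than the size of the state.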
Result 4 — streaming latency and MessagePack
- Native streaming delivers the first item ~10× sooner (1012 ms → 101 ms for a 10-item stream) — MCP blocks until the whole batch is ready.
- Opt-in MessagePack binary frames shave ~12% off the wire on a typical payload (more on numeric-heavy data).
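The time-to-first-item gap comes from when the first item is allowed to leave the server. The sketch below models that with a simulated clock rather than real sleeps; the 100 ms per-item cost is an assumption chosen for illustration, and `produce`, `first_item_batch`, and `first_item_stream` are hypothetical names.

```python
from typing import Iterator

ITEM_COST_MS = 100  # assumed cost to produce one item
ITEMS = 10

def produce(clock: list[int]) -> Iterator[int]:
    """Yield items one at a time, advancing a simulated clock per item."""
    for i in range(ITEMS):
        clock[0] += ITEM_COST_MS
        yield i

def first_item_batch() -> int:
    """Simulated time at which the client sees item 0 when the batch is sent whole."""
    clock = [0]
    batch = list(produce(clock))  # every item is produced before anything is sent
    assert batch[0] == 0
    return clock[0]

def first_item_stream() -> int:
    """Simulated time at which the client sees item 0 when items are streamed."""
    clock = [0]
    first = next(produce(clock))  # the first item is sent as soon as it exists
    assert first == 0
    return clock[0]

print(first_item_batch(), first_item_stream())  # 1000 100
```

Under this model the ratio is simply the number of items in the stream, so longer streams widen the gap.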
The honest caveats
These numbers measure protocol overhead (round-trips, bytes, time-to-first-byte), not LLM tokens or real network RTT. The win scales with agent-loop depth and payload size; a single tool call sees no round-trip advantage. On a real network (20–100 ms RTT per hop), collapsing N round-trips into one compounds the latency win further; here that effect is hidden because both servers ran locally.
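A back-of-envelope model shows how the round-trip counts from Result 1 would translate on a real network. It assumes each round-trip pays exactly one RTT and ignores server time, payload transfer time, and jitter, so treat the output as an estimate rather than a measurement; `chain_latency_ms` is a hypothetical helper.

```python
def chain_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Network-only latency: each round-trip pays one RTT, server time ignored."""
    return round_trips * rtt_ms

MCP_ROUND_TRIPS = 9   # 8-step chain, from the Result 1 table
SAEL_ROUND_TRIPS = 1

for rtt_ms in (20, 50, 100):
    mcp = chain_latency_ms(MCP_ROUND_TRIPS, rtt_ms)
    sael = chain_latency_ms(SAEL_ROUND_TRIPS, rtt_ms)
    print(f"RTT {rtt_ms} ms: MCP {mcp:.0f} ms, Sael {sael:.0f} ms, saved {mcp - sael:.0f} ms")
```

At a 20 ms RTT the model gives 180 ms versus 20 ms of pure network wait; at 100 ms it gives 900 ms versus 100 ms.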
The whole harness is reproducible. The protocol is source-available (BUSL-1.1). See unyly.org/sael for the spec and a live demo.