# SynapServe HTTP Parser Performance
Inside synapserve-http-parser: a zero-allocation, span-based HTTP/1.1 parser with SIMD-accelerated scanning. How it works, why it's different, and benchmark results against httparse.
The HTTP parser is the first code that touches every byte of every request. In a server designed for AI agent traffic at scale, parse latency directly determines maximum throughput. We built synapserve-http-parser from scratch to eliminate every allocation, every copy, and every branch we could find.
## The core idea: spans, not strings
Traditional HTTP parsers allocate heap memory for every parsed field — the method, URI, each header name, each header value. A typical request with 10 headers triggers 20+ allocations before your handler sees a single byte.
synapserve-http-parser never allocates. Instead of copying bytes into owned strings, it records where each field lives in the original buffer using a 4-byte span:
```rust
// 4 bytes. Copy. No heap. No lifetime gymnastics.
#[derive(Clone, Copy)]
pub struct Span {
    off: u16, // offset into the request buffer
    len: u16, // length in bytes
}
```

To read the actual bytes, borrow the original buffer:

```rust
let host = req.host.as_bytes(buf); // &[u8], zero-copy
```
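Resolving a Span back to bytes is a single slice borrow. A self-contained sketch (Span is redefined here with public fields so the example compiles on its own; the crate's actual accessor may differ):

```rust
// Hypothetical sketch of Span::as_bytes; not the crate's real code.
#[derive(Clone, Copy)]
pub struct Span {
    pub off: u16, // offset into the request buffer
    pub len: u16, // length in bytes
}

impl Span {
    /// Borrow the field's bytes straight from the original buffer.
    pub fn as_bytes<'b>(&self, buf: &'b [u8]) -> &'b [u8] {
        &buf[self.off as usize..self.off as usize + self.len as usize]
    }
}

fn main() {
    let buf = b"GET /health HTTP/1.1\r\n";
    let uri = Span { off: 4, len: 7 }; // points at "/health"
    assert_eq!(uri.as_bytes(buf), b"/health");
}
```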
Every parsed field — method, URI, version, header names, header values — is a Span. The entire parse output lives on the stack in a fixed-size structure:
- **4 bytes** per parsed field. A Span is `Copy`, fits in a register, never touches the heap.
- **640 bytes** of total stack footprint: 64 headers × 10 bytes each. The entire header table lives on the stack.
- **Zero heap allocations** per request. Verified by an allocation-counting benchmark that fails on any heap activity.
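That kind of check is easy to reproduce. A minimal sketch of an allocation-counting harness (not the project's actual CI code) using a counting `#[global_allocator]`; the stack-only span bookkeeping in `main` stands in for a zero-allocation parse:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Wraps the system allocator and counts every heap allocation.
struct CountingAlloc;
static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static A: CountingAlloc = CountingAlloc;

fn main() {
    let before = ALLOCS.load(Ordering::Relaxed);
    // Stand-in for a zero-allocation parse: span bookkeeping on the stack.
    let mut spans = [(0u16, 0u16); 64];
    for (i, s) in spans.iter_mut().enumerate() {
        *s = (i as u16, 3);
    }
    let total: u32 = spans.iter().map(|&(_, l)| l as u32).sum();
    assert_eq!(total, 192);
    let after = ALLOCS.load(Ordering::Relaxed);
    assert_eq!(after - before, 0, "heap activity detected");
}
```

The benchmark fails the build if the delta is ever nonzero, which is how a "zero allocations" claim stays true over time.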
## SIMD-accelerated scanning
HTTP parsing is fundamentally a byte-scanning problem: find the next space, colon, or \r\n delimiter. Scanning byte-by-byte wastes the CPU's ability to examine 16–32 bytes per instruction.
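Even without SIMD, a plain 64-bit register can scan 8 bytes at a time; this is the style of trick behind the SWAR fallback described next. A sketch of the classic has-zero-byte delimiter search, illustrative rather than the crate's actual routine:

```rust
/// Find the first occurrence of `b` in `hay`, 8 bytes at a time (SWAR).
/// Hypothetical sketch, not synapserve-http-parser's real scanner.
fn find_byte_swar(hay: &[u8], b: u8) -> Option<usize> {
    const LO: u64 = 0x0101_0101_0101_0101;
    const HI: u64 = 0x8080_8080_8080_8080;
    let pat = LO * b as u64; // broadcast b into all 8 lanes
    let mut i = 0;
    while i + 8 <= hay.len() {
        // XOR zeroes exactly the lanes that equal b.
        let w = u64::from_le_bytes(hay[i..i + 8].try_into().unwrap()) ^ pat;
        // Classic trick: sets the high bit of every zero lane.
        let hit = w.wrapping_sub(LO) & !w & HI;
        if hit != 0 {
            // Little-endian: lowest set hit byte is the first match.
            return Some(i + (hit.trailing_zeros() / 8) as usize);
        }
        i += 8;
    }
    hay[i..].iter().position(|&c| c == b).map(|p| i + p)
}

fn main() {
    assert_eq!(find_byte_swar(b"GET /index.html HTTP/1.1", b' '), Some(3));
    assert_eq!(find_byte_swar(b"Host: example.com", b':'), Some(4));
    assert_eq!(find_byte_swar(b"abc", b'z'), None);
}
```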
synapserve-http-parser uses custom SIMD scanning with runtime detection: AVX2 (32 bytes per instruction) and SSE4.2 (16 bytes per instruction) on x86_64, NEON (16 bytes per instruction) on ARM64, with automatic fallback to SWAR on other platforms.

- Header name validation (tchar), header value validation, and URI scanning each have dedicated vectorized routines that process 16–32 bytes per instruction.
- Delimiter searches (spaces, `\r\n`) use the memchr crate.
- CRLF scanning and value validation are fused into a single pass that finds `\r\n` while simultaneously rejecting `\0` and bare `\r` — one vectorized scan instead of two.
- Header name validation is fused with colon scanning into a single-pass loop.
- Header recognition uses word-size comparisons (`u64`/`u32` chunks with `|0x20` lowercase masking), reducing a 14-byte Content-Length match from 14 byte comparisons to 3 word comparisons.
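The word-size header match fits in a few lines. Assuming names were already validated as tchar (so `|0x20` only ever folds ASCII letters, and `-` at 0x2D is unchanged), a hypothetical version of the Content-Length comparison:

```rust
/// Case-insensitive match for "content-length" (14 bytes) using three
/// word comparisons instead of 14 byte comparisons. Illustrative sketch,
/// not the crate's actual routine.
fn is_content_length(name: &[u8]) -> bool {
    const M8: u64 = 0x2020_2020_2020_2020;
    const M4: u32 = 0x2020_2020;
    if name.len() != 14 {
        return false;
    }
    // |0x20 lowercases ASCII letters; safe because names are tchar-valid.
    let w0 = u64::from_le_bytes(name[0..8].try_into().unwrap()) | M8;
    let w1 = u32::from_le_bytes(name[8..12].try_into().unwrap()) | M4;
    let w2 = u16::from_le_bytes(name[12..14].try_into().unwrap()) | 0x2020;
    w0 == (u64::from_le_bytes(*b"content-") | M8)
        && w1 == (u32::from_le_bytes(*b"leng") | M4)
        && w2 == (u16::from_le_bytes(*b"th") | 0x2020)
}

fn main() {
    assert!(is_content_length(b"Content-Length"));
    assert!(is_content_length(b"CONTENT-LENGTH"));
    assert!(!is_content_length(b"Content-Type")); // wrong length: 12 bytes
}
```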
## O(1) known-header lookup
After parsing, server code needs to check specific headers — Content-Length, Connection, Transfer-Encoding. Most parsers require a linear scan through the header list. synapserve-http-parser recognizes 21 common headers during parsing via length-first, then first-byte, dispatch, and tracks their presence in a u32 bitmap with positions in a fixed index array. Looking up any known header is a single bit-test plus array access — O(1), no hashing, no string comparison. Resetting between requests clears just 29 bytes (the length counter and known-header index) instead of zeroing the entire header table.
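In code, the lookup structure amounts to a bitmap plus a small index array. A hypothetical sketch — the field names follow the article, but the header ids and exact layout are illustrative:

```rust
// Sketch of the known-header bitmap + fixed index array; not the
// crate's real definition.
#[derive(Clone, Copy)]
enum Known {
    ContentLength = 0,
    Connection = 1,
    TransferEncoding = 2,
    // ... up to 21 recognized headers
}

#[derive(Default)]
struct KnownHeaders {
    known_present: u32,    // bit i set => known header i was seen
    known_index: [u8; 21], // slot of header i in the parsed header table
}

impl KnownHeaders {
    // Called once during parsing when a known header is recognized.
    fn record(&mut self, id: Known, slot: u8) {
        self.known_present |= 1u32 << (id as u32);
        self.known_index[id as usize] = slot;
    }

    /// O(1): one bit-test plus one array read; no scan, no hashing.
    fn find(&self, id: Known) -> Option<usize> {
        if self.known_present & (1u32 << (id as u32)) != 0 {
            Some(self.known_index[id as usize] as usize)
        } else {
            None
        }
    }
}

fn main() {
    let mut k = KnownHeaders::default();
    k.record(Known::ContentLength, 5); // seen at header-table slot 5
    assert_eq!(k.find(Known::ContentLength), Some(5));
    assert_eq!(k.find(Known::Connection), None);
}
```

Resetting this structure between requests means clearing the bitmap and index, not the whole header table.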
## Benchmarks: head-to-head with httparse
All benchmarks run head-to-head on the same machine, same inputs, same Criterion configuration. Compared against httparse 1.10 — the Rust HTTP parser used by hyper, axum, and actix-web.
Key difference: httparse only tokenizes (splits headers into name/value slices). synapserve-http-parser additionally extracts semantic metadata (content_length, chunked, keep_alive) and builds an O(1) known-header index — all during the same parse pass.
### Parse throughput
synapserve is faster at all request sizes — 1.25x on small, 1.15x on medium, 1.11x on large — despite doing more work per parse: it additionally extracts content_length/chunked/keep_alive and builds a 21-slot O(1) header index during the same pass. Custom AVX2/SSE4.2 SIMD scanning (with runtime detection) validates header names, values, and URIs at 32 bytes per instruction.
### Apples-to-apples: with semantic extraction
A real HTTP server needs content_length, chunked, and keep_alive on every request. When we add this extraction to httparse (iterating headers post-parse, the same work synapserve does inline), synapserve is faster across all sizes:
With equal work, synapserve is 1.38x faster on medium requests and 1.46x faster on large. For responses, the advantage is similar (1.38–1.40x). The O(1) header index is free — built during a parse that’s already faster than the competition.
### Header access: O(1) vs O(n)
After parsing, server code checks specific headers on every request. Most parsers require a linear scan through the header list. synapserve’s known_present bitmap plus known_index[21] makes this a single bit-test plus array dereference — constant time regardless of header position.
synapserve find() is constant ~0.6 ns regardless of position. httparse scales linearly: 20.5 ns (early) to 23.1 ns (last). For the 3–5 header lookups a typical handler performs, synapserve saves 60–112 ns per handler invocation.
- **32–37x** faster header lookup than httparse. O(1) array index vs O(n) linear scan with case-insensitive comparison.
- **4.5M parses/sec** parse throughput on realistic AI-agent requests. Single core, no parallelism, including semantic extraction.
- **Zero** heap allocations. Verified per-request by an allocation-counting harness in CI.
## Response writer: 46% faster
Parsing is half the story. The response writer uses the same zero-allocation philosophy — writing status lines, headers, and bodies directly into a borrowed buffer with no intermediate allocations. Batch bounds-checking eliminates per-field capacity validation, reducing a minimal 200 OK response to 3.9 ns.
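The batch bounds-check idea looks roughly like this: one worst-case capacity test up front, then straight `copy_from_slice` appends into the caller's buffer with no per-field checks and no heap. A simplified sketch, not the crate's API:

```rust
/// Write a minimal 200 OK into a caller-provided buffer. Returns the
/// number of bytes written, or None if the buffer can't possibly fit.
/// Hypothetical sketch of the batch bounds-check approach.
fn write_response(buf: &mut [u8], body: &[u8]) -> Option<usize> {
    const HEAD: &[u8] = b"HTTP/1.1 200 OK\r\nContent-Length: ";
    // One batch bounds check: head + max 20 length digits + CRLFCRLF + body.
    if buf.len() < HEAD.len() + 20 + 4 + body.len() {
        return None;
    }
    let mut n = 0;
    buf[n..n + HEAD.len()].copy_from_slice(HEAD);
    n += HEAD.len();
    // Render the decimal body length into a stack scratch buffer (no heap).
    let mut scratch = [0u8; 20];
    let mut len = body.len();
    let mut i = scratch.len();
    loop {
        i -= 1;
        scratch[i] = b'0' + (len % 10) as u8;
        len /= 10;
        if len == 0 {
            break;
        }
    }
    let digits = &scratch[i..];
    buf[n..n + digits.len()].copy_from_slice(digits);
    n += digits.len();
    buf[n..n + 4].copy_from_slice(b"\r\n\r\n");
    n += 4;
    buf[n..n + body.len()].copy_from_slice(body);
    n += body.len();
    Some(n)
}

fn main() {
    let mut buf = [0u8; 256];
    let n = write_response(&mut buf, b"hi").unwrap();
    assert_eq!(&buf[..n], b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi");
}
```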
## Faster at parsing, faster at everything after
With custom AVX2/SSE4.2 SIMD scanning and runtime detection, synapserve is faster than httparse at raw tokenization across all request sizes — 1.25x on small, 1.15x on medium, 1.11x on large — despite doing more work per parse (semantic extraction + O(1) header indexing). But a real HTTP server doesn’t stop at tokenization. Every request handler needs to check Content-Length, Connection, and Transfer-Encoding. Every router needs to inspect Host. Every auth middleware needs Authorization.
With httparse, each of those lookups is a linear scan through the header array. With synapserve, each is a single bit-test plus array index — 32–37x faster. When equal semantic extraction work is included, synapserve is 1.38–1.46x faster on requests, 1.38–1.40x faster on responses. The O(1) header index comes for free on top of an already-faster parse.
The result: a parser that handles a realistic 9-header agent request in 200 nanoseconds, with zero allocations, zero copies, and every semantic field already extracted, and that adds less than a microsecond of overhead to the full request lifecycle. At 4.5 million parses per second per core, the network is the bottleneck, not the parser.
| Property | synapserve-http-parser | httparse |
|---|---|---|
| Parse (medium req) | 200 ns (incl. semantics) | 230 ns (tokenize only) |
| Parse + semantics | 220 ns | 304 ns (+extract) |
| Header lookup | 0.6 ns — O(1) | 20.5–23.1 ns — O(n) |
| Per-request reset | 29 bytes (clear) | 2,048 bytes (memset) |
| Header entry size | 10 bytes (Span) | 32 bytes (&str + &[u8]) |
| Allocations | 0 | 0 (caller-provided) |
| SIMD scanning | AVX2/SSE4.2/NEON (runtime detect) | SSE4.2 (compile-time only) |
| Response writer | Integrated (3.9 ns) | Parse only |
| Chunked decoder | Integrated (16B stack) | Not included |