
SynapServe HTTP Parser Performance

Inside synapserve-http-parser: a zero-allocation, span-based HTTP/1.1 parser with SIMD-accelerated scanning. How it works, why it's different, and benchmark results against httparse.

February 21, 2026

The HTTP parser is the first code that touches every byte of every request. In a server designed for AI agent traffic at scale, parse latency directly determines maximum throughput. We built synapserve-http-parser from scratch to eliminate every allocation, every copy, and every branch we could find.

The core idea: spans, not strings

Traditional HTTP parsers allocate heap memory for every parsed field — the method, URI, each header name, each header value. A typical request with 10 headers triggers 20+ allocations before your handler sees a single byte.

synapserve-http-parser never allocates. Instead of copying bytes into owned strings, it records where each field lives in the original buffer using a 4-byte span:

span.rs — the fundamental unit
// 4 bytes. Copy. No heap. No lifetime gymnastics.
#[derive(Clone, Copy)]
pub struct Span {
    off: u16,   // offset into the request buffer
    len: u16,   // length in bytes
}

// To read the actual bytes, borrow the original buffer:
let host = req.host.as_bytes(buf);  // &[u8], zero-copy

Every parsed field — method, URI, version, header names, header values — is a Span. The entire parse output lives on the stack in a fixed-size structure:

4 bytes

per parsed field. A Span is Copy, fits in a register, never touches the heap.

640 bytes

header-table stack footprint. 64 headers × 10 bytes each, and the entire table lives on the stack.

0 allocs

per request. Verified by an allocation-counting benchmark that fails on any heap activity.
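Putting the numbers above together, the fixed-size parse output can be sketched as follows. This is an illustrative sketch, not the crate's actual API: the field names, the `flags` slot, and the `Request` layout are assumptions inferred from the 4-byte Span and 10-bytes-per-header figures.

```rust
// Sketch of a stack-only parse output. Field names and the `flags`
// slot are hypothetical, inferred from the figures in the text.

#[derive(Clone, Copy, Default)]
pub struct Span {
    off: u16, // offset into the request buffer
    len: u16, // length in bytes
}

impl Span {
    pub fn new(off: u16, len: u16) -> Self {
        Span { off, len }
    }

    /// Zero-copy read: borrow the bytes straight from the request buffer.
    pub fn as_bytes<'b>(&self, buf: &'b [u8]) -> &'b [u8] {
        &buf[self.off as usize..self.off as usize + self.len as usize]
    }
}

/// One header entry: name span + value span + 2 bytes of metadata = 10 bytes.
#[derive(Clone, Copy, Default)]
pub struct HeaderEntry {
    pub name: Span,
    pub value: Span,
    pub flags: u16, // hypothetical: e.g. a known-header id
}

/// The whole parse result: request-line spans plus a 64-entry
/// header table (64 × 10 = 640 bytes), all on the stack.
pub struct Request {
    pub method: Span,
    pub uri: Span,
    pub version: Span,
    pub headers: [HeaderEntry; 64],
    pub header_count: u8,
}
```

Because every field is an offset into the caller's buffer, copying or resetting the structure never touches the heap.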

SIMD-accelerated scanning

HTTP parsing is fundamentally a byte-scanning problem: find the next space, colon, or \r\n delimiter. Scanning byte-by-byte wastes the CPU's ability to process 16–32 bytes per cycle.
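Even the portable fallback beats byte-at-a-time scanning. As a minimal sketch (not the library's actual routine), here is a SWAR scan that finds the first \r eight bytes at a time using the classic zero-byte test:

```rust
/// Portable SWAR scan: find the first b'\r' by examining 8 bytes
/// per iteration. Illustrative sketch, not the crate's code.
fn find_cr_swar(hay: &[u8]) -> Option<usize> {
    const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
    const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane
    let target = LO * (b'\r' as u64);      // b'\r' broadcast to all lanes
    let mut i = 0;
    while i + 8 <= hay.len() {
        let word = u64::from_le_bytes(hay[i..i + 8].try_into().unwrap());
        let x = word ^ target; // a lane is zero iff it held b'\r'
        // Zero-byte test: the lowest flagged lane is always reliable.
        let found = x.wrapping_sub(LO) & !x & HI;
        if found != 0 {
            return Some(i + (found.trailing_zeros() / 8) as usize);
        }
        i += 8;
    }
    // Scalar tail for the last <8 bytes.
    hay[i..].iter().position(|&b| b == b'\r').map(|p| i + p)
}
```

The vectorized variants replace the u64 with 16- or 32-byte registers, but the shape of the loop is the same.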

synapserve-http-parser uses custom SIMD scanning with runtime detection — AVX2 (32 bytes/cycle) and SSE4.2 (16 bytes/cycle) on x86_64, NEON (16 bytes/cycle) on ARM64, with automatic fallback to SWAR on other platforms. Header name validation (tchar), header value validation, and URI scanning each have dedicated vectorized routines that process 16–32 bytes per instruction. Delimiter searches (spaces, \r\n) use the memchr crate.

Several scans are fused to avoid redundant passes. CRLF scanning and value validation are fused into a single pass that finds \r\n while simultaneously rejecting \0 and bare \r — one vectorized scan instead of two. Header name validation is fused with colon scanning into a single-pass loop. Header recognition uses word-size comparisons (u64/u32 chunks with |0x20 lowercase masking), reducing a 14-byte Content-Length match from 14 byte comparisons to 3 word comparisons.
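The word-size Content-Length match can be sketched like this. The code is illustrative, not the crate's actual routine, and the |0x20 fold is only safe on names already validated as tchars, since it also maps some punctuation onto letters:

```rust
/// Case-insensitive match against the 14-byte "content-length"
/// using three word comparisons instead of fourteen byte compares.
/// Illustrative sketch; assumes the name passed tchar validation.
fn is_content_length(name: &[u8]) -> bool {
    if name.len() != 14 {
        return false; // length-first dispatch rejects most names here
    }
    // 0x20 in every lane folds ASCII letters to lowercase; '-' already
    // has the 0x20 bit set, so it passes through unchanged.
    let n0 = u64::from_le_bytes(name[0..8].try_into().unwrap()) | 0x2020_2020_2020_2020;
    let n1 = u32::from_le_bytes(name[8..12].try_into().unwrap()) | 0x2020_2020;
    let n2 = u16::from_le_bytes(name[12..14].try_into().unwrap()) | 0x2020;
    n0 == u64::from_le_bytes(*b"content-")
        && n1 == u32::from_le_bytes(*b"leng")
        && n2 == u16::from_le_bytes(*b"th")
}
```

One u64, one u32, and one u16 comparison cover all 14 bytes, which is where the 14-to-3 reduction above comes from.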

O(1) known-header lookup

After parsing, server code needs to check specific headers — Content-Length, Connection, Transfer-Encoding. Most parsers require a linear scan through the header list. synapserve-http-parser recognizes 21 common headers during parsing via length-first, then first-byte dispatch, and tracks their presence in a u32 bitmap with positions in a fixed index array. Looking up any known header is a single bit-test plus array access — O(1), no hashing, no comparison. Resetting between requests clears just 29 bytes (the length counter and known-header index) instead of zeroing the entire header table.
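The bitmap-plus-index mechanism can be sketched as follows. The header ids and field names are illustrative, not the crate's API:

```rust
/// Sketch of the known-header bitmap + index. Presence of each of
/// up to 21 recognized headers is one bit in a u32; its slot in the
/// parsed header table is one byte in a fixed array.
#[derive(Default)]
pub struct KnownHeaders {
    present: u32,    // bit i set => known header i was seen
    index: [u8; 21], // index[i] = slot in the parsed header table
}

// A few illustrative header ids (the crate recognizes 21).
pub const H_CONTENT_LENGTH: u8 = 0;
pub const H_CONNECTION: u8 = 1;
pub const H_TRANSFER_ENCODING: u8 = 2;

impl KnownHeaders {
    /// Recorded during parsing: header `id` landed at table slot `pos`.
    pub fn set(&mut self, id: u8, pos: u8) {
        self.present |= 1u32 << id;
        self.index[id as usize] = pos;
    }

    /// O(1) lookup: one bit-test plus one array access, no hashing.
    pub fn get(&self, id: u8) -> Option<u8> {
        if self.present & (1u32 << id) != 0 {
            Some(self.index[id as usize])
        } else {
            None
        }
    }

    /// Per-request reset: clear the bitmap and index, not the table.
    pub fn reset(&mut self) {
        *self = KnownHeaders::default();
    }
}
```

Clearing only this small structure plus a length counter, rather than memsetting the whole header table, is what keeps the per-request reset down to a few dozen bytes.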

Benchmarks: head-to-head with httparse

All benchmarks run head-to-head on the same machine, same inputs, same Criterion configuration. Compared against httparse 1.10 — the Rust HTTP parser used by hyper, axum, and actix-web.

Key difference: httparse only tokenizes (splits headers into name/value slices). synapserve-http-parser additionally extracts semantic metadata (content_length, chunked, keep_alive) and builds an O(1) known-header index — all during the same parse pass.

Parse throughput

REQUEST PARSE LATENCY — single core, Criterion, 100 samples

  small request — 35 bytes, GET / with 1 header
    synapserve       42 ns — 803 MiB/s
    httparse 1.10    52 ns — 641 MiB/s

  medium request — 368 bytes, POST with 9 AI-agent headers
    synapserve      200 ns — 1.71 GiB/s
    httparse 1.10   230 ns — 1.49 GiB/s

  large request — 733 bytes, POST with 20 headers
    synapserve      420 ns — 1.63 GiB/s
    httparse 1.10   466 ns — 1.47 GiB/s

synapserve is faster at all request sizes — 1.25x on small, 1.15x on medium, 1.11x on large — despite doing more work per parse. synapserve additionally extracts content_length/chunked/keep_alive and builds a 21-slot O(1) header index during the same pass. Custom AVX2/SSE4.2 SIMD scanning (with runtime detection) validates header names, values, and URIs at 32 bytes per cycle.

Apples-to-apples: with semantic extraction

A real HTTP server needs content_length, chunked, and keep_alive on every request. When we add this extraction to httparse (iterating headers post-parse, the same work synapserve does inline), synapserve is faster across all sizes:

PARSE + SEMANTIC EXTRACTION — equal work, Criterion, 100 samples

  medium request — 368 bytes, synapserve 1.38x faster
    synapserve          220 ns — 4.5M req/s
    httparse+extract    304 ns — 3.3M req/s

  large request — 733 bytes, synapserve 1.46x faster
    synapserve          413 ns — 2.4M req/s
    httparse+extract    604 ns — 1.7M req/s

With equal work, synapserve is 1.38x faster on medium requests and 1.46x faster on large. For responses, the advantage is similar (1.38–1.40x). The O(1) header index is free — built during a parse that’s already faster than the competition.

Header access: O(1) vs O(n)

After parsing, server code checks specific headers on every request. Most parsers require a linear scan through the header list. synapserve’s known_present bitmap plus known_index[21] makes this a single bit-test plus array dereference — constant time regardless of header position.

HEADER LOOKUP LATENCY — pre-parsed 9-header request, Criterion

  Content-Type — position 2 of 9, 32x faster
    synapserve     0.65 ns
    httparse      20.5 ns

  Content-Length — position 9 of 9 (last), 37x faster
    synapserve     0.59 ns
    httparse      22.0 ns

  X-Request-Id — position 6 of 9, 34x faster
    synapserve     0.67 ns
    httparse      23.1 ns

synapserve's find() holds constant at ~0.6 ns regardless of position. httparse scans linearly, landing between 20.5 ns and 23.1 ns depending on where the header sits in the list. For the 3–5 header lookups a typical handler performs, synapserve saves 60–112 ns per handler invocation.

32-37x

faster header lookup than httparse. O(1) array index vs O(n) linear scan with case-insensitive comparison.

1.71 GiB/s

parse throughput on realistic AI-agent requests. Single core, no parallelism, including semantic extraction.

0

heap allocations. Verified per-request by an allocation-counting harness in CI.

Response writer: 46% faster

Parsing is half the story. The response writer uses the same zero-allocation philosophy — writing status lines, headers, and bodies directly into a borrowed buffer with no intermediate allocations. Batch bounds-checking eliminates per-field capacity validation, reducing a minimal 200 OK response to 3.9 ns.
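The batch bounds-checking idea can be sketched as: compute the response's total size once, validate the buffer once, then write every field with no per-field capacity tests. A minimal sketch, with hypothetical names rather than the crate's actual API:

```rust
/// Append `bytes` at `*at` and advance the cursor. The caller has
/// already validated total capacity, so no Option threading here.
fn put(buf: &mut [u8], at: &mut usize, bytes: &[u8]) {
    buf[*at..*at + bytes.len()].copy_from_slice(bytes);
    *at += bytes.len();
}

/// Write "<status-line>\r\n<name>: <value>\r\n...\r\n" into a
/// borrowed buffer. One up-front capacity check covers the whole
/// response; returns bytes written, or None if the buffer is too
/// small. Illustrative sketch of the batch bounds-check approach.
pub fn write_response(
    buf: &mut [u8],
    status_line: &[u8],
    headers: &[(&[u8], &[u8])],
) -> Option<usize> {
    let total = status_line.len() + 2
        + headers.iter().map(|(n, v)| n.len() + 2 + v.len() + 2).sum::<usize>()
        + 2;
    if buf.len() < total {
        return None; // single capacity check for the entire response
    }
    let mut at = 0;
    put(buf, &mut at, status_line);
    put(buf, &mut at, b"\r\n");
    for &(name, value) in headers {
        put(buf, &mut at, name);
        put(buf, &mut at, b": ");
        put(buf, &mut at, value);
        put(buf, &mut at, b"\r\n");
    }
    put(buf, &mut at, b"\r\n"); // blank line terminates the header block
    Some(at)
}
```

Everything is written into the caller's buffer; the writer itself holds no state and performs no allocation.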

RESPONSE WRITE LATENCY — single core, Criterion

  minimal 200 OK           3.9 ns
  SSE event headers       33.5 ns
  JSON API (6 headers)    41.3 ns

Faster at parsing, faster at everything after

With custom AVX2/SSE4.2 SIMD scanning and runtime detection, synapserve is faster than httparse at raw tokenization across all request sizes — 1.25x on small, 1.15x on medium, 1.11x on large — despite doing more work per parse (semantic extraction + O(1) header indexing). But a real HTTP server doesn’t stop at tokenization. Every request handler needs to check Content-Length, Connection, and Transfer-Encoding. Every router needs to inspect Host. Every auth middleware needs Authorization.

With httparse, each of those lookups is a linear scan through the header array. With synapserve, each is a single bit-test plus array index — 32–37x faster. When equal semantic extraction work is included, synapserve is 1.38–1.46x faster on requests, 1.38–1.40x faster on responses. The O(1) header index comes for free on top of an already-faster parse.

The result: a parser that handles a realistic 9-header agent request in 200 nanoseconds — zero allocations, zero copies, every semantic field already extracted — and adds less than 1 microsecond of overhead to the full request lifecycle. At 4.5 million parses per second per core, the network is the bottleneck, not the parser.

HOW IT COMPARES — head-to-head benchmarks, httparse 1.10

  Property              synapserve-http-parser              httparse
  Parse (medium req)    200 ns (incl. semantics)            230 ns (tokenize only)
  Parse + semantics     220 ns                              304 ns (+extract)
  Header lookup         0.6 ns — O(1)                       20.5–23.1 ns — O(n)
  Per-request reset     29 bytes (clear)                    2,048 bytes (memset)
  Header entry size     10 bytes (Span)                     32 bytes (&str + &[u8])
  Allocations           0                                   0 (caller-provided)
  SIMD scanning         AVX2/SSE4.2/NEON (runtime detect)   SSE4.2 (compile-time only)
  Response writer       Integrated (3.9 ns)                 Parse only
  Chunked decoder       Integrated (16B stack)              Not included