CDN Latency Benchmark Methodology: How We Measure

We have seen 25 ms p50 deltas between two CDNs on the same object, from the same metro, within the same hour, while both reported healthy cache status and no packet loss. That is exactly why a serious cdn latency test cannot be reduced to a single curl from a cloud VM. The failure mode is path selection, handshake behavior, cache state, and protocol negotiation interacting in ways that flatten averages and hide the tails that users actually feel.


At scale, the symptoms are familiar: p50 looks fine, p95 drifts by region, p99 explodes during route changes, and video startup or page interactivity regresses even when aggregate throughput is unchanged. Naive benchmarking misses this because it mixes cold and warm cache fetches, treats ICMP RTT as user latency, samples from too few ASNs, and ignores connection reuse. If you want a cdn latency benchmark that survives peer review from architects and SREs, the methodology matters more than the leaderboard.

Why a cdn latency test is harder than it looks

The trap is thinking that CDN latency is one number. It is at least four numbers with different failure domains: DNS selection latency, transport setup latency, edge processing latency, and object delivery latency. Measuring only one creates false confidence, especially when anycast catchment, TLS resumption rate, and request coalescing differ across providers.

A practical cdn performance test has to separate network distance from service behavior. TCP SYN to SYN-ACK gives you transport RTT. TLS handshake duration tells you how expensive trust establishment is on a cold connection. TTFB on a cache hit captures edge scheduling, protocol overhead, and request routing. TTFB on a forced miss tells you something else entirely: shield behavior, origin path quality, and backhaul variance.
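The decomposition above can be made mechanical: given cumulative timestamps of the kind curl's write-out variables report, each component latency is a simple difference. A minimal sketch, using hypothetical sample values purely for illustration:

```python
# Decompose cumulative transfer timings (curl-style, in seconds) into
# the component latencies discussed above: DNS selection, transport
# setup, trust establishment, and first-byte wait after TLS.
def decompose(t: dict) -> dict:
    return {
        "dns_ms": 1000 * t["time_namelookup"],
        "connect_rtt_ms": 1000 * (t["time_connect"] - t["time_namelookup"]),
        "tls_ms": 1000 * (t["time_appconnect"] - t["time_connect"]),
        "ttfb_after_tls_ms": 1000 * (t["time_starttransfer"] - t["time_appconnect"]),
    }

# Hypothetical sample, not a real measurement.
sample = {
    "time_namelookup": 0.012,
    "time_connect": 0.034,       # DNS done + TCP handshake complete
    "time_appconnect": 0.071,    # TLS handshake complete
    "time_starttransfer": 0.095, # first response byte
}
print(decompose(sample))  # connect_rtt_ms ~22, tls_ms ~37, ttfb_after_tls_ms ~24
```

Keeping the four numbers separate is what lets you tell a routing problem (connect RTT) apart from a handshake problem (TLS) or an edge scheduling problem (post-TLS TTFB).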

What we actually measure

  • TCP connect RTT on fresh sockets
  • TLS handshake time on fresh sockets
  • HTTP TTFB on warm cache hits
  • HTTP TTFB on controlled cold fetches
  • Total download time for fixed object sizes
  • p50, p95, p99 by city, ASN, access network class, and protocol
  • Error rate under packet loss and constrained bandwidth

That split is not academic. TCP retransmission behavior alone can reshape tails. The TCP initial retransmission timeout specified in RFC 6298 is 1 second, which means a single lost handshake packet can turn a normal request into an obvious outlier. TLS 1.3 can reduce setup cost relative to older handshakes and also supports 0-RTT resumption in some cases, which is great for repeat visitors but toxic for apples-to-apples cross-CDN comparison if one provider resumes more aggressively than another. HTTP/3 over QUIC changes tail behavior again because stream loss does not block unrelated streams the way TCP-level loss can in HTTP/2. Those protocol details are benchmark inputs, not footnotes.
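The RTO effect on tails is easy to see with a toy calculation rather than a network measurement: hold 1,000 handshakes at a 30 ms baseline and let 1% of them lose a SYN, which under a 1 second initial RTO adds a full second each.

```python
# Toy illustration: 1% handshake loss under a 1 s initial RTO (RFC 6298)
# leaves the median untouched but moves p99 by more than an order of
# magnitude. Values are synthetic, not measured.
def pct(samples, p):
    """Nearest-rank percentile."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(p / 100 * len(s)))]

baseline_ms = 30.0
clean = [baseline_ms] * 1000
lossy = [baseline_ms] * 990 + [baseline_ms + 1000.0] * 10  # 1% SYN loss

print(pct(clean, 99))  # 30.0
print(pct(lossy, 50))  # 30.0   -- median unchanged
print(pct(lossy, 99))  # 1030.0 -- p99 absorbs the full RTO
```

This is why averages and medians alone cannot certify a provider: the metric that users feel lives entirely in the tail.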

Benchmark data and the evidence behind the method

Three measurement facts shape our methodology.

First, anycast is not reliably evaluated from control-plane visibility alone. Distributed active measurements are required because the user-visible result depends on routing policy outside the CDN operator. Second, published anycast studies have shown large gaps between optimal and actual catchments, including cases where adding sites increased average latency for some clients. Third, operator-facing work on bidirectional probing showed that median latency in affected regions could be cut from roughly 40 ms to 16 ms once suboptimal routing was identified and corrected. Those are not edge curiosities. They are a warning that a cdn latency measurement program built on a handful of cloud regions will miss the important pathologies.

We also anchor our application-layer timing on browser and standards-based semantics. Resource Timing defines responseStart as the first byte boundary that matters for TTFB-style measurements, which lets us align synthetic browser tests with lower-level packet captures instead of inventing our own stopwatch logic. On transport setup, TLS 1.3 and HTTP/3 specifications are relevant because protocol negotiation directly affects handshake cost and head-of-line sensitivity. On the TCP side, the 1 second initial RTO remains the reason handshake loss produces such ugly p99 spikes.

Representative thresholds we use in a cdn benchmark

Metric | Why it matters | Good | Warning | Bad
Warm-hit TTFB p50 | Closest proxy for end-user edge responsiveness | < 50 ms regional | 50 to 100 ms | > 100 ms
Warm-hit TTFB p95 | Catches route instability and busy-edge effects | < 120 ms | 120 to 250 ms | > 250 ms
TCP connect p95 | Path quality to edge before HTTP exists | Within 1.5x min RTT | 1.5x to 2.5x | > 2.5x
Packet loss during test | Loss amplifies handshake and tail latency | < 0.1% | 0.1% to 1% | > 1%
Cache hit ratio during run | Separates edge service from origin path noise | > 95% for warm set | 90% to 95% | < 90%

These are heuristics, not universal laws. A good cdn latency benchmark across regions should compare each provider against path floor in that region, not against a global absolute. Forty milliseconds may be great for a remote access network and terrible for a dense metro where physical distance suggests half that.
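Normalizing against the path floor can be encoded directly with the multipliers from the table above. A minimal sketch, with hypothetical region data, assuming min observed connect RTT is a fair floor estimate:

```python
# Classify a region's p95 against its own path floor (min observed RTT)
# using the 1.5x / 2.5x heuristic thresholds from the table above.
def classify(p95_ms: float, floor_ms: float) -> str:
    ratio = p95_ms / floor_ms
    if ratio <= 1.5:
        return "good"
    if ratio <= 2.5:
        return "warning"
    return "bad"

# Hypothetical per-region measurements.
regions = {
    "frankfurt": {"floor_ms": 8.0,  "p95_ms": 11.0},   # dense metro, tight tail
    "sao-paulo": {"floor_ms": 35.0, "p95_ms": 70.0},
    "jakarta":   {"floor_ms": 40.0, "p95_ms": 140.0},  # same 40 ms floor, bad tail
}
for name, r in regions.items():
    print(name, classify(r["p95_ms"], r["floor_ms"]))
```

Note that jakarta and a remote network can share the same 40 ms floor while earning opposite verdicts, which is exactly the point of normalizing.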

How to measure cdn latency across regions without fooling yourself

Our benchmark design has four layers: vantage selection, test object control, protocol normalization, and percentile analysis. If any of those are weak, the entire cdn performance metrics story collapses.

1. Vantage selection by ASN, not just by geography

Geography is necessary and insufficient. Two probes in the same city but on different eyeball ISPs often see materially different catchments. We group vantage points first by region, then by access type, then by ASN. A benchmark with ten cities and one cloud VM per city is not global. It is ten data points from transit-rich networks that your users may never touch.

Minimum useful coverage for a comparative cdn latency test:

  • North America, Europe, APAC, Latin America, Middle East or Africa where relevant
  • At least 3 ASNs per major region
  • Mix of broadband, mobile, and cloud vantage points
  • Repeated runs across peak and off-peak local hours

2. Test object design

We use a fixed object set that isolates specific behaviors.

  • 1 KB object to emphasize handshake plus request overhead
  • 64 KB object to expose early congestion and scheduler behavior
  • 1 MB object to show sustained delivery under realistic page or segment fetches
  • Optional 8 MB to 32 MB object for large object delivery and range-request tests

Every object is immutable, versioned, and byte-identical across providers. Compression is either disabled everywhere or forced consistently. Cache-Control, ETag, and content type are normalized. Query string randomization is used only for controlled miss testing, never mixed into warm-hit runs.
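One way to guarantee the byte-identical, versioned property is to generate objects deterministically and embed size, version, and a content hash in the name. A sketch under those assumptions; naming scheme and sizes here are illustrative, not prescriptive:

```python
import hashlib

# Generate an immutable, versioned test object: the body is a
# deterministic repeating pattern, so every provider serves exactly the
# same bytes, and the name embeds size, version, and a hash for auditing.
def make_object(size_bytes: int, version: int) -> tuple:
    pattern = f"cdn-bench-v{version}-".encode()
    body = (pattern * (size_bytes // len(pattern) + 1))[:size_bytes]
    digest = hashlib.sha256(body).hexdigest()[:12]
    name = f"{size_bytes // 1024}k-v{version}-{digest}.bin"
    return name, body

name, body = make_object(64 * 1024, 1)
print(name, len(body))
```

Because generation is deterministic, any worker can verify mid-run that the object it fetched is the object it expected, which catches silent provider-side transformation (recompression, normalization) before it poisons a comparison.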

3. Protocol normalization

One of the easiest ways to fake a win is to let one CDN serve HTTP/3 and another negotiate HTTP/2 because of client support differences or DNS record exposure. We run separate cohorts for HTTP/2 and HTTP/3. We also separate cold connections from reused connections. That gives four baseline modes:

  • HTTP/2 fresh connection
  • HTTP/2 reused connection
  • HTTP/3 fresh connection
  • HTTP/3 reused connection

Without this split, a provider with stronger connection reuse or broader resumption success will look faster even when edge placement is worse.

4. Percentiles over averages

Averages hide bad routing and queue buildup. Our default reporting is p50, p95, p99, plus interquartile range and standard deviation for sanity checks. If the median is stable and p99 is volatile, you do not have a capacity problem first. You have a path stability or retransmission problem first.

Architectural solution: the measurement pipeline

The measurement system itself has to be boring, deterministic, and inspectable. Ours uses a controller, distributed workers, packet capture sidecars, and a results warehouse. The controller schedules region-protocol-object matrices. Workers execute requests with pinned parameters. Sidecars collect pcap or kernel-level socket telemetry. The warehouse stores raw events and derived percentiles separately so recalculation does not destroy provenance.

Data flow

  1. Controller selects provider, hostname, object, protocol, and connection mode.
  2. Worker resolves DNS using configured resolver policy and records answer set.
  3. Worker performs request with disabled local caches and explicit socket timing capture.
  4. Sidecar records SYN, SYN-ACK, TLS flight timings, first byte, last byte, retransmissions, ECN if available, and local loss indicators.
  5. Parser derives connect RTT, TLS time, TTFB, transfer duration, effective throughput, and error classification.
  6. Aggregator computes p50, p95, p99 by region, ASN, protocol, object size, and cache state.
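Step 5 of the flow above can be sketched as a pure function over sidecar-captured timestamps. Field names here are hypothetical; map them to whatever your capture tooling actually emits:

```python
# Derive per-request metrics from sidecar timestamps (seconds).
# Event field names are assumed, not from any particular tool.
def derive(ev: dict) -> dict:
    transfer = ev["last_byte"] - ev["first_byte"]
    return {
        "connect_rtt_s": ev["synack_recv"] - ev["syn_sent"],
        "tls_s": ev["tls_done"] - ev["synack_recv"],
        "ttfb_s": ev["first_byte"] - ev["syn_sent"],
        "transfer_s": transfer,
        "throughput_mbps": ev["bytes"] * 8 / transfer / 1e6 if transfer > 0 else 0.0,
        "classification": "retransmission" if ev.get("retransmits", 0) > 0 else "clean",
    }

# Hypothetical captured event for a 1 MB object.
event = {"syn_sent": 0.000, "synack_recv": 0.022, "tls_done": 0.060,
         "first_byte": 0.085, "last_byte": 0.185,
         "bytes": 1_048_576, "retransmits": 0}
m = derive(event)
print(m["connect_rtt_s"], round(m["throughput_mbps"], 2), m["classification"])
```

Keeping derivation as a pure function over raw events is what makes the warehouse design honest: percentiles can always be recomputed from provenance.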

Why this design over simpler alternatives

The alternative is browser-only RUM or curl-only synthetic testing. Browser RUM is excellent for production validation but poor for provider comparison because cache state, page composition, service worker behavior, and user device variance are uncontrolled. Curl-only tests are easy to automate but blind to browser timing semantics and too often run from cloud networks that sit on unusually good transit. The hybrid design gives us packet truth and application timing without pretending that one replaces the other.

Approach | Strength | Blind spot | Best use
Synthetic plus packet capture | Precise transport and TTFB decomposition | Limited device realism | Provider comparison and regression hunting
Browser synthetic | Standards-based timing and protocol realism | Harder to isolate transport causes | Page-level validation
RUM | True user experience at scale | Uncontrolled cache and device state | Post-deployment confirmation

Code and implementation detail for a repeatable cdn performance test

If you want a repeatable baseline this week, start with explicit curl cohorts and keep the parameters pinned. The important part is not the tool. It is that you control protocol, cache state, connection reuse, DNS, and output parsing.

#!/usr/bin/env bash
# Pinned curl cohorts: protocol and connection mode are explicit, DNS is
# bypassed with --resolve, and output is one parseable line per transfer.
set -euo pipefail

TARGET="$1"       # edge IP to test
HOST_HEADER="$2"  # hostname for SNI, Host header, and certificate match
RUNS=30
URL="https://$HOST_HEADER/static/64k.bin"

for mode in h2-fresh h2-reuse h3-fresh h3-reuse; do
  for i in $(seq 1 "$RUNS"); do
    W="mode=$mode run=$i dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total} code=%{http_code}\n"
    case "$mode" in
      h2-fresh)
        # Each single-URL invocation opens a fresh connection.
        curl --http2 --resolve "$HOST_HEADER:443:$TARGET" \
          -o /dev/null -s -w "$W" "$URL"
        ;;
      h2-reuse)
        # Two URLs in one invocation share the connection: the second
        # -w line is the reused-connection measurement.
        curl --http2 --resolve "$HOST_HEADER:443:$TARGET" \
          -o /dev/null -o /dev/null -s -w "$W" "$URL" "$URL"
        ;;
      h3-fresh)
        curl --http3 --resolve "$HOST_HEADER:443:$TARGET" \
          -o /dev/null -s -w "$W" "$URL"
        ;;
      h3-reuse)
        curl --http3 --resolve "$HOST_HEADER:443:$TARGET" \
          -o /dev/null -o /dev/null -s -w "$W" "$URL" "$URL"
        ;;
    esac
  done
done

That gets you a fast first pass, but it is not enough for serious cdn latency testing across regions. Add packet capture or eBPF socket telemetry so you can answer whether a p99 spike came from SYN loss, TLS flight delay, server scheduling, or late first-byte generation.

tcpdump -i eth0 -nn -w run.pcap "host $TARGET and port 443"

For browser-side alignment, collect Resource Timing and persist the raw fields you actually need: fetchStart, connectStart, connectEnd, secureConnectionStart, requestStart, responseStart, responseEnd, nextHopProtocol, transferSize, encodedBodySize, and decodedBodySize. That is enough to derive browser-observed TTFB and compare it with wire timings.
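Once those Resource Timing fields are persisted, deriving browser-observed metrics is arithmetic over the record. A sketch with a hypothetical persisted record; note that secureConnectionStart is 0 when no TLS handshake was observed on this fetch (for example, a reused connection), which the guard below handles:

```python
# Derive browser-observed timings from a persisted Resource Timing
# record (millisecond timestamps). The sample record is hypothetical.
def browser_metrics(rt: dict) -> dict:
    tls_ms = (rt["connectEnd"] - rt["secureConnectionStart"]
              if rt["secureConnectionStart"] > 0 else 0.0)
    return {
        "connect_ms": rt["connectEnd"] - rt["connectStart"],
        "tls_ms": tls_ms,
        "ttfb_ms": rt["responseStart"] - rt["requestStart"],
        "download_ms": rt["responseEnd"] - rt["responseStart"],
        "protocol": rt["nextHopProtocol"],
    }

sample = {
    "connectStart": 12.0, "secureConnectionStart": 30.0, "connectEnd": 68.0,
    "requestStart": 68.5, "responseStart": 95.0, "responseEnd": 140.0,
    "nextHopProtocol": "h3",
}
print(browser_metrics(sample))
```

Comparing these browser-side numbers against the sidecar's wire timings for the same request is the cross-check that keeps both pipelines honest.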

What a vendor comparison table should actually show

If you are evaluating providers, latency alone is incomplete. The practical buying decision is latency distribution relative to path floor, operational predictability, and cost at sustained volume. For large delivery footprints, the interesting question is not who wins a single metro by 5 ms. It is who stays inside your latency SLO at p95 across real regions without blowing out the delivery budget.

Vendor | Price at scale | Uptime and flexibility | What to validate in a benchmark
BlazingCDN | Starting at $4 per TB, down to $2 per TB at 2 PB+ commitment | 100% uptime target, flexible configuration, fast scaling under demand spikes | Warm-hit p95 by region, miss penalty to origin, protocol parity between HTTP/2 and HTTP/3
Amazon CloudFront | Typically higher blended delivery cost at volume | Strong enterprise fit and mature operational model | Regional tail latency and origin fetch behavior under cache churn
Cloudflare | Varies with product mix and contract structure | Broad platform and strong protocol support | Catchment consistency across access networks, especially mobile
Fastly | Varies with traffic profile and enterprise terms | Strong programmability for traffic shaping and cache control | Miss-path consistency and large-object transfer tails

That is where BlazingCDN becomes relevant in a technical evaluation, not a marketing one. If your benchmark shows that stability and fault tolerance comparable to Amazon CloudFront are enough for the workload, then the cost delta matters a lot. For enterprise traffic, the volume tiers move from $0.004 per GB at smaller commit levels down to $0.002 per GB at 2 PB+, which materially changes the economics of multi-region media and software delivery. If you want to inspect the commercial side after the engineering work, start with BlazingCDN pricing.

Trade-offs, edge cases, and where benchmark data lies to you

Every cdn benchmark has blind spots. The important thing is to state them before they bite you.

Cold-cache tests are necessary and often misleading

Cold misses tell you about origin path and shield design, but they are a poor proxy for steady-state user experience if your production hit ratio is 98%. They become useful when segmented by object popularity class. Without that segmentation, a provider with a more conservative cache policy can look slow even if it performs better for the hot set your users actually request.

Cloud VMs are transit-biased

Cloud vantage points usually sit on well-peered backbones. They are useful for reproducibility and terrible as a stand-in for eyeballs. If your user base is mobile-heavy, benchmark from mobile. If your revenue depends on LATAM broadband or Southeast Asia residential access, include those networks explicitly.

Connection reuse can dwarf edge deltas

A reused HTTP/2 or HTTP/3 connection can hide a weak cold-path setup. That is not invalid if your application naturally reuses connections, but it changes the interpretation. Measure both. Report both.

Packet loss ruins pretty charts

Once loss crosses low single-digit fractions of a percent, handshake and tail metrics deteriorate non-linearly. This is where transport choice and congestion behavior matter more than median edge distance. If your cdn latency measurement suite does not capture retransmissions, you will misattribute the regression to the provider instead of the access path.
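The non-linearity has a simple back-of-the-envelope model. A fresh HTTPS setup spends several packets (SYN, SYN-ACK, TLS flights) before the first byte; if each is lost independently with probability p, the share of requests that eat at least one roughly 1 second retransmission timeout is 1 - (1 - p)^n. A toy sketch, where the packet count n = 10 is an assumed round number, not a measured value:

```python
# Toy loss model: fraction of requests whose connection setup hits at
# least one retransmission timeout, given per-packet loss probability p
# and an assumed n packets in the setup exchange.
def timeout_share(p: float, n: int = 10) -> float:
    return 1 - (1 - p) ** n

for p in (0.001, 0.005, 0.01):
    print(f"loss={p:.1%} -> {timeout_share(p):.1%} of requests delayed")
```

At 0.1% packet loss, roughly 1% of requests already stall into RTO territory, which is why the thresholds table marks anything above 0.1% as a warning: the damage lands squarely on p99 before it ever shows in the median.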

DNS and resolver choice can move the catchment

Recursive resolver placement still affects where traffic lands in some deployments. Benchmark with the resolver strategy your users actually use, or at least record it so the results are explainable later.

When this approach fits and when it does not

This methodology fits when you are making provider decisions, validating a migration, tuning protocol posture, or defending a latency SLO with data that networking and application teams will both accept. It is especially useful for media delivery, download acceleration, and API-heavy applications where p95 TTFB moves user-visible outcomes.

It fits less well if your team only needs directional monitoring and has no ability to act on path or cache findings. In that case, a simpler synthetic plus RUM loop may be enough. It also over-shoots the need for very small deployments with one geography, low traffic, and no multi-provider decision to make. The operational overhead of distributed vantage management, pcap storage, and percentile analysis is real.

There is also an organizational constraint. A good cdn latency benchmark across regions is cross-functional by nature. Network engineers will want path evidence. Platform engineers will want reproducible clients. Product teams will want user-visible TTFB and startup impact. If no one owns the joins between those views, the benchmark will produce slides instead of decisions.

Run this benchmark this week

Pick one 64 KB immutable object, one warm-hit path, and three regions where you actually have users. Run four cohorts only: HTTP/2 fresh, HTTP/2 reused, HTTP/3 fresh, HTTP/3 reused. Collect p50, p95, p99 TTFB and TCP connect time by ASN, then look for the region where the gap between min RTT and warm-hit p95 is widest. That is usually where the interesting routing or edge behavior is hiding.
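The last step of that recipe, finding the widest floor-to-p95 gap, is a one-liner once the per-region numbers exist. A sketch with hypothetical sample data; substitute your own per-region measurements:

```python
# Find the region where warm-hit p95 sits farthest above the path floor
# (min connect RTT). Sample data is hypothetical.
def widest_gap(data: dict) -> str:
    return max(data, key=lambda r: data[r]["warm_p95_ms"] - data[r]["min_rtt_ms"])

data = {
    "us-east":  {"min_rtt_ms": 9.0,  "warm_p95_ms": 21.0},
    "eu-west":  {"min_rtt_ms": 7.0,  "warm_p95_ms": 18.0},
    "ap-south": {"min_rtt_ms": 38.0, "warm_p95_ms": 145.0},
}
print(widest_gap(data))  # the region to investigate first
```

Whichever region this returns is where to point the packet capture next, because a wide gap over a healthy floor means the path is fine and something between routing and the edge is not.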

If you already have a benchmark, tighten one thing: stop reporting a single global average. Report per-region p95 normalized against path floor, and annotate whether the request was a hit, a miss, a fresh connection, or a reused one. If that changes your vendor ranking, your old methodology was measuring convenience, not CDN latency.