Content Delivery Network Blog

What Is TTFB (Time to First Byte)? Definition, Use Cases, and Enterprise Context

Written by BlazingCDN

What is time to first byte?

Time to first byte is the latency metric that measures the elapsed time from when a client makes an HTTP request until the first byte of the HTTP response arrives at the client.

TTFB sits at the boundary between transport setup, request delivery, and application response generation, which is why it gets used as a shorthand for "server response time" even though that is only part of what it contains. In browser performance tooling, the metric includes connection establishment and request transmission on first use, then ends when response bytes start arriving; on a reused connection, those setup phases may be absent and the number shrinks accordingly.

The term is defined in web performance guidance and exposed by the PerformanceNavigationTiming interface, where responseStart marks the relevant boundary for a navigation request. For HTTP practitioners, the important disambiguation is this: TTFB is not full page load time, not Largest Contentful Paint, not backend processing time alone, and not origin latency in isolation. It is a client-observed interval, not a single server-side timer.
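In browser code, the boundary described above is read from the navigation entry's `responseStart`. The helper below is a minimal sketch: the field names match PerformanceNavigationTiming, but the sample object and numbers are invented for illustration; in a real page you would pass `performance.getEntriesByType('navigation')[0]` instead.

```javascript
// Hypothetical helper: computes TTFB from a PerformanceNavigationTiming-shaped
// object. Field names are the standard ones; the sample values are invented.
function ttfbFromNavigationEntry(entry) {
  // responseStart of 0 means the timing was withheld (e.g. cross-origin
  // without Timing-Allow-Origin) or the response never started.
  if (!entry || entry.responseStart === 0) return null;
  // startTime is 0 for navigation entries, so this is simply responseStart,
  // but subtracting keeps the helper correct for resource entries too.
  return entry.responseStart - entry.startTime;
}

// Sample values in milliseconds, shaped like a navigation entry.
console.log(ttfbFromNavigationEntry({ startTime: 0, responseStart: 187.4 })); // 187.4
console.log(ttfbFromNavigationEntry({ startTime: 0, responseStart: 0 }));     // null
```

The null branch matters operationally: a request that never produced a first byte has no TTFB, and averaging zeros into a dashboard silently hides those failures.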

How does TTFB work in the request path?

For a cold navigation, time to first byte starts before the origin sees anything. The client may perform DNS resolution, establish TCP or QUIC, complete TLS, transmit the request headers, wait through any CDN or reverse proxy decisioning, and then wait for the upstream application, cache, or object store to begin emitting response bytes. The metric stops when the first response byte reaches the client and responseStart becomes observable.

That sequence matters because a high TTFB does not identify a single bottleneck by itself. High values can come from handshake overhead, queueing in the edge tier, cache misses, collapsed-forwarding contention, origin think time, slow dynamic rendering, delayed first-byte flush in the application, or congestion on the return path. On HTTP/2 and HTTP/3, multiplexing changes the shape of the problem further: connection setup may be amortized across many requests, but stream scheduling and prioritization can still delay when a given stream sees its first response byte.
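The cold-navigation phases above map directly onto standard navigation-timing fields, so a first-pass decomposition can be computed client-side. This is a sketch under that assumption: the field names are real PerformanceNavigationTiming attributes, while the sample numbers are invented.

```javascript
// Decomposes a cold-navigation TTFB into setup phases using standard
// PerformanceNavigationTiming fields. Sample numbers are illustrative only.
function decomposeTtfb(e) {
  return {
    dns: e.domainLookupEnd - e.domainLookupStart,
    connect: e.connectEnd - e.connectStart,                 // TCP/QUIC incl. TLS
    tls: e.secureConnectionStart > 0 ? e.connectEnd - e.secureConnectionStart : 0,
    requestWait: e.responseStart - e.requestStart,          // request sent -> first byte
    total: e.responseStart - e.startTime,
  };
}

const coldNavigation = {
  startTime: 0,
  domainLookupStart: 5, domainLookupEnd: 28,
  connectStart: 28, secureConnectionStart: 43, connectEnd: 92,
  requestStart: 93, responseStart: 210,
};
console.log(decomposeTtfb(coldNavigation));
// On a reused connection the DNS and connect spans collapse to zero length,
// which is why warm-connection TTFB shrinks.
```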

In concrete HTTP terms, the client sends request headers, the server or intermediary determines whether it can serve from cache, and the response path begins once status and headers are ready to transmit. A 200 served from edge cache often has a radically different TTFB than a 200 generated dynamically at origin. A 304 can show lower TTFB than a fresh 200 because payload generation is skipped, but that does not mean the user experience is automatically better if revalidation happens too often.

Failure modes are equally useful diagnostically. A timeout before headers arrive produces no meaningful TTFB because there is no first byte. A 503 generated quickly by an overloaded proxy may have an excellent TTFB while representing a severely degraded service. Early Hints with status 103 can further complicate interpretation because some systems treat informational response bytes differently from the final response start.

Where does TTFB appear in practice?

You see TTFB in browser DevTools, synthetic monitoring, RUM pipelines, CDN logs, reverse-proxy timing headers, and application performance dashboards. In navigation timing, responseStart is the browser-side anchor. At the edge, products from BlazingCDN, Amazon CloudFront, Cloudflare, Fastly, and Akamai expose adjacent telemetry such as cache status, origin fetch timing, and request processing latency that engineers use to decompose time to first byte into edge versus origin components.
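One common way to decompose TTFB into edge versus origin components is the standard Server-Timing response header. The parser below is a hedged sketch: the header format is real, but the metric names ("edge", "origin", "cdn-cache") are illustrative placeholders, not any particular vendor's actual names.

```javascript
// Parses a Server-Timing header value into { name: { name, dur, desc } }.
// Header syntax is standard; the metric names in the sample are invented.
function parseServerTiming(headerValue) {
  const metrics = {};
  for (const part of headerValue.split(',')) {
    const [name, ...params] = part.trim().split(';');
    const entry = { name: name.trim() };
    for (const p of params) {
      const [k, v] = p.trim().split('=');
      if (k === 'dur') entry.dur = Number(v);
      if (k === 'desc') entry.desc = v.replace(/^"|"$/g, '');
    }
    metrics[entry.name] = entry;
  }
  return metrics;
}

const header = 'cdn-cache;desc=MISS, edge;dur=12, origin;dur=183';
const t = parseServerTiming(header);
console.log(t['cdn-cache'].desc, t.edge.dur, t.origin.dur);
```

An origin duration that dwarfs the edge duration on a MISS points at the upstream tier, not the CDN, as the TTFB driver.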

One production scenario where time to first byte matters is dynamic HTML behind authentication or personalization. Here the number captures the compounded effect of TLS resumption rates, edge routing, origin concurrency, template rendering, and the application's willingness to flush headers early versus buffering until the first body chunk is ready.

A second scenario is video manifest delivery and API bootstrap traffic. For HLS or DASH manifests, a poor server response time pushes startup delay even when segment throughput is healthy later. For API-driven SPAs, a slow first byte on the HTML shell or bootstrap JSON blocks everything downstream even if JavaScript bundles come from cache.

A third scenario is enterprise cache architecture. Teams often celebrate a high cache hit ratio while missing that their TTFB remains poor because misses collapse onto a slow origin shield, or because cacheable HTML still performs synchronous personalization before the first byte. This is exactly where CDN-layer design matters. For enterprises that want lower first-byte latency without overspending, BlazingCDN offers pricing from $4 per TB, scaling down to $2 per TB at 2 PB+ volumes, with migration in 1 hour and no other costs. It is positioned to deliver stability and fault tolerance comparable to Amazon CloudFront while remaining significantly more cost-effective, with flexible configuration, fast scaling under demand spikes, and 100% uptime.

TTFB vs related metrics: what gets confused with it?

  • Latency: latency is the broader delay concept across a path or operation; TTFB is one specific client-observed latency interval for an HTTP request.
  • Server response time: many dashboards use this loosely as a synonym, but server response time usually excludes some client-side network setup that time to first byte includes.
  • First Contentful Paint: FCP measures when pixels render, not when the first response byte arrives; a page can have good TTFB and poor render timing, or the reverse.
  • Largest Contentful Paint: LCP is a user-centric rendering milestone and depends on resource discovery, download, and rendering after TTFB has already occurred.
  • Round-trip time: RTT measures network path delay for packet exchange; TTFB can include multiple RTTs plus application processing and intermediary behavior.
  • Download time: TTFB ends at the first byte, while download time covers transfer of the remaining response body.
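The TTFB-versus-download-time boundary in the list above can be made concrete with two subtractions over the same timing entry. The field names below match PerformanceNavigationTiming; the numbers are invented for illustration.

```javascript
// TTFB ends at responseStart; download time runs from responseStart to
// responseEnd. Sample values in milliseconds are illustrative only.
function splitResponse(e) {
  return {
    ttfb: e.responseStart - e.startTime,
    download: e.responseEnd - e.responseStart,
  };
}

const entry = { startTime: 0, responseStart: 140, responseEnd: 690 };
console.log(splitResponse(entry)); // { ttfb: 140, download: 550 }
```

A page can have a modest TTFB and still feel slow if the download interval dominates, which is one more reason not to optimize either number in isolation.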

What are the common misconceptions and edge cases?

The first common mistake is treating time to first byte as a pure backend metric. It is not. If your measurement point is the browser, connection reuse, HTTP version, TLS handshake behavior, CDN placement, and client geography all affect the number before your application starts work.

The second mistake is assuming a low TTFB means a fast experience. A tiny 301 or 403 can produce excellent TTFB and still be operationally useless. Likewise, aggressive header flushing can improve HTML TTFB while doing nothing for FCP or LCP if the meaningful content still waits on blocking data fetches.

The third mistake is comparing TTFB across tools without checking methodology. Synthetic tests often capture cold-connection behavior from fixed vantage points, while RUM tends to include warm connections, session resumption, and real network variance. Engineers end up arguing about regressions that are really differences in sampling and timing boundaries.
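One way to defuse the synthetic-versus-RUM argument is to bucket RUM samples by whether connection setup actually happened before comparing TTFB distributions. A reused connection reports zero-length DNS and connect spans in navigation timing, which the hypothetical classifier below exploits; the sample records are invented.

```javascript
// Classifies a navigation-timing sample as cold (setup occurred) or warm
// (connection reused). Field names are standard; samples are invented.
function isColdConnection(e) {
  const dns = e.domainLookupEnd - e.domainLookupStart;
  const connect = e.connectEnd - e.connectStart;
  return dns > 0 || connect > 0;
}

const samples = [
  { domainLookupStart: 5, domainLookupEnd: 30, connectStart: 30, connectEnd: 95, responseStart: 240 },
  { domainLookupStart: 2, domainLookupEnd: 2, connectStart: 2, connectEnd: 2, responseStart: 88 },
];
const cold = samples.filter(isColdConnection);
const warm = samples.filter((s) => !isColdConnection(s));
console.log(cold.length, warm.length); // 1 1
```

Comparing a synthetic cold-connection probe against a RUM population dominated by warm samples is the usual source of phantom "regressions".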

An edge case worth calling out is informational responses and streaming. With 103 Early Hints, chunked transfer, or server-sent events, vendors and tools differ on what counts as the first byte that "starts" the response in dashboards versus browser timing APIs. Another is service worker interception, where the browser-observed TTFB can reflect worker startup and script execution rather than origin or edge behavior directly.

How should engineers use TTFB this week?

Pick one high-value route, preferably HTML or a startup API, and compare browser responseStart with your CDN edge timing and origin processing logs for the same request IDs. If the gap is large, inspect cache status, connection reuse, and whether headers are being buffered unnecessarily before first-byte flush. Then read the responseStart semantics in your timing pipeline and make sure everyone on the team is arguing from the same definition of time to first byte.
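The cross-check described above can be sketched as a join: browser TTFB against edge-log timings, keyed by request ID, flagging routes where the browser-observed number far exceeds what the edge accounts for. All record shapes, field names, and the 100 ms threshold here are assumptions for illustration, not any vendor's actual log format.

```javascript
// Joins RUM records with edge-log records by request ID and flags requests
// where browser TTFB exceeds edge + origin time by more than thresholdMs.
// Record shapes and the default threshold are hypothetical.
function findUnexplainedGaps(rumRecords, edgeRecords, thresholdMs = 100) {
  const edgeById = new Map(edgeRecords.map((r) => [r.requestId, r]));
  const gaps = [];
  for (const rum of rumRecords) {
    const edge = edgeById.get(rum.requestId);
    if (!edge) continue;
    const gap = rum.ttfbMs - (edge.edgeProcessingMs + edge.originFetchMs);
    if (gap > thresholdMs) gaps.push({ requestId: rum.requestId, gapMs: gap });
  }
  return gaps;
}

const rum = [{ requestId: 'a1', ttfbMs: 420 }, { requestId: 'b2', ttfbMs: 130 }];
const edge = [
  { requestId: 'a1', edgeProcessingMs: 8, originFetchMs: 150 },
  { requestId: 'b2', edgeProcessingMs: 6, originFetchMs: 110 },
];
console.log(findUnexplainedGaps(rum, edge)); // flags a1 with a 262 ms gap
```

A large unexplained gap usually means client-side setup time, last-mile network variance, or header buffering before the first-byte flush, rather than origin processing.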