Global Latency Optimization

Reduce latency in global systems.

Global Latency Is Mostly About Locality

Global latency optimization is not primarily about faster code; it is about keeping requests local. Cross-region hops dominate p99 because each one adds network round-trip delay and variance that no amount of code tuning can remove. Production-first global systems therefore keep cross-region dependencies off the hot path.
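A back-of-the-envelope calculation shows why distance sets a floor that code changes cannot lower. The sketch below assumes light travels through optical fiber at roughly 200 km per millisecond (about two-thirds of its vacuum speed) and ignores routing detours, queuing, and processing time. The function name and the 6,000 km distance are illustrative, not taken from any real deployment.

```python
# Assumption: light in optical fiber covers roughly 200 km per millisecond
# (about 2/3 of its speed in a vacuum).
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fiber path of this length.

    Real round trips are slower: fiber rarely follows the straight-line
    distance, and routers, queues, and TLS handshakes add their own delay.
    """
    return 2 * distance_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    # Hypothetical 6,000 km path between two regions.
    print(min_rtt_ms(6000))  # 60.0
```

Every serial cross-region call on the hot path pays at least this floor once, which is why removing the hop usually beats optimizing the handler.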

Where Latency Comes From

  • Client to edge (last-mile variability)
  • Edge to region (geo distance and routing)
  • Region to region (worst-case jitter and tail risk)
  • Downstream dependencies (DB, cache, third-party APIs)
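One way to reason about these sources is a simple per-hop latency budget. The sketch below uses made-up numbers purely for illustration; the `Hop` type, the field names, and every millisecond value are assumptions to be replaced with your own measurements. Summing the bad-case column gives a pessimistic bound rather than a true end-to-end p99, since hops are rarely all slow at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    typical_ms: float   # rough "normal day" latency for this hop
    bad_case_ms: float  # rough tail latency for this hop


# Illustrative numbers only; measure your own system.
PATH = [
    Hop("client_to_edge", 20, 80),
    Hop("edge_to_region", 15, 40),
    Hop("region_to_region", 70, 250),
    Hop("downstream_dependencies", 5, 30),
]


def totals(path):
    """Return (typical, bad-case) totals for a serial request path."""
    return (
        sum(h.typical_ms for h in path),
        sum(h.bad_case_ms for h in path),
    )


def dominant_hop(path):
    """Return the hop contributing the most bad-case latency."""
    return max(path, key=lambda h: h.bad_case_ms)


if __name__ == "__main__":
    print(totals(PATH))             # (110, 400)
    print(dominant_hop(PATH).name)  # region_to_region
```

With these example figures, dropping the region-to-region hop cuts the bad-case total from 400 ms to 150 ms, which no amount of handler tuning could match.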

Practical Optimization Levers

  • Serve static assets from a CDN to remove origin distance for most bytes.
  • Cache cacheable GET responses at the edge.
  • Use regional read replicas for read-heavy endpoints that tolerate stale data.
  • Give each tenant or user a home region (geo-owned data) so writes stay local.
  • Minimize payload size (compress responses, avoid over-fetching).
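The replica and geo-ownership levers can be combined in a small routing decision. This is a minimal sketch under stated assumptions: the region names, the `HOME_REGION` and `REPLICA_REGIONS` tables, and the `Request` type are all hypothetical, and it assumes every listed endpoint tolerates replica staleness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    tenant_id: str
    method: str         # HTTP method, e.g. "GET" or "POST"
    client_region: str  # region closest to the caller


# Hypothetical ownership table: each tenant's writes go to one home region.
HOME_REGION = {"tenant-a": "eu-west", "tenant-b": "us-east"}

# Hypothetical set of regions that host a read replica.
REPLICA_REGIONS = {"eu-west", "us-east", "ap-south"}


def choose_region(req, home_region=HOME_REGION, replica_regions=REPLICA_REGIONS):
    """Pick the region that should serve this request.

    Reads go to the caller's own region when it has a replica (possibly
    stale); everything else goes to the tenant's home region.
    """
    home = home_region[req.tenant_id]
    is_read = req.method in ("GET", "HEAD")
    if is_read and req.client_region in replica_regions:
        return req.client_region
    return home


if __name__ == "__main__":
    print(choose_region(Request("tenant-a", "GET", "ap-south")))   # ap-south
    print(choose_region(Request("tenant-a", "POST", "ap-south")))  # eu-west
```

Placing a tenant's home region near most of its users is what keeps writes local in practice; a distant caller still pays the cross-region cost on writes.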

Tail Latency and Timeouts

As regions get farther apart, tail-latency variance grows, so tight timeouts, retries with jitter, and circuit breakers matter more in global setups than in single-region ones. Without them, a small amount of network jitter can escalate into system-wide p99 spikes and retry storms.
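Retries with jitter can be sketched as follows. This is one common variant ("full jitter" exponential backoff with an overall deadline), shown under assumptions: the function names, default values, and the choice to retry on any exception are illustrative, and the wrapped call is expected to enforce its own per-attempt timeout.

```python
import random
import time


def backoff_delays(base_s, cap_s, count, rng=random.random):
    """Yield `count` full-jitter delays: uniform in [0, min(cap, base * 2**n)]."""
    for n in range(count):
        yield rng() * min(cap_s, base_s * (2 ** n))


def call_with_retries(fn, *, attempts=3, base_s=0.05, cap_s=0.5,
                      deadline_s=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call fn(), retrying on failure with jittered backoff.

    Stops when attempts run out or when the next sleep would push the
    total elapsed time past deadline_s, then re-raises the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    start = clock()
    last_error = None
    delays = backoff_delays(base_s, cap_s, attempts - 1)
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:  # a real caller would retry only transient errors
            last_error = exc
        delay = next(delays, None)
        if delay is None or clock() - start + delay > deadline_s:
            break
        sleep(delay)
    raise last_error
```

Randomizing the delay spreads retries from many clients over time, so a brief cross-region blip does not produce a synchronized wave of repeated requests. The overall deadline keeps the retry loop from outliving the caller's own latency budget.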

Global rule of thumb:

  • Keep the hot path within one region when possible.
  • If a cross-region call is unavoidable, bound it with timeouts and fallbacks.
Production-First Takeaway

Optimize global latency by reducing cross-region work, not by micro-optimizing code. Locality, caching, and routing strategy are the main levers; timeouts and backpressure protect you when variance spikes.