Replication Types

Synchronous vs asynchronous replication and the latency/consistency cost curve.

Replication Types: Durability vs Latency Tradeoff

Replication is not only about copying data. It determines how much data you can lose when a node fails (measured against your RPO) and how quickly the system can acknowledge writes (latency).

The core distinction:

  • Synchronous replication
  • Asynchronous replication

Asynchronous Replication

In asynchronous replication, the primary acknowledges a write as soon as it has persisted the change locally. Replicas receive and apply the change afterward, with no guaranteed bound on the delay.

Flow:

  • Client sends write to primary
  • Primary writes to WAL/binlog and commits
  • Primary responds to client
  • Replica receives and applies changes later

Advantages:

  • Low write latency
  • High throughput

Risk:

  • If the primary fails before a replica has received a change and that replica is then promoted, writes already acknowledged to clients are lost.
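
The async flow and its risk can be illustrated with a small in-memory model. This is a minimal sketch, not any engine's implementation: the names Node, AsyncPrimary, ship, and lost_on_failover are hypothetical, and "shipping" is reduced to copying list entries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy database node that stores committed writes in a list."""
    log: list = field(default_factory=list)


class AsyncPrimary:
    """Toy primary: acknowledges after a local commit and ships changes later."""

    def __init__(self, replica: Node):
        self.local = Node()
        self.replica = replica
        self.unshipped = []  # committed locally, not yet sent to the replica

    def write(self, value: str) -> str:
        self.local.log.append(value)  # local WAL write and commit
        self.unshipped.append(value)  # queued for later shipping
        return "ack"                  # the client is acknowledged here

    def ship(self) -> None:
        """Deliver all queued changes to the replica (the 'later' step)."""
        self.replica.log.extend(self.unshipped)
        self.unshipped.clear()


def lost_on_failover(primary: AsyncPrimary) -> list:
    """Acknowledged writes missing from the replica if it were promoted now."""
    return [v for v in primary.local.log if v not in primary.replica.log]


replica = Node()
primary = AsyncPrimary(replica)
primary.write("order-1")
primary.ship()
primary.write("order-2")          # acknowledged, but not yet shipped
print(lost_on_failover(primary))  # ['order-2']
```

The write that was acknowledged but never shipped is exactly what a failover at that moment would lose.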

Synchronous Replication

In synchronous replication, the primary waits for one or more replicas to confirm receipt (and sometimes durability) before acknowledging commit.

Flow:

  • Primary writes locally
  • Replica confirms receipt/durability
  • Primary acknowledges commit

Advantages:

  • Stronger durability guarantees
  • Lower RPO

Cost:

  • Higher latency (network round-trip)
  • Throughput sensitive to slow replicas
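
The same kind of toy model shows both the guarantee and the cost of the synchronous flow. It is a sketch under simplifying assumptions: Replica, SyncPrimary, and ReplicaUnavailable are hypothetical names, and an unavailable replica raises an exception here, whereas a real engine would typically block the commit instead.

```python
class ReplicaUnavailable(Exception):
    """Raised when the toy replica cannot confirm a write."""


class Replica:
    def __init__(self):
        self.log = []
        self.up = True

    def receive(self, value: str) -> None:
        if not self.up:
            raise ReplicaUnavailable("no confirmation from replica")
        self.log.append(value)  # stands in for receipt (and flush) on the replica


class SyncPrimary:
    def __init__(self, replica: Replica):
        self.log = []
        self.replica = replica

    def write(self, value: str) -> str:
        self.log.append(value)       # 1. primary writes locally
        self.replica.receive(value)  # 2. wait for the replica to confirm
        return "ack"                 # 3. only now acknowledge the client


replica = Replica()
primary = SyncPrimary(replica)
primary.write("order-1")
print(replica.log)  # ['order-1']

replica.up = False
try:
    primary.write("order-2")
    stalled = False
except ReplicaUnavailable:
    stalled = True  # the client never receives an acknowledgment
print(stalled)  # True
```

Every acknowledged write is already on the replica, but once the replica stops confirming, no write can be acknowledged at all.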

Quorum-Based Replication

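Some systems allow quorum acknowledgment (e.g., 1 of 2 replicas must confirm). This balances durability and availability: each acknowledged commit exists on at least one replica, yet a single slow or failed replica does not block writes.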

Quorum strategies require careful failure modeling to avoid split-brain.
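
A rough way to reason about quorum acknowledgment is that commit latency follows the k-th fastest replica rather than the slowest one. The function below is a sketch with assumed names (quorum_commit_latency_ms, required_acks) and an assumed model in which the local write finishes before shipping starts; a replica that is down is represented by an infinite acknowledgment time.

```python
import math


def quorum_commit_latency_ms(local_ms: float, replica_ack_ms: list, required_acks: int) -> float:
    """Commit latency when `required_acks` replicas must confirm.

    Assumption: the local write completes first, then the primary waits
    for the k-th fastest replica acknowledgment.
    """
    if not 0 <= required_acks <= len(replica_ack_ms):
        raise ValueError("required_acks must be between 0 and the replica count")
    if required_acks == 0:
        return local_ms  # behaves like async replication
    kth_fastest = sorted(replica_ack_ms)[required_acks - 1]
    return local_ms + kth_fastest


healthy = [2.0, 40.0]         # one nearby replica, one distant replica
one_down = [2.0, math.inf]    # the distant replica never answers

print(quorum_commit_latency_ms(1.0, healthy, 1))   # 3.0
print(quorum_commit_latency_ms(1.0, healthy, 2))   # 41.0
print(quorum_commit_latency_ms(1.0, one_down, 1))  # 3.0
print(quorum_commit_latency_ms(1.0, one_down, 2))  # inf
```

With 1 of 2 required, the failed replica has no effect on commits; with 2 of 2 required, the same failure stalls every write.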

Durability Semantics Matter

Not all “sync” guarantees are equal:

  • Replica received WAL
  • Replica flushed to disk
  • Replica applied transaction

Each level changes durability and read consistency guarantees.
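
The three levels can be modeled as an ordered enum, which makes the differences explicit. This is an illustrative sketch: AckLevel and the two helper functions are hypothetical names, and it assumes the levels are cumulative (an applied transaction has also been flushed). As a point of comparison, PostgreSQL's synchronous_commit setting distinguishes roughly similar levels with remote_write, on, and remote_apply.

```python
from enum import IntEnum


class AckLevel(IntEnum):
    """What the replica has done before it confirms (assumed cumulative)."""
    RECEIVED = 1  # WAL is in the replica's memory, not yet on disk
    FLUSHED = 2   # WAL is on the replica's disk
    APPLIED = 3   # transaction has been replayed on the replica


def survives_replica_restart(level: AckLevel) -> bool:
    """The change is still on the replica after it loses power and restarts."""
    return level >= AckLevel.FLUSHED


def readable_on_replica(level: AckLevel) -> bool:
    """A read on the replica right after commit sees the change."""
    return level >= AckLevel.APPLIED


for level in AckLevel:
    print(level.name, survives_replica_restart(level), readable_on_replica(level))
# RECEIVED False False
# FLUSHED True False
# APPLIED True True
```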

Replication and RPO

RPO (Recovery Point Objective) defines acceptable data loss.

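  • Async replication → data loss greater than zero is possible; the loss window is roughly the replication lag at the moment of failure
  • Sync replication → RPO close to 0, provided the replica confirms a durable write and the primary never silently falls back to async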

Business requirements should drive replication mode.
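
For async replication, a back-of-the-envelope check is to compare observed replication lag with the RPO and to convert the lag into a number of writes at risk. The sketch below uses hypothetical function names and made-up figures (4 seconds of lag, 250 writes per second); it assumes a steady write rate.

```python
def writes_at_risk(lag_seconds: float, writes_per_second: float) -> float:
    """Approximate number of acknowledged writes not yet on the replica."""
    return lag_seconds * writes_per_second


def meets_rpo(lag_seconds: float, rpo_seconds: float) -> bool:
    """True when the current lag fits inside the acceptable loss window."""
    return lag_seconds <= rpo_seconds


print(writes_at_risk(4.0, 250.0))  # 1000.0
print(meets_rpo(4.0, 5.0))         # True
print(meets_rpo(4.0, 0.0))         # False
```

An RPO of zero can never be met by a replica with measurable lag, which is the case where sync replication becomes a requirement rather than an option.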

Replication and RTO

RTO (Recovery Time Objective) depends on:

  • Failover automation
  • Replica catch-up speed
  • Cluster orchestration
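
These factors can be combined into a rough RTO estimate. The breakdown below is an assumption made for illustration, not a standard formula: failure detection time (failover automation), time for the replica to replay its backlog (catch-up speed), and promotion plus client rerouting time (orchestration). All names and numbers are hypothetical.

```python
def estimate_rto_seconds(
    detection_s: float,
    backlog_bytes: float,
    replay_bytes_per_s: float,
    promotion_s: float,
) -> float:
    """Sum of detection, replica catch-up, and promotion/rerouting time."""
    if replay_bytes_per_s <= 0:
        raise ValueError("replay_bytes_per_s must be positive")
    catch_up_s = backlog_bytes / replay_bytes_per_s
    return detection_s + catch_up_s + promotion_s


MiB = 1024 * 1024
# 10 s to detect the failure, 512 MiB of backlog replayed at 64 MiB/s, 5 s to promote.
print(estimate_rto_seconds(10.0, 512 * MiB, 64 * MiB, 5.0))  # 23.0
```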

Network Dependency

Synchronous replication couples write latency to network health. Network jitter or cross-region replication can severely impact p95/p99 latency.
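
The effect on tail latency can be seen with a small calculation. The sketch below assumes commit latency is the local write time plus one replica round trip, and uses invented round-trip samples for a same-zone and a cross-region replica; the percentile helper uses the nearest-rank method.

```python
import math


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def commit_latencies(local_ms: float, rtt_samples_ms: list) -> list:
    """Assumed model: commit latency = local write + replica round trip."""
    return [local_ms + rtt for rtt in rtt_samples_ms]


# 100 hypothetical round-trip samples each: mostly stable, with a few jitter spikes.
same_zone = [1.0] * 95 + [5.0] * 4 + [20.0]
cross_region = [70.0] * 95 + [120.0] * 4 + [400.0]

for name, rtts in [("same zone", same_zone), ("cross region", cross_region)]:
    lat = commit_latencies(1.0, rtts)
    print(name, percentile(lat, 50), percentile(lat, 99))
# same zone 2.0 6.0
# cross region 71.0 121.0
```

In this model every millisecond of network delay or jitter is added directly to commit latency, so the replica's network path sets the floor for both median and tail.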

Failure Modes in Production

  • Replica lag spike: async replica falls behind.
  • Write stall: sync replica unavailable → primary blocks commits.
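  • Split brain: improper failover coordination leaves two nodes accepting writes as primary.
  • Data loss: an async primary fails before a replica has received recent changes, and the replica is promoted without them.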
  • Quorum misconfiguration: availability reduced unexpectedly.

Operational Checklist

  • Define RPO and RTO explicitly.
  • Choose replication mode based on business risk tolerance.
  • Monitor replica lag continuously.
  • Test primary crash scenarios in staging.
  • Understand what “sync” confirmation actually means in your engine.
  • Document failover decision rules.
  • Avoid cross-region sync replication unless latency budget allows.
  • Monitor commit latency impact when enabling sync mode.
  • Test network degradation scenarios.
  • Have rollback plan for replication mode changes.

Summary

Replication mode determines your durability-latency tradeoff. Async maximizes throughput but risks data loss. Sync minimizes RPO but increases latency and sensitivity to replica health. Production engineering requires explicit RPO/RTO alignment with replication configuration.