Leader Election Concepts
Why Leaders Exist
- Provide a single ordered write path.
- Centralize coordination decisions such as membership changes.
- Simplify conflict resolution compared to multi-leader designs.
Leader Election Requirements
- At most one leader may be active for a given group at any time.
- The leader must be discoverable by clients and followers.
- Leadership must transfer safely under failures.
Heartbeats and Failure Detection
- Followers detect leader failure via missed heartbeats.
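- Timeouts are probabilistic: a slow or paused leader looks the same as a failed one, so timeouts can produce false positives.
- Production rule: tune timeouts to observed network latency and GC pause behavior, not to optimistic assumptions.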
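The heartbeat rule can be sketched as a small follower-side detector. This is a minimal sketch assuming a single leader that sends periodic heartbeats; the class name `HeartbeatMonitor` and its injectable `clock` parameter are hypothetical, and the clock is injectable only so the example can be driven deterministically without sleeping.

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Follower-side failure detector (hypothetical sketch).

    The leader is suspected once no heartbeat has arrived for longer than
    `timeout` seconds. Suspicion is only a guess: a slow or paused leader
    looks exactly like a crashed one, which is where false positives come from.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # monotonic by default, so wall-clock jumps do not matter
        self.last_heartbeat: Optional[float] = None

    def record_heartbeat(self) -> None:
        self.last_heartbeat = self.clock()

    def leader_suspected(self) -> bool:
        if self.last_heartbeat is None:
            return True  # no heartbeat has ever been received
        return self.clock() - self.last_heartbeat > self.timeout


# Drive the detector with a fake clock instead of sleeping.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=1.5, clock=lambda: fake_now[0])
monitor.record_heartbeat()
fake_now[0] = 1.0
print(monitor.leader_suspected())  # False: still within the timeout
fake_now[0] = 2.0
print(monitor.leader_suspected())  # True: timeout exceeded
```

A timeout of 1.5 s here is arbitrary; the point of the production rule is that this number should come from measured heartbeat gaps, not from a guess.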
Leases and Epochs
- Lease: a time-bounded leadership grant.
- Epoch (term): a monotonically increasing leadership generation number.
- Epochs let clients and storage reject stale leaders, preventing split-brain writes.
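A lease grant and an epoch check can be combined as below. This is a minimal in-memory sketch, assuming a single coordination store that is itself correct; `Lease`, `LeaseStore`, `try_acquire`, and `EpochGuard` are hypothetical names, not a real library API. A renewal by the current holder keeps its epoch, while every new grant starts a higher one, so a client that has seen the new epoch can reject the old leader even if that leader is still running.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    holder: str
    epoch: int         # monotonic leadership generation number
    expires_at: float  # end of the time-bounded grant


class LeaseStore:
    """Hypothetical coordination store: at most one unexpired lease at a time,
    and every new grant (not a renewal) gets a higher epoch."""

    def __init__(self) -> None:
        self.current: Optional[Lease] = None
        self.epoch = 0

    def try_acquire(self, node: str, now: float, duration: float) -> Optional[Lease]:
        cur = self.current
        if cur is not None and now < cur.expires_at:
            if cur.holder != node:
                return None  # another node still holds an unexpired lease
            # Renewal by the current holder keeps its epoch.
            self.current = Lease(node, cur.epoch, now + duration)
            return self.current
        # Lease free or expired: start a new leadership generation.
        self.epoch += 1
        self.current = Lease(node, self.epoch, now + duration)
        return self.current


class EpochGuard:
    """Client-side check: remember the highest epoch seen and reject any
    node that claims leadership under an older epoch."""

    def __init__(self) -> None:
        self.highest_seen = 0

    def accept(self, epoch: int) -> bool:
        if epoch < self.highest_seen:
            return False
        self.highest_seen = epoch
        return True


store = LeaseStore()
lease_a = store.try_acquire("a", now=0.0, duration=10.0)   # epoch 1
blocked = store.try_acquire("b", now=5.0, duration=10.0)   # None: a still holds it
lease_b = store.try_acquire("b", now=11.0, duration=10.0)  # a expired: epoch 2
guard = EpochGuard()
print(guard.accept(lease_b.epoch))  # True
print(guard.accept(lease_a.epoch))  # False: stale leader from epoch 1
```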
Fencing Tokens
- A fencing token is a monotonically increasing number attached to every write.
- Storage rejects writes carrying a token older than the highest token it has already accepted.
- This prevents a previously isolated leader from continuing to write after a new leader exists.
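The storage-side check can be sketched as follows. This is a minimal single-process sketch; `FencedStorage` and its `write` method are hypothetical, and a real store would need the compare-and-update on `highest_token` to be atomic with the write itself. Equal tokens are accepted so that one leader can issue many writes under the same token.

```python
from typing import Dict


class FencedStorage:
    """Storage that tracks the highest fencing token it has accepted and
    rejects writes carrying an older token (hypothetical sketch)."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # a newer leader has already written: reject
        # Equal tokens are allowed so one leader can issue many writes.
        self.highest_token = token
        self.data[key] = value
        return True


storage = FencedStorage()
storage.write(33, "x", "old-leader")         # accepted
storage.write(34, "x", "new-leader")         # new leader's token raises the bar
print(storage.write(33, "x", "late-write"))  # False: isolated old leader is fenced off
print(storage.data["x"])                     # new-leader
```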
Clock Drift Risks
- Lease safety depends on clock assumptions.
- Clock drift and process pauses (for example, GC pauses) can make a node act on a lease that has already expired, unless the lease logic is designed carefully.
- Prefer epoch-based validation at the storage boundary over relying on time alone.
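A leader that does rely on its lease can at least judge it pessimistically. The sketch below assumes a bounded clock drift rate between the holder and the grantor; `LocalLeaseView`, `max_drift_ratio`, and `mark_request_sent` are hypothetical names. It measures the lease from when the acquire request was sent, subtracts a drift margin, and uses a monotonic clock. None of this protects against a pause that starts right after the check returns, which is why storage-side epoch validation is still preferred.

```python
import time
from typing import Callable, Optional


class LocalLeaseView:
    """Leader-side view of its own lease, deliberately pessimistic (hypothetical).

    - Timing starts when the acquire request was sent, so network delay can
      only shorten the assumed lease, never lengthen it.
    - `max_drift_ratio` shaves off time in case this node's clock runs
      slower than the grantor's.
    - A monotonic clock is used so wall-clock steps cannot extend the lease.
    """

    def __init__(self, duration: float, max_drift_ratio: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.duration = duration
        self.max_drift_ratio = max_drift_ratio
        self.clock = clock
        self.request_sent_at: Optional[float] = None

    def mark_request_sent(self) -> None:
        self.request_sent_at = self.clock()

    def still_leader(self) -> bool:
        if self.request_sent_at is None:
            return False
        safe_duration = self.duration * (1.0 - self.max_drift_ratio)
        # Even a True result can go stale if the process pauses right after
        # this check; storage-side epoch validation is still required.
        return self.clock() - self.request_sent_at < safe_duration


fake_now = [0.0]
view = LocalLeaseView(duration=10.0, max_drift_ratio=0.1, clock=lambda: fake_now[0])
view.mark_request_sent()
fake_now[0] = 8.5
print(view.still_leader())  # True: inside the 9.0 s safe window
fake_now[0] = 9.5
print(view.still_leader())  # False: past the safe window, though the nominal lease is 10 s
```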
Failure Modes
- Split brain caused by network partitions combined with unsafe lease logic.
- Thrashing, where leadership changes too frequently because timeouts are too aggressive.
- A stale leader keeps serving writes because clients cache the old leader's address.
- Unbounded failover time leaves writes unavailable during repeated elections.
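The last two failure modes can be mitigated on the client side by dropping a cached leader address as soon as it is rejected and by bounding how long a write may chase leadership. The sketch below assumes a pluggable `discover` function and a transport `send` that raises `NotLeaderError` when it reaches a non-leader; `LeaderClient`, `NotLeaderError`, and both callables are hypothetical. It also assumes writes are idempotent, since a retry may repeat a write that was actually applied.

```python
from typing import Callable, Optional


class NotLeaderError(Exception):
    """Hypothetical transport error: the target node is no longer the leader."""


class LeaderClient:
    def __init__(self, discover: Callable[[], str],
                 send: Callable[[str, str], str], max_attempts: int = 3) -> None:
        self.discover = discover          # returns the current leader address
        self.send = send                  # sends a request to an address
        self.max_attempts = max_attempts  # bounds time spent during failover
        self.cached_leader: Optional[str] = None

    def write(self, request: str) -> str:
        last_error: Optional[Exception] = None
        for _ in range(self.max_attempts):
            if self.cached_leader is None:
                self.cached_leader = self.discover()
            try:
                return self.send(self.cached_leader, request)
            except (NotLeaderError, ConnectionError) as exc:
                # Cached address is stale or unreachable: drop it and rediscover.
                last_error = exc
                self.cached_leader = None
        raise TimeoutError(
            f"no leader accepted the write after {self.max_attempts} attempts"
        ) from last_error


# Simulated failover: node-a was the leader, node-b has taken over.
leaders = iter(["node-a", "node-b"])


def fake_send(addr: str, req: str) -> str:
    if addr == "node-a":
        raise NotLeaderError(addr)
    return f"{addr} accepted {req}"


client = LeaderClient(lambda: next(leaders), fake_send)
print(client.write("x=1"))  # node-b accepted x=1
```

A production client would typically add backoff between attempts; it is omitted here to keep the refresh logic visible.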
Incident Triage Checklist
- Are elections frequent? Inspect timeouts, GC pauses, and network loss.
- Is a stale leader serving? Verify that storage enforces epochs or fencing tokens.
- Do clients retry safely and refresh leader discovery promptly?
- Is membership view consistent across nodes?
Production Checklist
- Leadership epochs are monotonic and validated on every write.
- Client leader discovery has fast refresh and fallback.
- Election timeouts are tuned and tested under latency and pause conditions.
- Split-brain protection is provided by fencing tokens or an equivalent mechanism.
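One way to ground election timeouts in measured behavior rather than optimism is to derive them from observed heartbeat gaps, which already include network delay and GC pauses. This is a hypothetical helper, `suggest_election_timeout`; the nearest-rank percentile, the default `safety_factor` of 2.0, and the 150 ms `floor_ms` are illustrative assumptions, not recommended values. The result still needs to be tested under injected latency and pauses.

```python
import math
from typing import Sequence


def suggest_election_timeout(gaps_ms: Sequence[float], percentile: float = 0.999,
                             safety_factor: float = 2.0, floor_ms: float = 150.0) -> float:
    """Derive an election timeout (ms) from measured heartbeat inter-arrival gaps.

    Hypothetical helper: take a high nearest-rank percentile of the observed
    gaps, multiply by a safety factor, and never go below a floor.
    """
    if not gaps_ms:
        raise ValueError("need at least one measured heartbeat gap")
    if not 0.0 < percentile <= 1.0:
        raise ValueError("percentile must be in (0, 1]")
    ordered = sorted(gaps_ms)
    rank = max(1, math.ceil(percentile * len(ordered)))  # nearest-rank method
    observed = ordered[rank - 1]
    return max(floor_ms, observed * safety_factor)


# 99 normal 50 ms gaps plus one 400 ms GC pause.
gaps = [50.0] * 99 + [400.0]
print(suggest_election_timeout(gaps))                  # 800.0: covers the pause
print(suggest_election_timeout(gaps, percentile=0.5))  # 150.0: floor applies
```

Using a high percentile means a single observed pause pushes the timeout up, trading slower failover for fewer false elections and less thrashing.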