Consensus Algorithms Overview

Consensus provides agreement on a single value or log despite failures. Learn the problems it solves, its quorum assumptions, the difference between safety and liveness, and what Raft- and Paxos-style systems guarantee in production.

What Consensus Solves

Agree on a leader and membership changes.
Replicate an ordered log of commands safely.
Provide a single source of truth for critical metadata.

Safety vs Liveness

Safety: nothing bad ever happens; for example, no two leaders commit conflicting log entries.
Liveness: something good eventually happens; the system keeps making progress.
Production rule: safety is non-negotiable, while liveness depends on timing and failure assumptions.
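
The safety property can be stated as a check over node state. The following is a minimal sketch, assuming each node exposes its log as a list and a count of committed entries; the function name and data layout are hypothetical and not taken from any particular implementation.

```python
from typing import Dict, List


def committed_prefixes_agree(logs: Dict[str, List[str]],
                             committed: Dict[str, int]) -> bool:
    """Return True if no two nodes disagree on any entry both have committed.

    logs:      node id -> that node's log entries, in order (hypothetical layout)
    committed: node id -> how many leading entries that node has committed
    """
    nodes = list(logs)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            # Only the prefix that BOTH nodes consider committed must match.
            shared = min(committed[a], committed[b])
            if logs[a][:shared] != logs[b][:shared]:
                return False
    return True
```

Uncommitted suffixes are allowed to differ; only committed prefixes must agree. A check like this is typically useful in tests and fault-injection runs rather than on the request path.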

Quorum Basics

Consensus typically requires a majority quorum.
Majority intersection prevents two different quorums from committing conflicting decisions.
With 2f + 1 nodes, the system can tolerate f crash failures and still commit; if more than f nodes fail, it stops making progress but does not violate safety.
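
The quorum arithmetic above can be sketched in a few lines. This is an illustrative example, not part of any library: the helper names are hypothetical, and the intersection check is a brute-force enumeration that is only practical for small cluster sizes.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of an n-node cluster."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Largest f such that n >= 2f + 1."""
    return (n - 1) // 2


def all_quorums_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums shares a node."""
    quorums = [set(c) for c in combinations(range(n), majority(n))]
    return all(a & b for a, b in combinations(quorums, 2))


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(n, majority(n), tolerated_failures(n), all_quorums_intersect(n))
```

A 5-node cluster needs 3 acknowledgements and tolerates 2 failures. A 4-node cluster also needs 3 acknowledgements but tolerates only 1 failure, which is why odd cluster sizes are the usual choice.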

Log Replication Model

Client submits command to leader.
Leader appends to log and replicates to followers.
Once a quorum of nodes has acknowledged an entry, the leader marks it committed.
Committed entries are applied to a deterministic state machine.
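
The four steps above can be sketched as a single in-memory leader. This is a simplified illustration under stated assumptions: there is no networking, no persistence, and no terms (Raft, for instance, additionally restricts which entries a leader may commit based on its current term). The `Leader` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Leader:
    cluster_size: int
    log: List[Tuple[str, int]] = field(default_factory=list)
    # follower id -> number of log entries known to be stored on that follower
    match_index: Dict[str, int] = field(default_factory=dict)
    commit_index: int = 0   # number of committed entries
    last_applied: int = 0   # number of entries applied to the state machine
    state: Dict[str, int] = field(default_factory=dict)

    def append(self, command: Tuple[str, int]) -> int:
        """Steps 1-2: accept a client command and append it to the local log."""
        self.log.append(command)
        return len(self.log)

    def ack(self, follower_id: str, index: int) -> None:
        """Step 3: record a follower acknowledgement and try to advance commit."""
        previous = self.match_index.get(follower_id, 0)
        self.match_index[follower_id] = max(previous, index)
        self._advance_commit()

    def _advance_commit(self) -> None:
        # The leader's own log counts as one copy.
        copies = sorted([len(self.log), *self.match_index.values()], reverse=True)
        quorum = self.cluster_size // 2 + 1
        if len(copies) >= quorum:
            # The quorum-th largest value is stored on at least a quorum of nodes.
            self.commit_index = max(self.commit_index, copies[quorum - 1])
        self._apply()

    def _apply(self) -> None:
        """Step 4: apply committed entries, in order, to a key-value state machine."""
        while self.last_applied < self.commit_index:
            key, value = self.log[self.last_applied]
            self.state[key] = value
            self.last_applied += 1
```

In a 5-node cluster, an entry stays uncommitted after the first follower acknowledgement and becomes committed on the second, because the leader plus two followers form a majority.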

Raft vs Paxos High Level

Raft emphasizes understandability with leader election and log replication steps.
The Paxos family focuses on proving safety properties under asynchronous networks.
In practice, both implement quorum-based agreement with similar guarantees.

Operational Concerns

Write latency depends on quorum round trips.
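Membership changes must be applied in safe steps, so that the old and new configurations can never form two independent quorums and the cluster does not lose quorum mid-change.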
Snapshots and log compaction are required for long-running clusters.
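
To make the first point concrete, the sketch below estimates commit latency from follower round-trip times. It is a rough model under explicit assumptions: the leader counts its own copy, replication requests go out in parallel, and disk write time is ignored. The function name is hypothetical.

```python
from typing import Sequence


def quorum_commit_latency_ms(follower_rtts_ms: Sequence[float]) -> float:
    """Estimate commit latency as the round trip of the slowest follower
    the leader must wait for in order to reach a majority."""
    cluster_size = len(follower_rtts_ms) + 1   # followers plus the leader
    majority = cluster_size // 2 + 1
    follower_acks_needed = majority - 1        # the leader counts its own copy
    if follower_acks_needed == 0:
        return 0.0                             # single-node cluster
    return sorted(follower_rtts_ms)[follower_acks_needed - 1]
```

With follower round trips of 2, 3, 80, and 120 ms in a 5-node cluster, the estimate is 3 ms: the two slow followers do not sit on the commit path as long as the two fast ones stay healthy. If one fast follower fails, the estimate jumps to 80 ms.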

Failure Modes

Election storms due to aggressive timeouts and unstable networks.
Disk latency spikes cause followers to fall behind and reduce throughput.
A misconfigured membership change can reduce fault tolerance or break availability.
Split-brain behavior if quorum rules are violated by the implementation or by operators.
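
Election storms are commonly mitigated by randomizing election timeouts, as Raft does, so that followers rarely time out at the same moment. The following is a minimal sketch; the function names, the choice of a uniform range from the base value up to twice the base, and the default headroom factor of 10 are illustrative assumptions, not values prescribed by any specific system.

```python
import random


def election_timeout_ms(base_ms: float, rng: random.Random) -> float:
    """Pick a timeout uniformly in [base_ms, 2 * base_ms).

    Spreading timeouts over a range makes it unlikely that several
    followers start an election at the same time.
    """
    return base_ms + rng.random() * base_ms


def timeout_leaves_headroom(base_ms: float,
                            worst_case_rtt_ms: float,
                            headroom: float = 10.0) -> bool:
    """Check that the base timeout is well above the observed worst-case
    round trip. The headroom factor is an assumed rule of thumb."""
    return base_ms >= headroom * worst_case_rtt_ms


if __name__ == "__main__":
    rng = random.Random()
    print([round(election_timeout_ms(150.0, rng)) for _ in range(5)])
```

The headroom check reflects the checklist item on tuning timeouts for real latency and pauses: a base timeout close to the worst-case round trip causes followers to start elections while a healthy leader is still sending heartbeats.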

Production Checklist

Majority quorum is enforced for commits.
Election timeouts are tuned for real network latency and process pauses.
Snapshots and compaction are configured and monitored.
Node health signals include disk, network, and replication lag.
