Outbox Pattern (The Dual-Write Fix)

The Outbox Pattern makes event publishing reliable by storing events in the same database transaction as the business state changes. This lesson explains atomicity without two-phase commit (2PC), polling vs CDC, ordering guarantees, and production pitfalls.

Outbox Pattern: Reliable Event Publishing Without Distributed Transactions

The Outbox Pattern solves a fundamental problem in distributed systems: how to reliably publish events when updating a database, without using two-phase commit across the database and message broker.

Instead of coordinating two systems atomically, the Outbox Pattern writes both business state and event records into the same database transaction. A separate process later publishes those events to the message broker.

The Core Problem

Consider this workflow:

  1. Update database state.
  2. Publish event to message broker.

If the database write succeeds but the event publish fails, the system becomes inconsistent. Other services never receive the event.

If the event publishes first but the database transaction fails, consumers see an event for a state that does not exist.

This is the dual-write problem: with no coordination between the two writes, neither ordering is atomic.

How the Outbox Pattern Works

Instead of publishing directly to the broker:

  1. Begin database transaction.
  2. Update business table.
  3. Insert event into an outbox table.
  4. Commit transaction.

After commit, a background process reads from the outbox table and publishes events to the broker.

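BEGIN;
-- Business state change; :order_id is a placeholder for the affected row
UPDATE orders SET status = 'CREATED' WHERE id = :order_id;
-- Event record, written in the same transaction
INSERT INTO outbox (event_type, payload, status)
VALUES ('OrderCreated', :json_payload, 'PENDING');
COMMIT;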

Atomicity is guaranteed by the database itself: either both the business row and the outbox row are committed, or neither is.
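The transactional write can be sketched in Python as follows. This is a minimal illustration, assuming an in-memory SQLite database in place of the real one; the function name `mark_order_created` and the `fail_before_commit` flag are hypothetical and exist only to demonstrate the rollback.

```python
import json
import sqlite3

def mark_order_created(conn, order_id, fail_before_commit=False):
    """Update the order and record its event in ONE transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE orders SET status = 'CREATED' WHERE id = ?", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_type, payload, status) VALUES (?, ?, 'PENDING')",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )
        if fail_before_commit:
            # Hypothetical failure hook: anything raised here undoes BOTH writes.
            raise RuntimeError("simulated failure before commit")

# Assumed demo schema, not a production design.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "event_type TEXT, payload TEXT, status TEXT)"
)
with conn:
    conn.executemany("INSERT INTO orders VALUES (?, 'NEW')", [("order-1",), ("order-2",)])

mark_order_created(conn, "order-1")
try:
    mark_order_created(conn, "order-2", fail_before_commit=True)
except RuntimeError:
    pass  # the transaction for order-2 was rolled back

statuses = dict(conn.execute("SELECT id, status FROM orders"))
outbox_rows = conn.execute("SELECT event_type, payload FROM outbox").fetchall()
```

After the run, only `order-1` has changed state and only one outbox row exists, so there is never a state change without its event or an event without its state change.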

Outbox Table Structure

Typical columns include:

  • event_id (unique identifier)
  • aggregate_id (business entity reference)
  • event_type
  • payload (JSON)
  • created_at
  • published_at
  • status (PENDING, SENT, FAILED)

Keeping published rows, together with their status and published_at values, enables replay and auditing.
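One possible schema for these columns is sketched below using SQLite through Python. The column types, the CHECK constraint, and the index name `outbox_pending_idx` are assumptions made for illustration; a production database would likely use native UUID, JSON, and timestamp types.

```python
import sqlite3

# Assumed DDL: column names follow the list above, types are SQLite-flavoured.
OUTBOX_DDL = """
CREATE TABLE outbox (
    event_id     TEXT PRIMARY KEY,
    aggregate_id TEXT NOT NULL,
    event_type   TEXT NOT NULL,
    payload      TEXT NOT NULL,                          -- JSON document
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    published_at TEXT,                                   -- NULL until published
    status       TEXT NOT NULL DEFAULT 'PENDING'
                 CHECK (status IN ('PENDING', 'SENT', 'FAILED'))
);
-- Supports the publisher's "oldest pending first" lookup.
CREATE INDEX outbox_pending_idx ON outbox (status, created_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(OUTBOX_DDL)

with conn:
    conn.execute(
        "INSERT INTO outbox (event_id, aggregate_id, event_type, payload) "
        "VALUES (?, ?, ?, ?)",
        ("evt-1", "order-1", "OrderCreated", '{"order_id": "order-1"}'),
    )

new_row = conn.execute("SELECT status, published_at FROM outbox").fetchone()

# The CHECK constraint rejects statuses outside the allowed set.
try:
    with conn:
        conn.execute("UPDATE outbox SET status = 'DONE' WHERE event_id = 'evt-1'")
    invalid_status_rejected = False
except sqlite3.IntegrityError:
    invalid_status_rejected = True
```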

Event Publisher Mechanisms

1) Polling Publisher

A background worker periodically queries:

SELECT * FROM outbox
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT :batch_size;  -- N rows per poll

After the broker acknowledges an event, the worker updates that row's status to SENT and sets published_at.
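A single polling pass might look like the sketch below. It assumes SQLite and a caller-supplied `publish` callable standing in for a real broker client; `publish_pending` and `fake_publish` are hypothetical names. A row is marked SENT only after `publish` returns, so a crash between the two steps causes a re-send rather than a lost event.

```python
import sqlite3
from typing import Callable

def publish_pending(conn, publish: Callable[[str, str], None], batch_size: int = 100) -> int:
    """Publish one batch of PENDING rows, oldest first. Returns the number sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE status = 'PENDING' ORDER BY created_at, id LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for row_id, event_type, payload in rows:
        try:
            publish(event_type, payload)
        except Exception:
            break  # stop here to keep order; the row stays PENDING for the next poll
        with conn:  # mark the row only after the publish call succeeded
            conn.execute(
                "UPDATE outbox SET status = 'SENT', published_at = CURRENT_TIMESTAMP "
                "WHERE id = ?",
                (row_id,),
            )
        sent += 1
    return sent

# Demo with an in-memory table and a list acting as the broker.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT, "
    "payload TEXT, status TEXT DEFAULT 'PENDING', "
    "created_at TEXT DEFAULT CURRENT_TIMESTAMP, published_at TEXT)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
        [("OrderCreated", "a"), ("OrderPaid", "b"), ("OrderShipped", "c")],
    )

published = []

def fake_publish(event_type, payload):
    published.append((event_type, payload))

first_pass = publish_pending(conn, fake_publish, batch_size=2)
second_pass = publish_pending(conn, fake_publish, batch_size=2)
third_pass = publish_pending(conn, fake_publish, batch_size=2)
```

In a real worker this function would run in a loop with a sleep between passes. If several workers poll the same table, some form of row locking or leader election is also needed, which this sketch leaves out.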

2) Change Data Capture (CDC)

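Instead of polling, a CDC connector tails the database's transaction log (for example, the PostgreSQL WAL or the MySQL binlog), captures inserts into the outbox table, and publishes them as events in near real-time.

CDC avoids repeated polling queries and usually lowers publish latency, at the cost of operating an additional component.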

Production Scenario: Lost Events Without Outbox

Symptom

Orders created successfully, but downstream services never receive OrderCreated events.

Root Cause

The application crashed after the database commit but before the event publish call.

Diagnosis

  • Order table updated.
  • No corresponding message in broker.
  • No retry logic for publish failure.

Resolution

  • Implement Outbox Pattern.
  • Introduce durable background publisher.
  • Add monitoring for stuck outbox entries.

Idempotency and Duplicate Events

Outbox publishing may retry on failure, producing duplicates.

Consumers must handle duplicates safely using:

  • Event IDs for deduplication.
  • Idempotent event handlers.
  • Upsert semantics.

The pattern provides at-least-once delivery, not exactly-once, so duplicates are expected.
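A consumer-side deduplication sketch follows, assuming SQLite 3.24 or newer (for the upsert syntax) and two hypothetical tables: `processed_events` for seen event IDs and `order_view` for the consumer's own state. Recording the event ID and applying the change in the same transaction is what makes the handler safe to retry.

```python
import sqlite3

def handle_once(conn, event_id: str, order_id: str, status: str) -> bool:
    """Apply the event unless its ID was already processed. Returns True if applied."""
    with conn:  # dedup record and state change commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)", (event_id,)
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: already handled, nothing to do
        # Upsert semantics: repeating the same write leaves the same final state.
        conn.execute(
            "INSERT INTO order_view (order_id, status) VALUES (?, ?) "
            "ON CONFLICT(order_id) DO UPDATE SET status = excluded.status",
            (order_id, status),
        )
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE order_view (order_id TEXT PRIMARY KEY, status TEXT)")

first = handle_once(conn, "evt-1", "order-1", "CREATED")
duplicate = handle_once(conn, "evt-1", "order-1", "CREATED")  # redelivered event
later = handle_once(conn, "evt-2", "order-1", "PAID")

view = conn.execute("SELECT order_id, status FROM order_view").fetchall()
```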

Ordering Guarantees

Ordering is typically guaranteed per aggregate ID:

  • Events for same entity should be processed in creation order.
  • Global ordering across all aggregates is rarely required.

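The outbox publisher must preserve insertion order, at least within each aggregate. Note that created_at timestamps do not always match commit order when transactions overlap, so a monotonically increasing sequence column is often a safer sort key.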
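With a partitioned broker, per-aggregate ordering is commonly achieved by using the aggregate ID as the message key, so that all events for one entity land in the same partition. The sketch below illustrates the idea with a hypothetical `partition_for` function; it is not the partitioner of any particular broker.

```python
import hashlib

def partition_for(aggregate_id: str, num_partitions: int) -> int:
    """Map an aggregate ID to a stable partition number (illustrative hash only)."""
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Events in outbox insertion order: (aggregate_id, event_type).
events = [
    ("order-1", "OrderCreated"),
    ("order-2", "OrderCreated"),
    ("order-1", "OrderPaid"),
    ("order-1", "OrderShipped"),
]

# Each partition is an append-only list, so order within a partition is kept.
partitions: dict[int, list[tuple[str, str]]] = {}
for aggregate_id, event_type in events:
    partitions.setdefault(partition_for(aggregate_id, 4), []).append(
        (aggregate_id, event_type)
    )

# All order-1 events share one partition and keep their creation order.
order_1_partition = partitions[partition_for("order-1", 4)]
order_1_events = [etype for agg, etype in order_1_partition if agg == "order-1"]
```

Events for different aggregates may interleave arbitrarily across partitions, which matches the observation that global ordering is rarely required.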

Failure Modes

  • Publisher crashes mid-batch.
  • Broker temporarily unavailable.
  • Outbox table grows unbounded.
  • Stale PENDING rows accumulate.

Monitoring and cleanup policies are required.
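For the "broker temporarily unavailable" case, a retry loop with exponential backoff and jitter is a common mitigation. The sketch below assumes the broker client signals outages by raising `ConnectionError`; the function name, parameters, and defaults are illustrative, and the `sleep` argument is injectable so the demo runs without actually waiting.

```python
import random
import time
from typing import Callable

def publish_with_backoff(
    publish: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try publish() up to max_attempts times. Returns True on success."""
    for attempt in range(max_attempts):
        try:
            publish()
            return True
        except ConnectionError:
            if attempt == max_attempts - 1:
                break  # out of attempts; caller leaves the row PENDING or marks it FAILED
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter avoids synchronized retries
    return False

# Demo: the fake broker fails twice, then accepts the message.
calls = {"count": 0}

def flaky_publish():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("broker unavailable")

def always_down():
    raise ConnectionError("broker unavailable")

delays = []
recovered = publish_with_backoff(flaky_publish, sleep=delays.append)

down_delays = []
gave_up = publish_with_backoff(always_down, max_attempts=3, sleep=down_delays.append)
```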

Observability Requirements

  • Pending outbox row count.
  • Event publishing latency.
  • Failed publish retry count.
  • Oldest pending event age.
  • Duplicate detection rate in consumers.

Outbox health directly impacts data consistency.
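Several of these signals can be derived directly from the outbox table. The sketch below assumes `created_at` is stored as epoch seconds and that the result is handed to whatever metrics system is in use; `outbox_metrics` and the metric names are hypothetical.

```python
import sqlite3

def outbox_metrics(conn, now_epoch: float) -> dict:
    """Compute backlog metrics from the outbox table."""
    pending, oldest = conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM outbox WHERE status = 'PENDING'"
    ).fetchone()
    failed = conn.execute(
        "SELECT COUNT(*) FROM outbox WHERE status = 'FAILED'"
    ).fetchone()[0]
    return {
        "pending_count": pending,
        "failed_count": failed,
        # Age of the oldest unpublished event; 0 when the backlog is empty.
        "oldest_pending_age_seconds": (now_epoch - oldest) if oldest is not None else 0.0,
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, status TEXT, created_at REAL)")
with conn:
    conn.executemany(
        "INSERT INTO outbox VALUES (?, ?, ?)",
        [
            ("evt-1", "SENT", 50.0),
            ("evt-2", "FAILED", 90.0),
            ("evt-3", "PENDING", 100.0),
            ("evt-4", "PENDING", 160.0),
        ],
    )

metrics = outbox_metrics(conn, now_epoch=200.0)

with conn:
    conn.execute("UPDATE outbox SET status = 'SENT' WHERE status = 'PENDING'")
empty_backlog = outbox_metrics(conn, now_epoch=200.0)
```

An alert on the age of the oldest pending event tends to catch a stalled publisher sooner than a raw row count, because the count also rises during ordinary traffic spikes.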

Failure Injection Test

# Outbox validation
1) Execute business transaction
2) Crash application before event publish
3) Restart system
4) Verify outbox publisher resumes correctly
5) Confirm event eventually appears in broker
6) Validate idempotent consumption
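The steps above can be rehearsed in miniature with the following sketch. It assumes an in-memory SQLite database, a Python list standing in for the broker, and crashes simulated by raised exceptions; all function names are hypothetical. A real test would kill and restart actual processes.

```python
import sqlite3

broker = []  # stand-in for a broker topic

def business_transaction(conn, order_id, crash_after_commit=False):
    with conn:  # state change and outbox row commit together
        conn.execute("INSERT INTO orders VALUES (?, 'CREATED')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_id, payload, status) VALUES (?, ?, 'PENDING')",
            (f"evt-{order_id}", order_id),
        )
    if crash_after_commit:
        raise RuntimeError("simulated crash before any publish attempt")

def run_publisher(conn, crash_before_mark=False):
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox WHERE status = 'PENDING' ORDER BY rowid"
    ).fetchall()
    for event_id, payload in rows:
        broker.append((event_id, payload))
        if crash_before_mark:
            raise RuntimeError("simulated crash after publish, before marking SENT")
        with conn:
            conn.execute("UPDATE outbox SET status = 'SENT' WHERE event_id = ?", (event_id,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT, status TEXT)")

# Steps 1-2: commit, then crash before publishing.
try:
    business_transaction(conn, "order-1", crash_after_commit=True)
except RuntimeError:
    pass
broker_after_crash = list(broker)

# Steps 3-5: the publisher resumes; one extra crash forces a duplicate send.
try:
    run_publisher(conn, crash_before_mark=True)
except RuntimeError:
    pass
run_publisher(conn)

# Step 6: the consumer deduplicates by event ID.
seen, applied = set(), []
for event_id, payload in broker:
    if event_id not in seen:
        seen.add(event_id)
        applied.append(payload)

final_status = conn.execute("SELECT status FROM outbox").fetchone()[0]
```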

Common Anti-Patterns

  • Deleting outbox rows before broker confirmation.
  • No retry logic for failed publishes.
  • No monitoring of outbox backlog growth.
  • Assuming exactly-once delivery.
  • Coupling publisher lifecycle tightly to request thread.

Operational Checklist

  • Are business changes and event inserts atomic?
  • Is outbox publisher highly available?
  • Are retries implemented with backoff?
  • Are duplicates handled safely by consumers?
  • Is outbox backlog monitored?

Key Takeaways

  • Outbox Pattern provides atomicity without distributed 2PC.
  • Events are stored in same DB transaction as business state.
  • Publishing occurs asynchronously and may retry.
  • Consumers must be idempotent.
  • Monitoring outbox backlog is critical for consistency.

The Outbox Pattern is one of the most practical and widely adopted reliability patterns in modern distributed systems. It bridges the gap between transactional databases and event-driven architectures without sacrificing availability.