DATABASE-ADVANCED Contents

Shard Key Design

Choosing shard keys for uniform distribution, locality, and query compatibility.

On this page

Shard Key Design: The Most Expensive Mistake You Can Make

In a sharded system, the shard key decides where data lives and how requests are routed. If you choose the wrong shard key, you will eventually be forced into painful migrations, cross-shard queries, and correctness compromises.

A good shard key minimizes cross-shard operations and distributes load evenly under real traffic.

What a Shard Key Must Achieve

  • Distribution: avoid hotspots; spread reads and writes
  • Locality: keep related data together to avoid cross-shard joins
  • Predictable routing: application can compute shard quickly
  • Stability: shard key should not change (immutability)

Cardinality: High Enough to Distribute

Shard keys with low cardinality create hotspots.

  • Bad: status (ACTIVE/INACTIVE)
  • Bad: country (few values, skewed)
  • Bad: plan_type (FREE/PRO)

Even if you hash these values, skew remains because traffic is not uniform.

Skew: Real Traffic Is Not Uniform

Distribution is not about number of distinct values only. It is about how frequently each value is accessed.

Example: if 60 percent of traffic is for one tenant, tenant_id is a hotspot key unless you do special handling for large tenants.

Locality: Keep What You Join Together

If your application frequently joins or queries related rows together, try to keep them on the same shard.

Common locality-friendly keys:

  • tenant_id for SaaS multi-tenant systems
  • user_id for user-centric systems

But locality can conflict with distribution. A few massive tenants can break tenant_id sharding.

Immutability: Keys That Change Will Break You

A shard key should be stable. If it changes, the row must move shards, which is expensive and correctness-sensitive.

  • Bad key example: user_email (can change)
  • Bad key example: region (user moves region, geo routing changes)

If a key must change, design a stable surrogate key for sharding (e.g., user_id) and treat mutable attributes as normal columns.

Query Pattern Alignment

The shard key should match your highest-QPS access patterns.

Ask:

  • What is the hottest endpoint?
  • What parameters does it use to fetch data?
  • Can those parameters route to a single shard?

If your hottest endpoint becomes scatter-gather across shards, your system will fail under scale.

Hotspot Keys and “Big Tenant” Strategy

In multi-tenant systems, tenant_id is common. But a few large tenants often dominate traffic.

Mitigation strategies:

  • Isolate large tenants into dedicated shards
  • Use sub-sharding within a tenant (tenant_id + hash(user_id))
  • Split by functional domains (billing vs events)

Do not pretend all tenants are equal; they are not.

Range vs Hash Considerations

Choosing the key is one step; choosing range vs hash is another.

  • If you need ordered queries by key, range may help but risks monotonic hotspots.
  • If you need uniform distribution, hash helps but breaks range queries.

A common compromise is stable key + hashing + secondary indexing per shard.

Global Uniqueness Constraints

Sharding breaks global uniqueness unless your key strategy embeds uniqueness.

Examples:

  • Auto-increment ids cannot be global without coordination.
  • Unique usernames across shards require central registry or coordination.

Production patterns:

  • Use UUID/ULID-like ids
  • Use a dedicated “registry” service/table for globally unique claims
  • Namespace uniqueness by tenant when possible

Failure Modes in Production

  • Hot shard: shard key correlates with traffic spikes.
  • Scatter-gather overload: hottest queries require querying all shards.
  • Resharding necessity: key chosen cannot support growth.
  • Mutable key migration: frequent shard moves cause correctness bugs.
  • Big tenant collapse: one tenant saturates a shard.
  • Global uniqueness bugs: duplicates appear across shards.

Operational Checklist

  • List top read and write endpoints by QPS and latency budget.
  • Choose a shard key that routes hot endpoints to a single shard.
  • Verify shard key has high cardinality and low traffic skew.
  • Test key distribution with realistic production-like traffic.
  • Ensure shard key is immutable; do not shard by mutable attributes.
  • Plan strategy for big tenants/hot keys (dedicated shard or sub-sharding).
  • Decide how you will handle global uniqueness and id generation.
  • Decide how analytics and cross-shard aggregates will be computed (offline pipeline).
  • Build routing observability: log shard id per request.
  • Document the key decision and the expected failure modes.

Summary

Shard key choice determines routing, locality, hotspots, and future migration cost. A good key is stable, aligns with hottest query patterns, distributes load under skewed traffic, and supports your uniqueness requirements. Treat shard key design as a top-level architecture decision, not an implementation detail.