Shard Key Design

Choosing shard keys for uniform distribution, locality, and query compatibility.

On this page

Shard Key Design: The Most Expensive Mistake You Can Make

In a sharded system, the shard key decides where data lives and how requests are routed. If you choose the wrong shard key, you will eventually be forced into painful migrations, cross-shard queries, and correctness compromises.

A good shard key minimizes cross-shard operations and distributes load evenly under real traffic.

What a Shard Key Must Achieve

Distribution: avoid hotspots; spread reads and writes
Locality: keep related data together to avoid cross-shard joins
Predictable routing: application can compute shard quickly
Stability: shard key should not change (immutability)

Cardinality: High Enough to Distribute

Shard keys with low cardinality create hotspots.

Bad: status (ACTIVE/INACTIVE)
Bad: country (few values, skewed)
Bad: plan_type (FREE/PRO)

Even if you hash these values, skew remains because traffic is not uniform.

Skew: Real Traffic Is Not Uniform

Distribution is not about number of distinct values only. It is about how frequently each value is accessed.

Example: if 60 percent of traffic is for one tenant, tenant_id is a hotspot key unless you do special handling for large tenants.

Locality: Keep What You Join Together

If your application frequently joins or queries related rows together, try to keep them on the same shard.

Common locality-friendly keys:

tenant_id for SaaS multi-tenant systems
user_id for user-centric systems

But locality can conflict with distribution. A few massive tenants can break tenant_id sharding.

Immutability: Keys That Change Will Break You

A shard key should be stable. If it changes, the row must move shards, which is expensive and correctness-sensitive.

Bad key example: user_email (can change)
Bad key example: region (user moves region, geo routing changes)

If a key must change, design a stable surrogate key for sharding (e.g., user_id) and treat mutable attributes as normal columns.

Query Pattern Alignment

The shard key should match your highest-QPS access patterns.

Ask:

What is the hottest endpoint?
What parameters does it use to fetch data?
Can those parameters route to a single shard?

If your hottest endpoint becomes scatter-gather across shards, your system will fail under scale.

Hotspot Keys and “Big Tenant” Strategy

In multi-tenant systems, tenant_id is common. But a few large tenants often dominate traffic.

Mitigation strategies:

Isolate large tenants into dedicated shards
Use sub-sharding within a tenant (tenant_id + hash(user_id))
Split by functional domains (billing vs events)

Do not pretend all tenants are equal; they are not.

Range vs Hash Considerations

Choosing the key is one step; choosing range vs hash is another.

If you need ordered queries by key, range may help but risks monotonic hotspots.
If you need uniform distribution, hash helps but breaks range queries.

A common compromise is stable key + hashing + secondary indexing per shard.

Global Uniqueness Constraints

Sharding breaks global uniqueness unless your key strategy embeds uniqueness.

Examples:

Auto-increment ids cannot be global without coordination.
Unique usernames across shards require central registry or coordination.

Production patterns:

Use UUID/ULID-like ids
Use a dedicated “registry” service/table for globally unique claims
Namespace uniqueness by tenant when possible

Failure Modes in Production

Hot shard: shard key correlates with traffic spikes.
Scatter-gather overload: hottest queries require querying all shards.
Resharding necessity: key chosen cannot support growth.
Mutable key migration: frequent shard moves cause correctness bugs.
Big tenant collapse: one tenant saturates a shard.
Global uniqueness bugs: duplicates appear across shards.

Operational Checklist

List top read and write endpoints by QPS and latency budget.
Choose a shard key that routes hot endpoints to a single shard.
Verify shard key has high cardinality and low traffic skew.
Test key distribution with realistic production-like traffic.
Ensure shard key is immutable; do not shard by mutable attributes.
Plan strategy for big tenants/hot keys (dedicated shard or sub-sharding).
Decide how you will handle global uniqueness and id generation.
Decide how analytics and cross-shard aggregates will be computed (offline pipeline).
Build routing observability: log shard id per request.
Document the key decision and the expected failure modes.

Summary

Shard key choice determines routing, locality, hotspots, and future migration cost. A good key is stable, aligns with hottest query patterns, distributes load under skewed traffic, and supports your uniqueness requirements. Treat shard key design as a top-level architecture decision, not an implementation detail.

← Sharding Patterns

Cross-Shard Queries →