Data Modeling Checklist

A production checklist for constraints, indexes, migrations, and long-term evolution.

On this page

Data Modeling Checklist: Production-Grade Schema Design

At scale, data modeling is an operational discipline. A schema is not only a set of tables; it is a long-term contract that affects performance, correctness, migrations, and incident response.

Use this checklist before shipping new tables and before major schema changes.

Access Patterns and Query Shape

List the top queries the model must serve (by endpoint/use case).
Identify filter + sort patterns; design composite indexes accordingly.
Decide pagination strategy (cursor vs offset); avoid offset for large datasets.
Identify expected hot keys/hot ranges (recent time windows, popular tenants).
Define maximum result sizes and enforce limits at API level.

Correctness and Constraints

Define source of truth per entity (which table is canonical).
Decide which invariants must be enforced by the DB (unique, foreign key, check constraints).
Define write semantics under concurrency (idempotency keys, conflict handling).
Decide how to handle multi-step workflows (transaction boundaries).

Normalization vs Denormalization

Start normalized; denormalize only when metrics justify.
For each denormalized field, document update strategy and ownership.
Ensure denormalized models are rebuildable from canonical sources.
Measure write amplification and lock contention impact.

Growth, Retention, and Time

Estimate growth: rows/day, bytes/row, index overhead.
Define retention policy (drop/archive/summarize) before launch.
For time-series or high-volume tables, plan partitioning from day one.
Plan backfills and late-arriving data handling if time is involved.

Multi-Tenancy and Isolation

Define tenancy model: shared tables vs dedicated DB vs hybrid.
Enforce tenant scoping in the access layer; disallow unscoped queries.
Design indexes with tenant_id as leading column for hot paths.
Define per-tenant quotas and monitor per-tenant metrics.
Plan big-tenant graduation path (shared → dedicated) early.

Soft Delete and Lifecycle

Decide if soft delete is required; avoid for high-churn hot tables unless necessary.
Define uniqueness semantics for deleted rows (reuse allowed or not).
Enforce query hygiene (views/scopes) to prevent deleted data leakage.
Define retention and purge strategy; prefer partition drop over row deletes at scale.
Ensure privacy requirements are met (true erasure/anonymization when required).

Auditability and History

Define which actions must be audited (create/update/delete/access/export).
Ensure audit is append-only and access-controlled.
Store actor identity, timestamps, request/trace id, and change payload version.
Partition audit tables by time; define retention and archival.
Make audit queries usable during incidents (index for entity_id, actor_id, time windows).

Migrations and Evolution

Use expand/contract for non-trivial schema changes.
Design backfills to be chunked, throttled, idempotent, and resumable.
Plan compatibility for multi-service rollouts (partial deployment states).
Define rollback strategy before executing migration.
Verify migration correctness with counts, checksums, and invariants.

Performance and Operational Safety

Keep indexes intentional; every index is a write tax.
Plan for large table operations: vacuum/purge, bloat, checkpoint behavior.
Define timeouts and resource limits for expensive queries.
Consider read replicas and caching strategy for read-heavy endpoints.
Protect the primary DB with rate limits and circuit breakers for fallback paths.

Observability and Debuggability

Log/trace query paths for critical endpoints (without leaking sensitive data).
Monitor top queries, p95/p99 latency, lock waits, and replication lag.
Track growth metrics: table size, index size, bloat indicators.
For caches: monitor hit rate by namespace, fallback-to-DB QPS, evictions, hot keys.
Establish SLOs for critical read/write paths and alert on regression.

Security and Data Exposure

Minimize sensitive data storage; never store secrets in DB fields or audit logs.
Define access controls and roles; least privilege for application users.
Ensure tenant isolation is enforced and tested (no cross-tenant reads).
Define data export and deletion procedures for compliance.

Final Gate: Production Readiness Questions

Can we explain the model to on-call engineers at 03:00?
Can we backfill and migrate without downtime?
Can we purge data safely and cheaply?
Can we rebuild denormalized projections if they drift?
Can we investigate incidents using audit logs and traces?

Summary

Production-grade data modeling requires explicit decisions about access patterns, constraints, growth/retention, isolation, lifecycle, auditability, migration safety, and observability. Use this checklist as a gate to prevent schema choices that create long-term operational debt.

← Schema Migration Strategies