DATABASE-ADVANCED Contents

Data Modeling Checklist

A production checklist for constraints, indexes, migrations, and long-term evolution.

On this page

Data Modeling Checklist: Production-Grade Schema Design

At scale, data modeling is an operational discipline. A schema is not only a set of tables; it is a long-term contract that affects performance, correctness, migrations, and incident response.

Use this checklist before shipping new tables and before major schema changes.

Access Patterns and Query Shape

  • List the top queries the model must serve (by endpoint/use case).
  • Identify filter + sort patterns; design composite indexes accordingly.
  • Decide pagination strategy (cursor vs offset); avoid offset for large datasets.
  • Identify expected hot keys/hot ranges (recent time windows, popular tenants).
  • Define maximum result sizes and enforce limits at API level.

Correctness and Constraints

  • Define source of truth per entity (which table is canonical).
  • Decide which invariants must be enforced by the DB (unique, foreign key, check constraints).
  • Define write semantics under concurrency (idempotency keys, conflict handling).
  • Decide how to handle multi-step workflows (transaction boundaries).

Normalization vs Denormalization

  • Start normalized; denormalize only when metrics justify.
  • For each denormalized field, document update strategy and ownership.
  • Ensure denormalized models are rebuildable from canonical sources.
  • Measure write amplification and lock contention impact.

Growth, Retention, and Time

  • Estimate growth: rows/day, bytes/row, index overhead.
  • Define retention policy (drop/archive/summarize) before launch.
  • For time-series or high-volume tables, plan partitioning from day one.
  • Plan backfills and late-arriving data handling if time is involved.

Multi-Tenancy and Isolation

  • Define tenancy model: shared tables vs dedicated DB vs hybrid.
  • Enforce tenant scoping in the access layer; disallow unscoped queries.
  • Design indexes with tenant_id as leading column for hot paths.
  • Define per-tenant quotas and monitor per-tenant metrics.
  • Plan big-tenant graduation path (shared → dedicated) early.

Soft Delete and Lifecycle

  • Decide if soft delete is required; avoid for high-churn hot tables unless necessary.
  • Define uniqueness semantics for deleted rows (reuse allowed or not).
  • Enforce query hygiene (views/scopes) to prevent deleted data leakage.
  • Define retention and purge strategy; prefer partition drop over row deletes at scale.
  • Ensure privacy requirements are met (true erasure/anonymization when required).

Auditability and History

  • Define which actions must be audited (create/update/delete/access/export).
  • Ensure audit is append-only and access-controlled.
  • Store actor identity, timestamps, request/trace id, and change payload version.
  • Partition audit tables by time; define retention and archival.
  • Make audit queries usable during incidents (index for entity_id, actor_id, time windows).

Migrations and Evolution

  • Use expand/contract for non-trivial schema changes.
  • Design backfills to be chunked, throttled, idempotent, and resumable.
  • Plan compatibility for multi-service rollouts (partial deployment states).
  • Define rollback strategy before executing migration.
  • Verify migration correctness with counts, checksums, and invariants.

Performance and Operational Safety

  • Keep indexes intentional; every index is a write tax.
  • Plan for large table operations: vacuum/purge, bloat, checkpoint behavior.
  • Define timeouts and resource limits for expensive queries.
  • Consider read replicas and caching strategy for read-heavy endpoints.
  • Protect the primary DB with rate limits and circuit breakers for fallback paths.

Observability and Debuggability

  • Log/trace query paths for critical endpoints (without leaking sensitive data).
  • Monitor top queries, p95/p99 latency, lock waits, and replication lag.
  • Track growth metrics: table size, index size, bloat indicators.
  • For caches: monitor hit rate by namespace, fallback-to-DB QPS, evictions, hot keys.
  • Establish SLOs for critical read/write paths and alert on regression.

Security and Data Exposure

  • Minimize sensitive data storage; never store secrets in DB fields or audit logs.
  • Define access controls and roles; least privilege for application users.
  • Ensure tenant isolation is enforced and tested (no cross-tenant reads).
  • Define data export and deletion procedures for compliance.

Final Gate: Production Readiness Questions

  • Can we explain the model to on-call engineers at 03:00?
  • Can we backfill and migrate without downtime?
  • Can we purge data safely and cheaply?
  • Can we rebuild denormalized projections if they drift?
  • Can we investigate incidents using audit logs and traces?

Summary

Production-grade data modeling requires explicit decisions about access patterns, constraints, growth/retention, isolation, lifecycle, auditability, migration safety, and observability. Use this checklist as a gate to prevent schema choices that create long-term operational debt.