Python Production Checklist
Pre-Production Checklist
- Structured logging with correlation IDs (request_id/trace_id) on every log line.
- Metrics for error rate, p95/p99 latency, and saturation (for example, queue depth).
- Timeouts and bounded retries for every external call.
- Health checks: keep liveness cheap; have readiness validate dependencies.
- Resource limits: memory/CPU limits and backpressure on queues.
- Safe shutdown: stop accepting work, drain or cancel in-flight work, then close resources.
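As an illustration of the first item, here is a minimal sketch of structured logging with a correlation ID, using only the standard library. The names `request_id_var`, `RequestIdFilter`, `JsonFormatter`, `build_logger`, and `handle_request` are hypothetical and chosen for this example; a real service would typically read the ID from an incoming header and may prefer an existing structured-logging library.

```python
import contextvars
import json
import logging
import uuid

# Holds the current request's correlation id; "-" when no request is active.
# (Hypothetical name, used only in this sketch.)
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current correlation id onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True  # never drop the record, only annotate it


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", "-"),
        })


def build_logger(name="app", stream=None):
    # stream=None makes StreamHandler write to sys.stderr.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(RequestIdFilter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, incoming_id=None):
    # Reuse the caller's id if one was passed in; otherwise mint a new one.
    token = request_id_var.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.info("request started")
    finally:
        # Restore the previous value so ids do not leak between requests.
        request_id_var.reset(token)
```

Because the ID lives in a `contextvars.ContextVar`, it follows the request across `await` points in asyncio code without being passed explicitly to every function.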
Release Readiness
- Artifacts are immutable (build once, deploy many).
- Rollback plan exists and has been practiced.
- Alerts have runbooks and clear owners.
Failure Modes
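- No timeouts: a hung call holds its connection and worker indefinitely, so hangs turn into resource exhaustion and outages.
- No backpressure: queues grow without bound and memory grows until the process is OOM-killed.
- No correlation IDs: log lines from one request cannot be tied together, so incidents take hours to diagnose instead of minutes.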
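The first two failure modes can be avoided together with a bounded queue and a timeout on enqueue. The sketch below assumes an asyncio service; `submit`, `worker`, `main`, the queue size of 2, and the timeout values are all hypothetical and chosen only to keep the example small.

```python
import asyncio


async def submit(queue, item, timeout=0.05):
    """Try to enqueue; return False instead of waiting forever when full."""
    try:
        await asyncio.wait_for(queue.put(item), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Queue stayed full for the whole timeout: shed load here,
        # for example by returning an error to the caller.
        return False


async def worker(queue, results):
    while True:
        item = await queue.get()
        try:
            await asyncio.sleep(0.01)  # stand-in for real work
            results.append(item)
        finally:
            queue.task_done()


async def main():
    # The maxsize bound is what creates backpressure; an unbounded
    # queue would accept everything and grow in memory instead.
    queue = asyncio.Queue(maxsize=2)
    results = []
    # No worker is running yet, so the third submit cannot fit.
    accepted = [await submit(queue, i) for i in range(3)]
    task = asyncio.create_task(worker(queue, results))
    await queue.join()  # wait until accepted items are processed
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return accepted, results
```

Running `asyncio.run(main())` accepts the first two items and rejects the third, which is the intended behavior: the producer learns that the system is saturated rather than the process quietly accumulating work.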