Go Production Checklist
Go Production Checklist: What Must Be True Before Go Live
This checklist is designed for real production environments: containerized deployments, rolling updates, external dependencies, and incident-driven operations. It is intentionally pragmatic. If you cannot confidently check an item, it is a risk. The goal is not perfection, but predictable behavior under load, failure, and change.
Use this checklist for:
- Pre-launch readiness review
- Deployment safety verification
- Post-incident hardening
- New service template standards
1. Service Basics and Contracts
- Service name, version, and environment are exposed in logs and health summary
- Public API contract documented and backward compatibility rules defined
- Input validation strict: unknown fields rejected, body size limited, content type validated
- Error responses are consistent and do not leak internal details
- Idempotency rules documented for write endpoints that may be retried
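The input validation and error response items above can be sketched with the standard library alone. This is a minimal illustration, not a definitive implementation: the `createUser` handler, the `createUserRequest` payload, the `/users` path, and the 1 MiB limit are all assumptions made for the example. The `statusFor` helper only exists so the sketch runs in memory without binding a port.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"mime"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBodyBytes is an assumed limit; tune it per endpoint.
const maxBodyBytes = 1 << 20 // 1 MiB

// createUserRequest is a hypothetical payload used only for illustration.
type createUserRequest struct {
	Name string `json:"name"`
}

func createUser(w http.ResponseWriter, r *http.Request) {
	// Validate the content type before touching the body.
	mediaType, _, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
	if err != nil || mediaType != "application/json" {
		http.Error(w, "unsupported content type", http.StatusUnsupportedMediaType)
		return
	}

	// Cap the body size so a client cannot stream an unbounded payload.
	r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)

	dec := json.NewDecoder(r.Body)
	dec.DisallowUnknownFields() // reject fields the contract does not define

	var req createUserRequest
	if err := dec.Decode(&req); err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		// Generic message: decoder internals are not echoed to the client.
		http.Error(w, "invalid request body", http.StatusBadRequest)
		return
	}

	// Reject trailing data after the first JSON value.
	err = dec.Decode(&struct{}{})
	if err != io.EOF {
		http.Error(w, "invalid request body", http.StatusBadRequest)
		return
	}

	if req.Name == "" {
		http.Error(w, "name is required", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusCreated)
}

// statusFor runs the handler in memory and returns the HTTP status code.
func statusFor(contentType, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/users", strings.NewReader(body))
	req.Header.Set("Content-Type", contentType)
	rec := httptest.NewRecorder()
	createUser(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("application/json", `{"name":"ada"}`))              // 201
	fmt.Println(statusFor("application/json", `{"name":"ada","admin":true}`)) // 400
	fmt.Println(statusFor("text/plain", `{"name":"ada"}`))                    // 415
}
```

Returning the same short message for every decode failure keeps error responses consistent and avoids leaking parser details; richer field-level errors are possible but need care not to expose internals.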
2. Configuration and Secrets
- All configuration is externalized via env vars or mounted config files
- Config validated at startup and service fails fast on invalid config
- Secrets are not embedded in images or logs
- Secret rotation strategy defined and tested
- Default settings are safe and conservative
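One way to approach externalized, validated configuration is to read everything once at startup and refuse to start when anything is invalid. The sketch below assumes three hypothetical variables (`PORT`, `HTTP_TIMEOUT`, `DATABASE_URL`) and a `Config` struct invented for the example; a real service would have its own set.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config holds hypothetical settings for illustration only.
type Config struct {
	Port        int
	DatabaseURL string
	HTTPTimeout time.Duration
}

// loadConfig reads settings through getenv so it can be exercised
// without touching the real process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	// Conservative defaults apply when a variable is not set.
	cfg := Config{Port: 8080, HTTPTimeout: 5 * time.Second}
	var errs []error

	if v := getenv("PORT"); v != "" {
		p, err := strconv.Atoi(v)
		if err != nil || p < 1 || p > 65535 {
			errs = append(errs, fmt.Errorf("PORT: invalid value %q", v))
		} else {
			cfg.Port = p
		}
	}

	if v := getenv("HTTP_TIMEOUT"); v != "" {
		d, err := time.ParseDuration(v)
		if err != nil || d <= 0 {
			errs = append(errs, fmt.Errorf("HTTP_TIMEOUT: invalid duration %q", v))
		} else {
			cfg.HTTPTimeout = d
		}
	}

	cfg.DatabaseURL = getenv("DATABASE_URL")
	if cfg.DatabaseURL == "" {
		// Name the variable, never print the secret value itself.
		errs = append(errs, errors.New("DATABASE_URL: required"))
	}

	// errors.Join returns nil when no problems were collected.
	return cfg, errors.Join(errs...)
}

func main() {
	// A real service would pass os.Getenv here. A fixed map keeps this
	// sketch runnable anywhere; the values are placeholders.
	env := map[string]string{"DATABASE_URL": "postgres://app@db/app", "PORT": "9090"}
	cfg, err := loadConfig(func(k string) string { return env[k] })
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid configuration:", err)
		os.Exit(1) // fail fast instead of starting half-configured
	}
	fmt.Printf("port=%d timeout=%s\n", cfg.Port, cfg.HTTPTimeout)
}
```

Collecting every problem before returning means one failed start reports all misconfigured variables at once, which shortens the fix cycle during a rollout.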
3. Timeouts, Retries, and Backpressure
- HTTP server read timeout, write timeout, and idle timeout set
- All outbound HTTP calls have timeouts and context propagation
- Retries are limited and use exponential backoff with jitter
- Retry only idempotent operations or use idempotency keys
- Backpressure exists: concurrency limits for heavy endpoints or queues
- Circuit breaker or failure containment strategy exists for unstable dependencies
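The timeout and retry items above might look like the following sketch. The specific durations are assumptions to tune per service, and `retry`, `newServer`, `newClient`, and `flakyThenOK` are hypothetical helper names. The jitter variant shown is "full jitter" (a random delay between zero and the current backoff), which is one common choice among several.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"net/http"
	"time"
)

// newServer sets explicit server timeouts; the values are assumptions.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           h,
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       10 * time.Second,
		WriteTimeout:      15 * time.Second,
		IdleTimeout:       60 * time.Second,
	}
}

// newClient bounds the total time of each outbound call. Per-request
// deadlines can additionally be propagated with http.NewRequestWithContext.
func newClient() *http.Client {
	return &http.Client{Timeout: 10 * time.Second}
}

// retry calls op up to attempts times, sleeping a random duration in
// [0, backoff] between tries, where backoff doubles up to maxDelay.
// It should only wrap idempotent operations.
func retry(ctx context.Context, attempts int, base, maxDelay time.Duration,
	op func(context.Context) error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(ctx); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no sleep after the final attempt
		}
		backoff := base << i
		if backoff <= 0 || backoff > maxDelay {
			backoff = maxDelay // also covers shift overflow
		}
		sleep := time.Duration(rand.Int63n(int64(backoff) + 1))
		timer := time.NewTimer(sleep)
		select {
		case <-ctx.Done():
			timer.Stop()
			return errors.Join(ctx.Err(), err)
		case <-timer.C:
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flakyThenOK is a demo helper: the operation fails the first
// `failures` times and then succeeds.
func flakyThenOK(failures, attempts int) (calls int, err error) {
	err = retry(context.Background(), attempts, time.Millisecond, 10*time.Millisecond,
		func(context.Context) error {
			calls++
			if calls <= failures {
				return errors.New("temporary failure")
			}
			return nil
		})
	return calls, err
}

func main() {
	calls, err := flakyThenOK(2, 5)
	fmt.Println(calls, err) // 3 <nil>

	srv := newServer(":8080", http.NotFoundHandler())
	fmt.Println(srv.ReadTimeout, srv.WriteTimeout, srv.IdleTimeout, newClient().Timeout)
}
```

Capping both the attempt count and the maximum delay keeps the worst-case latency of a retried call bounded, which matters when the caller itself has a deadline.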
4. Database Safety
- All SQL queries are parameterized and safe from injection
- database/sql pool configured: SetMaxOpenConns, SetMaxIdleConns, SetConnMaxLifetime
- Context used for all DB calls and timeouts enforced
- Rows are always closed and rows.Err() is checked after iteration
- Transactions are short and boundaries are explicit
- Deadlock and serialization failure retry strategy defined
- Least privilege DB user used, no admin permissions in app runtime
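A minimal sketch of the pool, context, and rows items above, using only `database/sql`. The pool numbers, the `users` table, the `last_seen` column, and the `$1` placeholder style (PostgreSQL convention) are assumptions. `stubConnector` is a stand-in that always fails to connect so the example runs without a real driver; a real service would call `sql.Open` with its driver instead.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"time"
)

// Assumed limits; size them against the database's connection budget.
const (
	maxOpenConns    = 25
	maxIdleConns    = 25
	connMaxLifetime = 30 * time.Minute
	queryTimeout    = 2 * time.Second
)

func configurePool(db *sql.DB) {
	db.SetMaxOpenConns(maxOpenConns)
	db.SetMaxIdleConns(maxIdleConns)
	db.SetConnMaxLifetime(connMaxLifetime)
}

// activeUserNames shows the query pattern: bounded context, parameterized
// SQL, deferred Close, and a rows.Err check after the loop.
func activeUserNames(ctx context.Context, db *sql.DB, since time.Time) ([]string, error) {
	ctx, cancel := context.WithTimeout(ctx, queryTimeout)
	defer cancel()

	// The value is passed as a parameter, never concatenated into the SQL.
	rows, err := db.QueryContext(ctx,
		`SELECT name FROM users WHERE last_seen >= $1`, since)
	if err != nil {
		return nil, fmt.Errorf("query users: %w", err)
	}
	defer rows.Close()

	var names []string
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return nil, fmt.Errorf("scan user: %w", err)
		}
		names = append(names, name)
	}
	// rows.Next returning false can hide an iteration error.
	if err := rows.Err(); err != nil {
		return nil, fmt.Errorf("iterate users: %w", err)
	}
	return names, nil
}

// stubConnector stands in for a real driver and always fails to connect.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub connector: no database in this sketch")
}

func (stubConnector) Driver() driver.Driver { return nil }

func openStubDB() *sql.DB { return sql.OpenDB(stubConnector{}) }

func main() {
	db := openStubDB()
	defer db.Close()
	configurePool(db)
	fmt.Println("max open connections:", db.Stats().MaxOpenConnections)

	_, err := activeUserNames(context.Background(), db, time.Now().Add(-24*time.Hour))
	fmt.Println("query error:", err)
}
```

`db.Stats()` also reports wait counts and durations, which is one way to feed the pool saturation metrics mentioned later in this checklist.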
5. Migrations and Schema Evolution
- Migrations are backward compatible for rolling deploy
- No destructive schema changes during mixed version rollout
- Backfills are batched and do not lock tables for long periods
- Migration execution strategy is defined and controlled
- Rollback plan exists and is practiced
6. HTTP and API Hardening
- Request body size limited
- Rate limiting strategy exists for public endpoints
- CORS configured intentionally where applicable
- Secure headers set when serving web traffic
- Authentication and authorization rules enforced consistently
- Logging does not include tokens, cookies, or credentials
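Two of the hardening items above, secure headers and credential-free logging, lend themselves to small helpers. The sketch below is one possible shape: the header values are common defaults rather than a recommendation for every service, and the `sensitiveHeaders` list, `secureHeaders`, `redactHeaders`, and `responseHeader` names are assumptions for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sensitiveHeaders is an assumed list; extend it for custom auth headers.
var sensitiveHeaders = []string{"Authorization", "Cookie", "Set-Cookie", "X-Api-Key"}

// secureHeaders sets a baseline of response headers before calling next.
func secureHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("X-Content-Type-Options", "nosniff")
		h.Set("X-Frame-Options", "DENY")
		h.Set("Referrer-Policy", "no-referrer")
		h.Set("Strict-Transport-Security", "max-age=63072000; includeSubDomains")
		next.ServeHTTP(w, r)
	})
}

// redactHeaders returns a copy that is safe to log. The input is not
// modified, so the request itself keeps its credentials.
func redactHeaders(h http.Header) http.Header {
	safe := h.Clone()
	for _, name := range sensitiveHeaders {
		if len(safe.Values(name)) > 0 {
			safe.Set(name, "[REDACTED]")
		}
	}
	return safe
}

// responseHeader runs the middleware in memory and returns one header.
func responseHeader(name string) string {
	h := secureHeaders(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Header().Get(name)
}

func main() {
	fmt.Println(responseHeader("X-Content-Type-Options")) // nosniff

	in := http.Header{}
	in.Set("Authorization", "Bearer secret-token")
	in.Set("Accept", "application/json")
	out := redactHeaders(in)
	fmt.Println(out.Get("Authorization"), out.Get("Accept")) // [REDACTED] application/json
}
```

Redacting at the point where headers are prepared for logging, rather than trusting each call site, makes it harder for a new log statement to leak a token by accident.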
7. Observability: Logs, Metrics, Traces
- Structured logs with consistent fields
- request_id present in every request log
- Metrics exposed on /metrics and scraped successfully
- RED metrics per route: rate, errors, duration
- Dependency metrics: DB latency, external API latency, cache hit rate
- Saturation metrics: db pool wait, queue depth, goroutines, heap, GC
- Dashboards exist for service overview and dependency health
- Alerts are tied to user impact and SLOs where possible
- Tracing propagation exists for critical flows when needed
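For the structured logging and request_id items, a small middleware built on `log/slog` is one option. This is a sketch under assumptions: the `X-Request-ID` header name, the log field names, and the `/orders/42` path are illustrative, and metrics or tracing would need additional libraries that are not shown here.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"os"
	"time"
)

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

// newRequestID returns 16 hex characters from the system random source.
func newRequestID() string {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "unknown"
	}
	return hex.EncodeToString(b)
}

// withRequestLog reuses an incoming request ID or creates one, echoes it
// on the response, and emits one structured log line per request.
func withRequestLog(logger *slog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Accept incoming IDs only from trusted proxies in a real setup.
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = newRequestID()
		}
		w.Header().Set("X-Request-ID", id)

		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)

		logger.Info("request",
			"request_id", id,
			"method", r.Method,
			"path", r.URL.Path,
			"status", rec.status,
			"duration_ms", time.Since(start).Milliseconds(),
		)
	})
}

// echoedRequestID runs one request in memory and returns the ID that the
// middleware placed on the response.
func echoedRequestID(incoming string) string {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	h := withRequestLog(logger, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	req := httptest.NewRequest(http.MethodGet, "/orders/42", nil)
	if incoming != "" {
		req.Header.Set("X-Request-ID", incoming)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Header().Get("X-Request-ID")
}

func main() {
	fmt.Println(echoedRequestID("abc-123")) // abc-123
	fmt.Println(len(echoedRequestID("")))   // 16
}
```

Logging the raw path is fine for a sketch, but per-route metrics usually need the route pattern instead, otherwise every distinct ID in a URL becomes its own label.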
8. Performance and Resource Control
- pprof available on protected admin port for incident debugging
- Known hot paths benchmarked and allocation metrics tracked
- Payload sizes controlled via pagination and field selection
- Concurrency limits align with DB pool and downstream capacity
- No unbounded buffering of large requests or responses
- GC and heap behavior monitored for regressions
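The pprof and concurrency limit items can be illustrated together. In this sketch, `newAdminMux`, `newPublicMux`, `limitConcurrency`, and `statusOf` are hypothetical names, and the `/hello` route is a placeholder. The admin mux is meant to be served on a separate, protected listener (for example one bound to localhost), which is assumed rather than shown.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newAdminMux exposes pprof on its own mux. Importing net/http/pprof also
// registers handlers on http.DefaultServeMux as a side effect, so the
// public server should never serve DefaultServeMux.
func newAdminMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// newPublicMux carries application routes only; no profiling endpoints.
func newPublicMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	return mux
}

// limitConcurrency lets at most limit requests run at once and sheds the
// rest with 503 instead of queueing them without bound.
func limitConcurrency(limit int, next http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			next.ServeHTTP(w, r)
		default:
			w.Header().Set("Retry-After", "1")
			http.Error(w, "server busy", http.StatusServiceUnavailable)
		}
	})
}

// statusOf issues one in-memory GET and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusOf(newAdminMux(), "/debug/pprof/"))  // 200
	fmt.Println(statusOf(newPublicMux(), "/debug/pprof/")) // 404

	limited := limitConcurrency(10, newPublicMux())
	fmt.Println(statusOf(limited, "/hello")) // 200
}
```

Choosing the limit at or below the database pool size is one way to keep heavy endpoints from queueing on connections; the right number depends on what each request actually consumes downstream.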
9. Graceful Shutdown and Health Endpoints
- SIGTERM handled
- Graceful shutdown drains in-flight requests
- Readiness fails immediately on shutdown start
- Liveness endpoint is cheap and dependency free
- Readiness reflects ability to serve traffic and uses timeouts
- Startup probe strategy exists for slow initialization
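A minimal sketch of the shutdown and health items above, assuming `/livez` and `/readyz` as endpoint paths, a 20 second drain budget, and a hypothetical `health` type. Dependency checks inside readiness are left out, and many deployments also pause briefly after failing readiness so the load balancer can react before listeners close; that delay is omitted here.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// drainTimeout is an assumed budget; keep it below the orchestrator's
// termination grace period.
const drainTimeout = 20 * time.Second

// health tracks whether the process has started shutting down.
type health struct {
	draining atomic.Bool
}

// readyStatus is 503 as soon as shutdown begins, 200 otherwise.
func (h *health) readyStatus() int {
	if h.draining.Load() {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// run serves until ctx is cancelled, then fails readiness and drains
// in-flight requests.
func run(ctx context.Context, addr string, h *health) error {
	mux := http.NewServeMux()
	mux.HandleFunc("/livez", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK) // cheap, no dependency checks
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(h.readyStatus())
	})
	srv := &http.Server{Addr: addr, Handler: mux, ReadHeaderTimeout: 5 * time.Second}

	serveErr := make(chan error, 1)
	go func() { serveErr <- srv.ListenAndServe() }()

	select {
	case err := <-serveErr:
		return err // failed to start or stopped unexpectedly
	case <-ctx.Done(): // SIGTERM or SIGINT received
	}

	h.draining.Store(true) // readiness fails first

	drainCtx, cancel := context.WithTimeout(context.Background(), drainTimeout)
	defer cancel()
	// Shutdown stops accepting connections and waits for active requests.
	if err := srv.Shutdown(drainCtx); err != nil {
		return err
	}
	if err := <-serveErr; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	if err := run(ctx, ":8080", &health{}); err != nil {
		log.Fatal(err)
	}
}
```

Keeping liveness free of dependency checks avoids restart loops when a database is slow, while readiness is the place to reflect whether traffic can actually be served.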
10. Deployment Safety
- Rolling update strategy preserves capacity
- Canary release path exists for risky changes
- Connection and request storms against dependencies during rollout are controlled via pool limits and rollout pacing
- Deploy is observable with dashboard annotations and version tags
- Rollback path documented and tested
11. Container and Runtime Security
- Image built with a multi-stage build
- Runtime image minimal and includes needed CA certs
- Runs as non-root when possible
- Ports and network exposure minimized
- File system permissions and volumes defined intentionally
- Security updates and base image refresh policy defined
12. Operations and Incident Readiness
- Runbook exists for common incidents: high latency, db pool saturation, dependency outage
- Known failure modes documented
- On-call response steps defined: triage flow using metrics first, then profiles
- Post-incident review process exists and produces concrete follow-ups
- Chaos or failure injection tests planned for critical dependencies
13. Data Protection
- Backups exist and restore drills performed
- PII handling rules defined and enforced
- Audit logging exists for sensitive operations if required
- Data retention policy defined and applied
Pre-Launch Quick Gate
If you only have time for a minimal gate before go live, confirm these:
- Readiness and graceful shutdown proven in a rollout test
- DB pool configured and monitored, no connection leaks
- Request validation strict and body size limited
- Metrics: per route latency and error rates visible
- Alerts: user impact alerts active
- Migration strategy safe for rolling deploys
Final Perspective
A Go service becomes production ready when its failure behavior is predictable and observable. This checklist reduces surprise. Use it as a living standard: every incident should add one improvement, one metric, or one guardrail. Over time, production becomes boring. That is the goal.