Go Production Checklist

A production checklist for Go services covering reliability, security, performance, observability, deployments, and data safety. Use this as a pre-launch and post-incident standard.

Go Production Checklist: What Must Be True Before Go Live

This checklist is designed for real production environments: containerized deployments, rolling updates, external dependencies, and incident-driven operations. It is intentionally pragmatic. If you cannot confidently check an item, it is a risk. The goal is not perfection, but predictable behavior under load, failure, and change.

Use this checklist for:

Pre-launch readiness review
Deployment safety verification
Post-incident hardening
New service template standards

1. Service Basics and Contracts

Service name, version, and environment are exposed in logs and health summary
Public API contract documented and backward compatibility rules defined
Input validation strict: unknown fields rejected, body size limited, content type validated
Error responses are consistent and do not leak internal details
Idempotency rules documented for write endpoints that may be retried

2. Configuration and Secrets

All configuration is externalized via env vars or mounted config files
Config validated at startup and service fails fast on invalid config
Secrets are not embedded in images or logs
Secret rotation strategy defined and tested
Default settings are safe and conservative

3. Timeouts, Retries, and Backpressure

HTTP server read timeout, write timeout, and idle timeout set
All outbound HTTP calls have timeouts and context propagation
Retries are limited and use exponential backoff with jitter
Retry only idempotent operations or use idempotency keys
Backpressure exists: concurrency limits for heavy endpoints or queues
Circuit breaker or failure containment strategy exists for unstable dependencies

4. Database Safety

All SQL queries are parameterized and safe from injection
database/sql pool configured: MaxOpenConns, MaxIdleConns, ConnMaxLifetime
Context used for all DB calls and timeouts enforced
Rows are always closed and rows.Err checked
Transactions are short and boundaries are explicit
Deadlock and serialization failure retry strategy defined
Least privilege DB user used, no admin permissions in app runtime

5. Migrations and Schema Evolution

Migrations are backward compatible for rolling deploys
No destructive schema changes during mixed version rollout
Backfills are batched and do not lock tables for long periods
Migration execution strategy is defined and controlled
Rollback plan exists and is practiced

6. HTTP and API Hardening

Request body size limited
Rate limiting strategy exists for public endpoints
CORS configured intentionally where applicable
Secure headers set when serving web traffic
Authentication and authorization rules enforced consistently
Logging does not include tokens, cookies, or credentials

7. Observability: Logs, Metrics, Traces

Structured logs with consistent fields
request_id present in every request log
Metrics exposed on /metrics and scraped successfully
RED metrics per route: rate, errors, duration
Dependency metrics: DB latency, external API latency, cache hit rate
Saturation metrics: DB pool wait time, queue depth, goroutines, heap, GC
Dashboards exist for service overview and dependency health
Alerts are tied to user impact and SLOs where possible
Tracing propagation exists for critical flows when needed

8. Performance and Resource Control

pprof available on protected admin port for incident debugging
Known hot paths benchmarked and allocation metrics tracked
Payload sizes controlled via pagination and field selection
Concurrency limits align with DB pool and downstream capacity
No unbounded buffering of large requests or responses
GC and heap behavior monitored for regressions

9. Graceful Shutdown and Health Endpoints

SIGTERM handled
Graceful shutdown drains in-flight requests
Readiness starts failing as soon as shutdown begins
Liveness endpoint is cheap and dependency free
Readiness reflects ability to serve traffic and uses timeouts
Startup probe strategy exists for slow initialization

10. Deployment Safety

Rolling update strategy preserves capacity
Canary release path exists for risky changes
Reconnect and retry storms against dependencies controlled via pool limits and rollout pacing
Deploy is observable with dashboard annotations and version tags
Rollback path documented and tested

11. Container and Runtime Security

Image built with a multi-stage build
Runtime image minimal and includes needed CA certs
Runs as non-root when possible
Ports and network exposure minimized
File system permissions and volumes defined intentionally
Security updates and base image refresh policy defined

12. Operations and Incident Readiness

Runbook exists for common incidents: high latency, DB pool saturation, dependency outage
Known failure modes documented
On-call response steps defined: triage with metrics first, then profiles
Post-incident review process exists and produces concrete follow-ups
Chaos or failure injection tests planned for critical dependencies

13. Data Protection

Backups exist and restore drills performed
PII handling rules defined and enforced
Audit logging exists for sensitive operations if required
Data retention policy defined and applied

Pre-Launch Quick Gate

If you only have time for a minimal gate before going live, confirm these:

Readiness and graceful shutdown proven in a rollout test
DB pool configured and monitored, no connection leaks
Request validation strict and body size limited
Metrics: per route latency and error rates visible
Alerts: user impact alerts active
Migration strategy safe for rolling deploys

Final Perspective

A Go service becomes production ready when its failure behavior is predictable and observable. This checklist reduces surprise. Use it as a living standard: every incident should add one improvement, one metric, or one guardrail. Over time, production becomes boring. That is the goal.
