Metrics Basics

Expose minimal but meaningful metrics: request count, error count, and latency histograms with safe label choices. Avoid high-cardinality labels and treat metrics as a long-term contract for dashboards and alerts.

Why metrics are different from logs

Logs tell you what happened in a specific request. Metrics tell you what is happening across all requests. In production, metrics are how you detect incidents before users report them. They power dashboards, alerts, and SLO tracking. A good metrics setup is small, stable, and intentionally labeled.

Production goals at this level

Measure throughput: how many requests per route and status code.
Measure errors: how many failures by type.
Measure latency: how long requests and key operations take.
Keep labels safe: avoid high-cardinality fields like request id.

What to measure first

Start with three core metric types:

Counter: total number of requests and errors.
Histogram: latency distribution, recorded in seconds to follow the Prometheus base-unit convention.
Gauge: occasionally for in-flight requests or pool usage.

You do not need dozens of metrics. You need a small, reliable baseline.
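
To make the gauge case concrete, here is a minimal sketch of tracking in-flight requests. It uses a plain AtomicI64 as a stand-in for a real gauge so that it runs without extra dependencies; the InFlightGuard type and its enter method are hypothetical names, not part of any library.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical RAII guard: increments the gauge when created and
/// decrements it when dropped, including on early returns.
pub struct InFlightGuard<'a> {
    gauge: &'a AtomicI64,
}

impl<'a> InFlightGuard<'a> {
    /// Call at the start of a request and keep the guard alive until it ends.
    pub fn enter(gauge: &'a AtomicI64) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        Self { gauge }
    }
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}
```

Unlike a counter, the value goes down as well as up. Tying the decrement to Drop means a handler that returns early cannot leave the gauge stuck at a higher value.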

Cardinality: the hidden production risk

Cardinality is the number of unique label combinations. Each unique combination becomes its own time series, so high-cardinality labels explode memory usage in monitoring systems. Never use these as labels:

request_id
user_id
email or any unique identifier
raw error messages

Good label examples:

route
method
status (the HTTP status code)
error_type (small fixed set)
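
A rough way to reason about cardinality is to multiply the number of distinct values each label can take. The sketch below does only that arithmetic; the max_series function and the counts used with it are illustrative assumptions, not measurements.

```rust
/// Worst-case number of time series for one metric: the product of the
/// number of distinct values each label can take.
/// Saturates instead of overflowing for very large inputs.
pub fn max_series(distinct_values_per_label: &[u64]) -> u64 {
    distinct_values_per_label
        .iter()
        .fold(1u64, |acc, &n| acc.saturating_mul(n))
}
```

Assuming 4 methods, 20 routes, and 10 status codes, max_series(&[4, 20, 10]) gives at most 800 series. Adding a user_id label with 100,000 distinct users raises the bound to 80,000,000, which is why unique identifiers do not belong in labels.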

Minimal Prometheus setup

One common approach is to use the prometheus crate and expose a /metrics endpoint. Keep it simple and explicit.

Dependencies

# Cargo.toml
[dependencies]
prometheus = "0.13"
once_cell = "1"

Define metrics statically

Define a few global metrics with stable names. Names are part of your contract: renaming a metric breaks every dashboard and alert that queries it.

use once_cell::sync::Lazy;
use prometheus::{Encoder, HistogramVec, IntCounterVec, TextEncoder, register_histogram_vec, register_int_counter_vec};

pub static HTTP_REQUESTS_TOTAL: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "http_requests_total",
        "Total number of HTTP requests",
        &["method", "route", "status"]
    ).unwrap()
});

pub static HTTP_REQUEST_DURATION: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "http_request_duration_seconds",
        "HTTP request latency in seconds",
        &["method", "route"]
    ).unwrap()
});

Instrumenting a handler

Record count and latency at the request boundary. Keep label values bounded and normalized (use route pattern, not full path).

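use std::time::Instant;

async fn hello() -> String {
    "hello".to_string()
}

pub async fn hello_instrumented() -> String {
    // Start timing before any work so the measurement covers the whole handler.
    let start = Instant::now();

    let response = hello().await;

    // The histogram is declared in seconds, so record seconds.
    let elapsed = start.elapsed().as_secs_f64();

    // The status is hard-coded because this handler cannot fail.
    // A handler that can fail should derive it from the actual response.
    HTTP_REQUESTS_TOTAL
        .with_label_values(&["GET", "/hello", "200"])
        .inc();

    // The route label is the pattern "/hello", never the raw request path.
    HTTP_REQUEST_DURATION
        .with_label_values(&["GET", "/hello"])
        .observe(elapsed);

    response
}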
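
When the route pattern is not at hand, a raw path has to be normalized before it is used as a label value. The following is a minimal sketch that assumes dynamic segments are purely numeric IDs; the normalize_route function and the ":id" placeholder are hypothetical choices. In axum, the MatchedPath extractor can supply the matched pattern directly, which is usually preferable to hand-written normalization.

```rust
/// Replaces purely numeric path segments with a fixed placeholder so that
/// "/users/1" and "/users/2" map to the same label value.
/// Assumption: dynamic segments are numeric; other ID formats (UUIDs,
/// slugs) would need their own rules.
pub fn normalize_route(path: &str) -> String {
    path.split('/')
        .map(|segment| {
            let is_numeric =
                !segment.is_empty() && segment.chars().all(|c| c.is_ascii_digit());
            if is_numeric { ":id" } else { segment }
        })
        .collect::<Vec<_>>()
        .join("/")
}
```

With this sketch, normalize_route("/users/42/orders/7") yields "/users/:id/orders/:id", while a static path such as "/hello" passes through unchanged.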

Expose metrics endpoint

Prometheus scrapes metrics from an HTTP endpoint. This endpoint should not depend on your main business logic.

use axum::{response::IntoResponse, routing::get, Router};

async fn metrics_handler() -> impl IntoResponse {
    let encoder = TextEncoder::new();
    let metric_families = prometheus::gather();
    let mut buffer = Vec::new();
    encoder.encode(&metric_families, &mut buffer).unwrap();
    String::from_utf8(buffer).unwrap()
}

pub fn with_metrics(app: Router) -> Router {
    app.route("/metrics", get(metrics_handler))
}

Error metrics

Track error counts separately using a small error_type label set. This allows dashboards to show trends in validation errors versus internal errors.

pub static HTTP_ERRORS_TOTAL: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "http_errors_total",
        "Total number of HTTP errors",
        &["route", "error_type"]
    ).unwrap()
});

// Example usage (inside the error path of a handler)
HTTP_ERRORS_TOTAL
    .with_label_values(&["/users", "internal"])
    .inc();
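
One way to keep the error_type label bounded is to derive it from an enum rather than from error text. This is a sketch under assumptions: the AppError type, its variants, and the label strings are hypothetical and stand in for whatever error type the application already has.

```rust
/// Hypothetical application error; the variants are assumptions for illustration.
#[derive(Debug)]
pub enum AppError {
    InvalidInput(String),
    NotFound,
    Database(String),
}

impl AppError {
    /// Maps every error to one of a small, fixed set of label values.
    /// The message payloads are deliberately ignored: they belong in logs,
    /// not in metric labels.
    pub fn error_type(&self) -> &'static str {
        match self {
            AppError::InvalidInput(_) => "validation",
            AppError::NotFound => "not_found",
            AppError::Database(_) => "internal",
        }
    }
}
```

The returned string can be passed straight to with_label_values, for example &["/users", err.error_type()]. Because the match is exhaustive, adding a new variant forces a decision about which label it maps to.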

Latency histograms and SLO thinking

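Histograms allow you to compute percentiles such as p95 or p99. In production, tail latency usually matters more than average latency, because a healthy-looking mean can hide a slow tail that affects a meaningful share of requests. Percentiles are estimated from bucket counts, so their accuracy depends on where the bucket boundaries fall. The prometheus crate's default buckets span 5 ms to 10 s; if you override them, place boundaries near your latency targets and keep them consistent over time, since changing buckets makes old and new data hard to compare.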
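
To show why bucket boundaries matter, here is a sketch of how a percentile can be estimated from cumulative bucket counts using linear interpolation inside the matching bucket. It is a simplified illustration in the spirit of what monitoring systems do at query time, not their actual implementation; the estimate_quantile function is hypothetical, and the sketch assumes finite, ascending upper bounds and omits the +Inf bucket.

```rust
/// Estimates the q-quantile (0.0..=1.0) from cumulative histogram buckets.
/// `buckets` holds (upper_bound_seconds, cumulative_count) pairs sorted by
/// ascending upper bound. Returns None for empty data or an invalid q.
pub fn estimate_quantile(q: f64, buckets: &[(f64, u64)]) -> Option<f64> {
    let total = buckets.last()?.1;
    if total == 0 || !(0.0..=1.0).contains(&q) {
        return None;
    }
    // Rank of the observation we are looking for.
    let rank = q * total as f64;
    let mut lower_bound = 0.0;
    let mut lower_count = 0u64;
    for &(upper_bound, cumulative) in buckets {
        if cumulative as f64 >= rank {
            let in_bucket = (cumulative - lower_count) as f64;
            if in_bucket == 0.0 {
                return Some(lower_bound);
            }
            // Assume observations are spread evenly within the bucket.
            let fraction = (rank - lower_count as f64) / in_bucket;
            return Some(lower_bound + (upper_bound - lower_bound) * fraction);
        }
        lower_bound = upper_bound;
        lower_count = cumulative;
    }
    None
}
```

For cumulative buckets (0.1, 50), (0.5, 90), (1.0, 100), the p95 estimate is 0.75 s: the 95th observation falls halfway through the bucket between 0.5 s and 1.0 s. The true value could be anywhere in that bucket, so wide buckets around your latency target produce coarse percentiles.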

Operational checklist

/metrics endpoint responds and can be scraped.
Request counter increases for every request.
Error counter increases only for failed responses.
Latency histogram shows realistic distribution under load.
No high-cardinality labels exist.

How metrics and tracing complement each other

Metrics show that error rate increased from 1 percent to 5 percent. Tracing shows which specific requests failed and why. Logs show the exact error details. Production observability requires all three working together.
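
An error rate like the one above is derived from two counters. A minimal sketch, assuming two samples of the request and error counters taken at the start and end of a window; monitoring systems normally compute this for you, and the CounterSample type and error_ratio function are hypothetical.

```rust
/// One reading of the request and error counters at a point in time.
#[derive(Debug, Clone, Copy)]
pub struct CounterSample {
    pub requests: u64,
    pub errors: u64,
}

/// Fraction of requests that failed between two samples.
/// Returns None if no requests arrived in the window or if a counter
/// went backwards (for example after a process restart).
pub fn error_ratio(start: CounterSample, end: CounterSample) -> Option<f64> {
    let requests = end.requests.checked_sub(start.requests)?;
    let errors = end.errors.checked_sub(start.errors)?;
    if requests == 0 {
        return None;
    }
    Some(errors as f64 / requests as f64)
}
```

If the window saw 1,000 new requests and 50 new errors, the ratio is 0.05, or 5 percent. The counters themselves only ever increase; the rate comes from the difference between samples.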

What comes next

Next we will formalize health checks and readiness signals in a more operational way, separating liveness from dependency health and integrating them with observability signals.
