Tracing Basics

Use tracing to turn request handling into a timeline: spans for operations, events for key moments, and consistent fields like request id and route. The goal is fast incident triage and root cause analysis without adding fragile manual log lines everywher

On this page

Why tracing is not just logging

Traditional logging is a stream of lines. In production incidents, a stream is hard to reason about: which lines belong to the same request, where time was spent, what the service was doing when it failed. Tracing adds structure: you create a span for an operation, attach fields (request id, route, user id), and emit events inside the span. The result is a timeline that you can filter and follow.

Production goals at this level

Consistent correlation fields: request id and route appear on relevant events.
Spans for boundaries: one span per request, and spans for major steps like database calls.
Minimal overhead: keep it simple and stable before adding deep tracing everywhere.
Actionable logs: logs answer: what failed, where, and how long it took.

Mental model: spans and events

Span: represents an operation that takes time (request handling, database query, calling a downstream).
Event: a point-in-time record inside a span (validation failed, query started, query succeeded).
Fields: key-value metadata attached to spans and events (request_id, user_id, op, status).

In production, fields are the difference between searching blindly and filtering precisely.

Dependencies

Use tracing and tracing-subscriber. For HTTP servers, tower-http TraceLayer gives a good request baseline.

# Cargo.toml (core observability pieces)
[dependencies]
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
tower-http = { version = "0.5", features = ["trace"] }

Initialize tracing with an environment filter

Production-friendly tracing starts with environment-driven verbosity. This allows safe changes during incidents without code changes.

use tracing_subscriber::EnvFilter;

pub fn init_tracing() {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .with_target(false)
        .with_level(true)
        .init();
}

Baseline request tracing with TraceLayer

TraceLayer generates per-request spans and logs request lifecycle information. This is a strong baseline for HTTP services.

use axum::Router;
use tower_http::trace::TraceLayer;

pub fn with_http_tracing(app: Router) -> Router {
    app.layer(TraceLayer::new_for_http())
}

Adding your own spans for important operations

TraceLayer covers request boundaries, but production debugging often needs spans around critical steps: database queries, cache lookups, external calls. Create spans that include stable fields and capture duration naturally.

use tracing::{info_span, Instrument};

pub async fn load_user(pool: &sqlx::MySqlPool, user_id: i64, request_id: &str)
    -> Result<Option<(i64, String)>, sqlx::Error>
{
    let span = info_span!(
        "db.load_user",
        request_id = %request_id,
        user_id = %user_id
    );

    async move {
        tracing::info!(op = "start", "query begin");

        let row = sqlx::query_as("SELECT id, name FROM users WHERE id = ?")
            .bind(user_id)
            .fetch_optional(pool)
            .await?;

        tracing::info!(op = "end", found = %row.is_some(), "query end");
        Ok(row)
    }
    .instrument(span)
    .await
}

Production mindset: choose fields carefully

Not every value should be a field. Good production fields are stable and useful for filtering:

Good: request_id, route, method, status, op name, user_id (if not sensitive), error type.
Avoid: large payloads, full SQL statements, secrets, emails, tokens.

Keep sensitive data out of logs by default. If you need to debug a specific issue, add controlled debug logs temporarily and remove them after.

Errors and tracing: log once, with context

A common production anti-pattern is logging the same error multiple times across layers. Prefer a single, high-context error log at the boundary where you decide how to respond. Lower layers can return errors without logging, or log only when they add meaningful context.

Example: mapping DB error to API error with tracing

pub async fn get_user_handler(
    state: axum::extract::State<crate::AppState>,
    axum::extract::Path(user_id): axum::extract::Path<i64>,
    request_id: String,
) -> Result<axum::Json<crate::UserDto>, crate::errors::ApiError> {

    let span = tracing::info_span!(
        "handler.get_user",
        request_id = %request_id,
        user_id = %user_id
    );
    async move {
        let row = crate::db::load_user(&state.pool, user_id, &request_id).await
            .map_err(|e| {
                tracing::warn!(err = ?e, "db error");
                crate::errors::ApiError::Internal { context: "db error".to_string() }
            })?;

        match row {
            Some((id, name)) => Ok(axum::Json(crate::UserDto { id, name })),
            None => Err(crate::errors::ApiError::NotFound { message: "user not found".to_string() }),
        }
    }
    .instrument(span)
    .await
}

Operational usage: how you actually use tracing in incidents

During an incident, you usually do not start from code. You start from symptoms:

Spike in errors: filter logs by status and route, then pick a request id.
Latency increase: look for spans with long duration, then inspect nested spans like db.load_user.
Intermittent failures: group by error type and compare timings.

The point is to make diagnosis a filtering task, not a guessing task.

Practical verification

Before moving on, confirm these basics locally:

Setting RUST_LOG changes verbosity without code changes.
Each request produces a trace log line with method and status.
Critical operations add spans and events with request id.

RUST_LOG=info cargo run
RUST_LOG=tower_http=debug,info cargo run

What comes next

Tracing tells you what happened in a specific request. Metrics tell you what is happening across all requests. Next we will add minimal metrics: counters, histograms, and a safe labeling strategy.

Metrics Basics →