Tracing Basics
Why tracing is not just logging
Traditional logging is a stream of lines. In production incidents, a stream is hard to reason about: which lines belong to the same request, where time was spent, what the service was doing when it failed. Tracing adds structure: you create a span for an operation, attach fields (request id, route, user id), and emit events inside the span. The result is a timeline that you can filter and follow.
Production goals at this level
- Consistent correlation fields: request id and route appear on relevant events.
- Spans for boundaries: one span per request, and spans for major steps like database calls.
- Minimal overhead: keep it simple and stable before adding deep tracing everywhere.
- Actionable logs: logs answer: what failed, where, and how long it took.
Mental model: spans and events
- Span: represents an operation that takes time (request handling, database query, calling a downstream).
- Event: a point-in-time record inside a span (validation failed, query started, query succeeded).
- Fields: key-value metadata attached to spans and events (request_id, user_id, op, status).
In production, fields are the difference between searching blindly and filtering precisely.
Dependencies
Use tracing and tracing-subscriber. For HTTP servers, tower-http TraceLayer gives a good request baseline.
# Cargo.toml (core observability pieces)
[dependencies]
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
tower-http = { version = "0.5", features = ["trace"] }
Initialize tracing with an environment filter
Production-friendly tracing starts with environment-driven verbosity. This allows safe changes during incidents without code changes.
use tracing_subscriber::EnvFilter;
pub fn init_tracing() {
tracing_subscriber::fmt()
.with_env_filter(EnvFilter::from_default_env())
.with_target(false)
.with_level(true)
.init();
}
Baseline request tracing with TraceLayer
TraceLayer generates per-request spans and logs request lifecycle information. This is a strong baseline for HTTP services.
use axum::Router;
use tower_http::trace::TraceLayer;
pub fn with_http_tracing(app: Router) -> Router {
app.layer(TraceLayer::new_for_http())
}
Adding your own spans for important operations
TraceLayer covers request boundaries, but production debugging often needs spans around critical steps: database queries, cache lookups, external calls. Create spans that include stable fields and capture duration naturally.
use tracing::{info_span, Instrument};
pub async fn load_user(pool: &sqlx::MySqlPool, user_id: i64, request_id: &str)
-> Result<Option<(i64, String)>, sqlx::Error>
{
let span = info_span!(
"db.load_user",
request_id = %request_id,
user_id = %user_id
);
async move {
tracing::info!(op = "start", "query begin");
let row = sqlx::query_as("SELECT id, name FROM users WHERE id = ?")
.bind(user_id)
.fetch_optional(pool)
.await?;
tracing::info!(op = "end", found = %row.is_some(), "query end");
Ok(row)
}
.instrument(span)
.await
}
Production mindset: choose fields carefully
Not every value should be a field. Good production fields are stable and useful for filtering:
- Good: request_id, route, method, status, op name, user_id (if not sensitive), error type.
- Avoid: large payloads, full SQL statements, secrets, emails, tokens.
Keep sensitive data out of logs by default. If you need to debug a specific issue, add controlled debug logs temporarily and remove them after.
Errors and tracing: log once, with context
A common production anti-pattern is logging the same error multiple times across layers. Prefer a single, high-context error log at the boundary where you decide how to respond. Lower layers can return errors without logging, or log only when they add meaningful context.
Example: mapping DB error to API error with tracing
pub async fn get_user_handler(
state: axum::extract::State<crate::AppState>,
axum::extract::Path(user_id): axum::extract::Path<i64>,
request_id: String,
) -> Result<axum::Json<crate::UserDto>, crate::errors::ApiError> {
let span = tracing::info_span!(
"handler.get_user",
request_id = %request_id,
user_id = %user_id
);
async move {
let row = crate::db::load_user(&state.pool, user_id, &request_id).await
.map_err(|e| {
tracing::warn!(err = ?e, "db error");
crate::errors::ApiError::Internal { context: "db error".to_string() }
})?;
match row {
Some((id, name)) => Ok(axum::Json(crate::UserDto { id, name })),
None => Err(crate::errors::ApiError::NotFound { message: "user not found".to_string() }),
}
}
.instrument(span)
.await
}
Operational usage: how you actually use tracing in incidents
During an incident, you usually do not start from code. You start from symptoms:
- Spike in errors: filter logs by status and route, then pick a request id.
- Latency increase: look for spans with long duration, then inspect nested spans like db.load_user.
- Intermittent failures: group by error type and compare timings.
The point is to make diagnosis a filtering task, not a guessing task.
Practical verification
Before moving on, confirm these basics locally:
- Setting RUST_LOG changes verbosity without code changes.
- Each request produces a trace log line with method and status.
- Critical operations add spans and events with request id.
RUST_LOG=info cargo run RUST_LOG=tower_http=debug,info cargo run
What comes next
Tracing tells you what happened in a specific request. Metrics tell you what is happening across all requests. Next we will add minimal metrics: counters, histograms, and a safe labeling strategy.