JFR/async-profiler basics (conceptual)

Use sampling profilers such as JFR and async-profiler to find CPU, allocation, and lock bottlenecks before optimizing, instead of guessing at the causes of performance problems in production systems.


Never optimize without measuring

Performance tuning without profiling is guessing. In production systems, guessing often makes things worse. Profiling allows you to identify where time and memory are actually spent.

Types of profiling

CPU profiling: where execution time is spent.
Allocation profiling: where objects are created.
Lock profiling: monitor contention and blocking.
GC profiling: pause time and allocation pressure.

Sampling vs instrumentation

Sampling: periodically captures thread stack traces. Low overhead, suitable for production.
Instrumentation: inserts measurement code around methods. Gives exact call counts and timings, but its higher overhead can itself distort the results.

In production, sampling is usually preferred.
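To make the distinction concrete, here is a toy sketch of both approaches in plain Java. It is not how JFR or async-profiler work internally. The hypothetical `ToySampler.sample` polls a worker thread's stack with `Thread.getStackTrace()` at a fixed interval and counts the top frame, while `timedNanos` instruments a single call with `System.nanoTime()`. A naive stack-polling sampler like this also tends to suffer from safepoint bias, one of the problems real profilers are built to avoid.

```java
import java.util.HashMap;
import java.util.Map;

class ToySampler {

    static volatile double sink; // keeps the workload's result observable

    // Hypothetical CPU-bound workload for the sampler to observe.
    static double busyWork(long iterations) {
        double acc = 0;
        for (long i = 1; i <= iterations; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    // Sampling: every intervalMillis, capture the target's stack and count its top frame.
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> topFrames = new HashMap<>();
        for (int i = 0; i < samples && target.isAlive(); i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                topFrames.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return topFrames;
    }

    // Instrumentation: time one call exactly; every instrumented call pays this cost.
    static long timedNanos(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Daemon thread so the JVM can exit while the endless workload is still running.
        Thread worker = new Thread(() -> sink = busyWork(Long.MAX_VALUE), "worker");
        worker.setDaemon(true);
        worker.start();
        System.out.println("Top frames: " + sample(worker, 50, 10));
        System.out.println("busyWork(1_000_000): "
                + timedNanos(() -> sink = busyWork(1_000_000)) + " ns");
    }
}
```

The sampler's cost depends only on how often it samples, not on how often the sampled code runs, which is why sampling scales to production traffic.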

Java Flight Recorder (JFR)

JFR is a low-overhead profiling tool built into the JVM. It records events such as:

Method profiling samples
GC pauses
Thread states
Lock contention

It is designed to be safe in production with minimal performance impact.
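JFR is usually started with the `-XX:StartFlightRecording` JVM option or attached to a running process with `jcmd <pid> JFR.start`. It can also be driven from code through the `jdk.jfr` API (JDK 11 and later). The sketch below enables event types matching the list above, runs a stand-in workload, and dumps a `.jfr` file that JDK Mission Control or the JDK's `jfr` tool (for example `jfr print`) can open. The workload, the 20 ms sampling period, and the 10 ms lock threshold are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.Recording;

class JfrRecordingDemo {

    static volatile int sink; // keeps the workload's result observable

    // Starts a recording, runs a stand-in workload, and dumps the data to a temp .jfr file.
    static Path recordWorkload() {
        try (Recording recording = new Recording()) {
            // CPU: periodic stack samples of threads running Java code.
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20));
            // GC: one event per collection.
            recording.enable("jdk.GarbageCollection");
            // Locks: monitor entries that waited longer than the threshold.
            recording.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
            recording.start();

            runWorkload();

            recording.stop();
            Path output = Files.createTempDirectory("jfr-demo").resolve("workload.jfr");
            recording.dump(output);
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical workload: allocate short-lived arrays so the GC has something to do.
    static void runWorkload() {
        List<byte[]> batch = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            batch.add(new byte[1024]);
            if (batch.size() == 1_000) {
                sink += batch.size();
                batch.clear(); // the cleared arrays become garbage
            }
        }
    }

    public static void main(String[] args) {
        Path file = recordWorkload();
        System.out.println("Recording written to " + file);
    }
}
```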

Typical JFR use cases

Latency spike investigation
Memory leak suspicion
Thread contention analysis
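For a latency spike, an early question is often "how long did the worst GC take?". A minimal sketch using `jdk.jfr.consumer.RecordingFile` to report the longest duration seen per event type. To stay self-contained it produces its own tiny recording by calling `System.gc()`, which assumes explicit GC has not been disabled; in practice you would point `longestByType` at an existing `.jfr` file. With concurrent collectors, a collection's duration is longer than the time application threads were actually paused.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

class JfrGcReport {

    // Longest recorded duration for each event type in the file.
    static Map<String, Duration> longestByType(Path jfrFile) {
        try {
            List<RecordedEvent> events = RecordingFile.readAllEvents(jfrFile);
            Map<String, Duration> longest = new TreeMap<>();
            for (RecordedEvent event : events) {
                longest.merge(event.getEventType().getName(), event.getDuration(),
                        (a, b) -> a.compareTo(b) >= 0 ? a : b);
            }
            return longest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Produces a small recording that contains GC events.
    static Path recordExplicitGcs(int collections) {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.GarbageCollection");
            recording.start();
            for (int i = 0; i < collections; i++) {
                System.gc(); // assumption: not disabled with -XX:+DisableExplicitGC
            }
            recording.stop();
            Path output = Files.createTempDirectory("jfr-gc").resolve("gc.jfr");
            recording.dump(output);
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = args.length > 0 ? Path.of(args[0]) : recordExplicitGcs(3);
        longestByType(file).forEach((type, d) ->
                System.out.println(type + " longest: " + d.toMillis() + " ms"));
    }
}
```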

async-profiler (conceptual)

async-profiler is a sampling profiler that uses OS-level capabilities (e.g., perf events) to capture CPU and allocation data with low overhead.

It can generate flame graphs for visual analysis.

What is a flame graph?

A flame graph visualizes stack traces aggregated by sample frequency.

Width = share of samples that include the function, i.e. time spent in it including its callees. The x-axis is not a timeline.
Height = call stack depth.

The widest frames at the top of each stack (those with little or nothing above them) are where time is spent directly; they often indicate bottlenecks.
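Flame graph tools, including Brendan Gregg's original `flamegraph.pl`, typically consume "folded" (collapsed) stacks: one line per unique stack, frames joined with `;` from root to leaf, followed by a sample count. async-profiler can also write this format. The sketch below folds a handful of made-up stack samples; a frame's width in the rendered graph is proportional to the summed counts of the lines that pass through it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FoldedStacks {

    // Each sample is a stack ordered root-first, e.g. [main, handleRequest, parseJson].
    // Returns "root;...;leaf" -> number of samples with exactly that stack.
    static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> folded = new TreeMap<>();
        for (List<String> stack : samples) {
            folded.merge(String.join(";", stack), 1, Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        // Made-up samples, for illustration only.
        List<List<String>> samples = List.of(
                List.of("main", "handleRequest", "parseJson"),
                List.of("main", "handleRequest", "parseJson"),
                List.of("main", "handleRequest", "writeLog"),
                List.of("main", "handleRequest", "parseJson"),
                List.of("main", "gcHelper"));
        fold(samples).forEach((stack, count) -> System.out.println(stack + " " + count));
        // main;gcHelper 1
        // main;handleRequest;parseJson 3
        // main;handleRequest;writeLog 1
    }
}
```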

Reading flame graphs

Look for wide blocks (high CPU cost).
Check whether the cost is in your code or in libraries.
Identify unexpected hot paths.
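A common way to separate "wide because it is hot" from "wide because it calls something hot" is to compute two numbers per frame from folded stacks: the total count (samples where the frame appears anywhere, i.e. its width) and the self count (samples where it is the leaf, i.e. its exposed top edge). A minimal sketch over the folded format; the class name and the sample data are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class FrameCosts {

    static final class Cost {
        int total; // samples containing this frame (its width)
        int self;  // samples where this frame is the leaf
    }

    // Input: folded stacks ("root;...;leaf" -> sample count).
    static Map<String, Cost> costs(Map<String, Integer> folded) {
        Map<String, Cost> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : folded.entrySet()) {
            String[] frames = e.getKey().split(";");
            int count = e.getValue();
            Set<String> seen = new HashSet<>(); // count a recursive frame once per stack
            for (String frame : frames) {
                if (seen.add(frame)) {
                    result.computeIfAbsent(frame, k -> new Cost()).total += count;
                }
            }
            result.computeIfAbsent(frames[frames.length - 1], k -> new Cost()).self += count;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> folded = Map.of(
                "main;handleRequest;parseJson", 3,
                "main;handleRequest;writeLog", 1,
                "main;gcHelper", 1);
        costs(folded).forEach((frame, c) ->
                System.out.printf("%-14s total=%d self=%d%n", frame, c.total, c.self));
    }
}
```

Here `main` is the widest frame but has zero self samples, while `parseJson` is where the time is actually spent.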

Common profiling surprises

String concatenation dominating CPU.
JSON serialization allocating heavily.
Logging inside tight loops.
Lock contention dominating runtime.
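The first surprise on that list often comes from building a string with `+=` inside a loop: each iteration copies everything accumulated so far, so the work grows quadratically and the copies show up in both CPU and allocation profiles. A sketch of the pattern and the usual `StringBuilder` fix (method names are illustrative):

```java
import java.util.List;

class ConcatInLoop {

    // Slow pattern: each += creates a new String and copies everything built so far.
    static String joinSlow(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part + ",";
        }
        return result;
    }

    // Usual fix: append into one growable buffer.
    static String joinFast(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(joinSlow(parts)); // a,b,c,
        System.out.println(joinFast(parts)); // a,b,c,
    }
}
```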

Allocation profiling insight

A high allocation rate may not show up as a CPU bottleneck, but it leads to frequent GC.

Allocation flame graphs reveal object creation hotspots.
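Before reaching for an allocation profiler, it can help to confirm that a code path really allocates heavily. On HotSpot-based JDKs, the `com.sun.management.ThreadMXBean` extension reports bytes allocated per thread. This measures how much, not where, so it complements rather than replaces an allocation flame graph. A sketch, assuming a HotSpot JVM with allocated-memory measurement enabled (the default there):

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class AllocationMeter {

    // Bytes allocated by the current thread while running the action.
    // Assumes a HotSpot JVM, whose platform ThreadMXBean implements the
    // com.sun.management extension (it returns -1 if measurement is disabled).
    static long allocatedBytes(Runnable action) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(threadId);
        action.run();
        return bean.getThreadAllocatedBytes(threadId) - before;
    }

    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<>();
        long bytes = allocatedBytes(() -> {
            for (int i = 0; i < 1_000; i++) {
                retained.add(new byte[1024]); // retained so the allocations cannot be elided
            }
        });
        System.out.println("Allocated roughly " + bytes + " bytes");
    }
}
```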

Lock contention profiling

JFR and other profilers can show monitor enter time and blocked threads. If many threads wait on the same monitor, throughput collapses even if CPU usage seems low.
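In JFR this shows up as `jdk.JavaMonitorEnter` events. For a small experiment, the standard `ThreadMXBean` also exposes per-thread blocked counts and, once contention monitoring is enabled, blocked time. In the sketch below, hypothetical workers sleep while holding one shared monitor, so most of their wall-clock time is spent blocked while CPU usage stays low: the pattern described above. Worker count, iteration count, and the 1 ms hold time are arbitrary assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicLong;

class MonitorContentionDemo {

    private static final Object SHARED_LOCK = new Object();

    static final class Contention {
        final long blockedCount;
        final long blockedMillis; // -1 if contention monitoring is unsupported
        Contention(long blockedCount, long blockedMillis) {
            this.blockedCount = blockedCount;
            this.blockedMillis = blockedMillis;
        }
    }

    static Contention measure(int threads, int iterations) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean timing = bean.isThreadContentionMonitoringSupported();
        if (timing) {
            bean.setThreadContentionMonitoringEnabled(true); // enables blocked-time tracking
        }
        AtomicLong count = new AtomicLong();
        AtomicLong millis = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    synchronized (SHARED_LOCK) {
                        sleepQuietly(1); // hypothetical slow work done while holding the lock
                    }
                }
                // Counters are only readable while the thread is alive, so read them here.
                ThreadInfo info = bean.getThreadInfo(Thread.currentThread().getId());
                count.addAndGet(info.getBlockedCount());
                millis.addAndGet(info.getBlockedTime());
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new Contention(count.get(), timing ? millis.get() : -1);
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Contention c = measure(4, 50);
        System.out.println("blocked " + c.blockedCount + " times, ~" + c.blockedMillis + " ms total");
    }
}
```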

Production-safe profiling guidelines

Prefer sampling profilers.
Avoid heavy instrumentation in production.
Profile during realistic load.
Compare before/after metrics when optimizing.

Profiling workflow

Identify symptom (latency, CPU spike, GC spike).
Capture profile under load.
Analyze top consumers.
Hypothesize root cause.
Fix smallest meaningful hotspot.
Measure again.
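The last step is easy to get wrong with a single timing. A minimal sketch of a before/after comparison that runs each version untimed first (so the JIT can compile it) and then compares medians over repeated runs. `sumBoxed` and `sumPrimitive` are hypothetical stand-ins for the code before and after a fix. For serious microbenchmarks a dedicated harness such as JMH is the better tool, because it handles warm-up and dead-code elimination far more rigorously.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

class BeforeAfter {

    static volatile long sink; // consume results so the JIT cannot drop the work

    // Median of the values; for an even count this returns the upper of the two middle values.
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Runs the task `warmups` times untimed, then returns the median of `runs` timed executions.
    static long medianNanos(LongSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = task.getAsLong();
        }
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            times[i] = System.nanoTime() - start;
        }
        return median(times);
    }

    // Hypothetical "before": accumulates into a boxed Long.
    static long sumBoxed() {
        Long total = 0L;
        for (int i = 0; i < 100_000; i++) {
            total += i;
        }
        return total;
    }

    // Hypothetical "after": primitive arithmetic.
    static long sumPrimitive() {
        long total = 0L;
        for (int i = 0; i < 100_000; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = medianNanos(BeforeAfter::sumBoxed, 50, 21);
        long after = medianNanos(BeforeAfter::sumPrimitive, 50, 21);
        System.out.printf("before: %d ns, after: %d ns%n", before, after);
    }
}
```

The median is used instead of the mean so that a single run disturbed by GC or scheduling does not dominate the comparison.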

Anti-pattern: micro-optimizing cold code

Profiling often shows that only a small fraction of code consumes most resources (Pareto principle). Optimizing non-hot code wastes effort.
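Given per-method self-sample counts from a profile, one quick way to see this concentration is to ask what share of all samples the top few methods account for. A sketch with made-up numbers; the method names are hypothetical.

```java
import java.util.Comparator;
import java.util.Map;

class HotspotShare {

    // Fraction of all samples attributed to the topN methods with the most self samples.
    static double topShare(Map<String, Integer> selfCounts, int topN) {
        int total = selfCounts.values().stream().mapToInt(Integer::intValue).sum();
        if (total == 0) {
            return 0.0;
        }
        int top = selfCounts.values().stream()
                .sorted(Comparator.reverseOrder())
                .limit(topN)
                .mapToInt(Integer::intValue)
                .sum();
        return (double) top / total;
    }

    public static void main(String[] args) {
        // Made-up self-sample counts, for illustration only.
        Map<String, Integer> selfCounts = Map.of(
                "JsonWriter.prettyPrint", 400,
                "Logger.log", 150,
                "OrderService.validate", 30,
                "Cache.get", 10,
                "Util.trim", 10);
        System.out.printf("Top 2 methods: %.0f%% of samples%n", 100 * topShare(selfCounts, 2));
        // Top 2 methods: 92% of samples
    }
}
```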

Production incident scenario

A service experiences high latency. Engineers assume the database is slow. Profiling reveals that 40 percent of CPU time is spent pretty-printing JSON for logs. Fixing the logging reduces latency dramatically.
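One common shape of such a fix is to avoid building expensive log messages when the level is disabled. A sketch using `java.util.logging`, whose `Logger.fine(Supplier<String>)` only invokes the supplier if the message is loggable. The service may well use a different logging library, and `prettyPrint` is a hypothetical stand-in for an expensive JSON formatter.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyLogging {

    private static final Logger LOG = Logger.getLogger(LazyLogging.class.getName());
    static final AtomicInteger prettyPrintCalls = new AtomicInteger();

    // Hypothetical expensive formatter standing in for JSON pretty-printing.
    static String prettyPrint(Object payload) {
        prettyPrintCalls.incrementAndGet();
        return "{\n  \"payload\": \"" + payload + "\"\n}";
    }

    // Eager: the message is built on every call, even when FINE is disabled.
    static void logEager(Object payload) {
        LOG.fine("request: " + prettyPrint(payload));
    }

    // Lazy: the Supplier only runs if FINE is actually loggable.
    static void logLazy(Object payload) {
        LOG.fine(() -> "request: " + prettyPrint(payload));
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE disabled, as is common in production
        for (int i = 0; i < 1_000; i++) {
            logEager(i);
        }
        System.out.println("eager prettyPrint calls: " + prettyPrintCalls.getAndSet(0)); // 1000
        for (int i = 0; i < 1_000; i++) {
            logLazy(i);
        }
        System.out.println("lazy prettyPrint calls: " + prettyPrintCalls.get()); // 0
    }
}
```

If the pretty-printed output is genuinely needed at an enabled level, the laziness does not help; the remaining options are compact serialization or logging less.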

Checklist

Always measure before optimizing.
Use sampling profilers for production.
Read flame graphs carefully; focus on width.
Investigate allocation rate and lock contention.
Validate improvements with repeated measurement.

Final principle

Performance tuning is empirical. Profilers turn assumptions into evidence. Without profiling, optimization is guesswork.
