Java applications can be notoriously difficult to profile effectively, often leading to assumptions about performance bottlenecks that are simply wrong.
Let’s see perf in action on a simple Java application. Imagine a StressTest class with a loop that performs some basic arithmetic and object creation:
public class StressTest {
    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000_000; i++) {
            sum += i;
            if (i % 1000 == 0) {
                new Object(); // Allocate some objects
            }
        }
        System.out.println("Sum: " + sum);
    }
}
We can compile and run this:
javac StressTest.java
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly StressTest
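The -XX:+UnlockDiagnosticVMOptions flag unlocks the JVM’s diagnostic options, and -XX:+PrintAssembly asks the JVM to print the assembly code the JIT compiler generates. It only produces output when the hsdis disassembler plugin is installed. perf itself does not need either flag, but the assembly is useful for correlating hot addresses with compiled code. What perf does need in order to name JIT-compiled frames is a perf map file (/tmp/perf-<pid>.map), which an agent such as perf-map-agent or, on JDK 17 and later, jcmd <pid> Compiler.perfmap can write. Now, we can attach perf to the running JVM process. First, find the PID: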
jps -l | grep StressTest
# Example output: 12345 StressTest
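Because the loop above can finish within seconds once it is JIT-compiled, attaching by PID can turn into a race. One way around that, sketched below, is to have the program print its own PID and wait before starting the work. The class name PidGate and its helper methods are hypothetical and not part of the original example; ProcessHandle requires Java 9 or later.

```java
import java.io.IOException;

class PidGate {
    // Returns the PID of the current JVM process (Java 9+).
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    // The same loop as StressTest, parameterized so it can also run at a smaller size.
    static long runLoop(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i;
            if (i % 1000 == 0) {
                new Object(); // Allocate some objects
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("PID: " + currentPid());
        System.out.println("Attach perf now, then press Enter to start the loop.");
        System.in.read(); // blocks until input (or end of stream) arrives on stdin
        System.out.println("Sum: " + runLoop(1_000_000_000));
    }
}
```

With this variant the PID is printed by the program itself, so the jps step becomes optional and perf is already attached when the hot loop starts.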
Then, run perf record:
sudo perf record -p 12345 -g --call-graph dwarf --output perf.data
Here’s what each flag does:
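-p 12345: Targets the specific Java process ID.
-g: Enables call graph recording. It is implied by --call-graph, so it is redundant here.
--call-graph dwarf: Instructs perf to unwind stacks using DWARF debugging information. This works for the JVM’s native C++ frames. JIT-compiled Java frames carry no DWARF data, so a common alternative is to start the JVM with -XX:+PreserveFramePointer and record with --call-graph fp.
--output perf.data: Specifies the output file for the profiling data.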
perf record stops when the target process exits (or when you press Ctrl+C). After the application finishes, you can analyze the data:
sudo perf report -i perf.data
This perf report output will show you a breakdown of where the CPU time is being spent, including functions within the JVM itself and potentially compiled Java methods. You’ll see percentages, counts, and the call hierarchy.
The core problem perf helps solve for JVM applications is bridging the gap between high-level Java code and the low-level machine instructions the CPU actually executes. Java-level profilers often report only Java method names, which can be misleading because the time may actually be going to the JVM’s internal C++ code, JIT compilation, garbage collection, or the kernel working on the JVM’s behalf. Because perf samples the whole process, including the JVM’s native code, it gives you a more complete picture.
The real power comes from understanding that perf samples on hardware events such as CPU cycles. When you see a hotspot in perf report, it’s not just saying "this Java method took time"; it’s saying "the CPU was executing instructions at this address when the sample was taken." That address could belong to a Java method that has been JIT-compiled into highly optimized native code, or to a critical piece of the JVM’s internal machinery (like an allocation path in the garbage collector or a synchronization primitive). The call-graph options make perf record the stack at each sample. The JVM’s internal C++ functions resolve through the symbols in libjvm.so, while JIT-compiled frames resolve to Java method names only if a perf map file for the process exists; otherwise they appear as raw addresses.
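How perf attaches names to JIT-compiled code is worth making concrete. For dynamically generated code, perf reads a text file named /tmp/perf-<pid>.map in which each line holds a start address and a size in hexadecimal, followed by a symbol name. The sketch below mimics that lookup in Java. The class PerfMapResolver, the sample addresses, and the symbol names are hypothetical illustrations, not output from a real JVM.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PerfMapResolver {
    // One perf map entry: the region [start, start + size) belongs to `symbol`.
    private static final class Region {
        final long size;
        final String symbol;

        Region(long size, String symbol) {
            this.size = size;
            this.symbol = symbol;
        }
    }

    // Regions keyed by start address so the nearest lower start can be found quickly.
    private final TreeMap<Long, Region> regions = new TreeMap<>();

    // Parses lines of the form "<hex start> <hex size> <symbol name>".
    PerfMapResolver(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 3);
            if (parts.length < 3) {
                continue; // skip blank or malformed lines
            }
            long start = Long.parseUnsignedLong(parts[0], 16);
            long size = Long.parseUnsignedLong(parts[1], 16);
            regions.put(start, new Region(size, parts[2]));
        }
    }

    // Returns the symbol whose region contains the address, or "[unknown]".
    String resolve(long address) {
        Map.Entry<Long, Region> entry = regions.floorEntry(address);
        if (entry != null && address - entry.getKey() < entry.getValue().size) {
            return entry.getValue().symbol;
        }
        return "[unknown]";
    }

    public static void main(String[] args) {
        // Hypothetical map lines; a real file is written by an agent or the JVM.
        PerfMapResolver resolver = new PerfMapResolver(List.of(
                "7f3a10000000 40 LStressTest;::main",
                "7f3a10000040 20 Interpreter"));
        System.out.println(resolver.resolve(0x7f3a10000010L));
    }
}
```

A sampled address that falls outside every listed region is reported as unknown, which corresponds to the unnamed hexadecimal frames perf report shows when no map file is available.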
The most surprising thing is how much of your Java application’s performance is dictated by the JVM’s internal implementation details, not just your Java code’s algorithms. You might profile a Java method and see it’s not the bottleneck, only to discover that perf shows the CPU is saturated by garbage collection work running on the JVM’s GC threads, or by the JIT compiler itself working overtime. The -XX:+PrintAssembly flag, when used with the hsdis disassembler plugin (which is usually not bundled with the JDK and has to be built or obtained separately), can even show you the assembly code generated by the JIT compiler, allowing you to see exactly what the JVM has produced for a hot method. Few other tools give this level of detail about where the time actually goes.
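A quick way to cross-check what perf shows about GC and JIT activity is to ask the JVM for its own accounting through the standard java.lang.management API. The sketch below is a minimal example under that assumption; the class name JvmOverhead and its method names are hypothetical. The reported values are accumulated elapsed times in milliseconds as measured by the JVM, not CPU samples, so they complement a perf profile rather than replace it.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class JvmOverhead {
    // Sums the accumulated collection time (ms) reported by every collector.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Returns the approximate accumulated JIT compilation time (ms), or -1 if unavailable.
    static long totalJitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        System.out.println("GC time (ms):  " + totalGcMillis());
        System.out.println("JIT time (ms): " + totalJitMillis());
    }
}
```

Calling these two methods at the end of StressTest.main would show whether GC or JIT time is large enough to explain JVM-internal hotspots in the perf report.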
The next step is to learn how to interpret the perf report output when it points to JVM internal functions, mapping those C++ symbols back to their functional role in garbage collection, JIT compilation, or thread management.