Heap analysis is often treated as a dark art, but it’s really just about understanding how your application allocates and deallocates memory over time.

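Let’s watch a simple Java application leak memory. We’ll use jcmd to get a heap dump and then jhat to analyze it. Note that jhat ships only with JDK 8 and earlier (it was removed in JDK 9); on newer JDKs, open the same .hprof file in a tool such as Eclipse MAT or VisualVM instead.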
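The source of LeakyApp is not shown in this walkthrough, so here is a minimal sketch of what such a program might look like. The static list, the leak helper, and the 1 MiB chunk size are all assumptions made for illustration, not the original application.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the LeakyApp used in this walkthrough.
class LeakyApp {
    // A static collection stays reachable for as long as the class is loaded,
    // so nothing added here ever becomes eligible for garbage collection.
    private static final List<byte[]> RETAINED = new ArrayList<>();

    // Allocates `chunks` arrays of `chunkBytes` bytes and keeps a reference to each.
    static void leak(int chunks, int chunkBytes) {
        for (int i = 0; i < chunks; i++) {
            RETAINED.add(new byte[chunkBytes]);
        }
    }

    // Number of arrays currently held by the static list.
    static int retainedChunks() {
        return RETAINED.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Leak roughly 1 MiB per second until the heap is exhausted.
        while (true) {
            leak(1, 1024 * 1024);
            System.out.println("Retained chunks: " + retainedChunks());
            Thread.sleep(1000);
        }
    }
}
```

Running it with a small heap, for example java -Xmx64m LeakyApp, makes the growth visible within a minute or so and leaves plenty of time to take a dump before the OutOfMemoryError.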

# Compile and run the leaky app
javac LeakyApp.java
java LeakyApp

# In another terminal, find the PID of LeakyApp
jps -l

# Let's say the PID is 12345
# Generate a heap dump
jcmd 12345 GC.heap_dump /tmp/leaky_app.hprof

# Start the heap analysis server
jhat /tmp/leaky_app.hprof

# Open your browser to http://localhost:7000/ (jhat's default port; change it with -port)

The jhat start page lists the classes found in the dump, with further queries linked at the bottom. The most interesting parts are the "OQL" (Object Query Language) console and the "Heap Size by Class" report, which jhat exposes as "Show heap histogram".

OQL Query Example:

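select s from java.lang.String s where s.value.length > 10000

This query looks for String objects with a value longer than 10,000 characters, which might indicate a data bloat issue. jhat’s OQL expects lowercase keywords, and the select expression is evaluated as JavaScript, so you can return sizeof(s) instead of s to see each object’s shallow size.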

Heap Size by Class Report:

This report shows which classes are consuming the most memory, listing the instance count and total size for each class. If you see unexpected classes or a massive number of instances of a particular class, that’s your first clue.

The Problem This Solves:

Applications that consume excessive memory, or whose memory usage grows steadily without bound, eventually crash with an OutOfMemoryError. This can be due to a "memory leak" (objects are no longer needed but are still referenced, preventing garbage collection) or "memory bloat" (legitimate, but excessively large, data structures). Heap analysis lets you pinpoint exactly which objects are responsible.

How it Works Internally:

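When you take a heap dump, the JVM pauses all application threads and serializes the contents of the heap (objects, their fields, and their references) into a .hprof file. The pause grows with heap size, so it can last several seconds on a large heap. By default, jcmd’s GC.heap_dump writes only live objects, which forces a full GC first; pass -all to include unreachable objects too. Tools like jhat then parse this file and build an in-memory representation of the object graph, allowing you to traverse references and identify the reference chains that keep objects alive. More capable analyzers such as Eclipse MAT also calculate retained sizes (how much memory would be freed if an object, and everything reachable only through it, were garbage collected).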
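A dump can also be triggered from inside the running process, which is handy when you cannot attach jcmd. The following is a minimal sketch, assuming a HotSpot-based JVM that provides com.sun.management.HotSpotDiagnosticMXBean; the HeapDumper class and the temporary file location are hypothetical names chosen for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that writes an .hprof file from within the JVM.
class HeapDumper {
    // Writes a heap dump to `target` and returns the resulting file size in bytes.
    // liveOnly = true dumps only reachable objects.
    static long dump(Path target, boolean liveOnly) throws IOException {
        // dumpHeap refuses to overwrite an existing file, so remove any stale dump first.
        Files.deleteIfExists(target);
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        // Recent JDKs require the file name to end in .hprof.
        Path target = Files.createTempDirectory("heapdump").resolve("app.hprof");
        System.out.println("Wrote " + dump(target, true) + " bytes to " + target);
    }
}
```

The resulting file has the same format as one produced by jcmd, so it can be opened with the same analysis tools.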

The Exact Levers You Control:

  1. Object Allocation: Understanding where in your code objects are being created is key. If a specific method or loop is creating many large objects, that’s a prime candidate.
  2. Object Lifetimes: How long do objects need to live? Are they being held in static collections, caches, or long-lived threads longer than necessary?
  3. Garbage Collection Behavior: While you can’t directly control GC, understanding how it works helps. If you see objects that should be eligible for collection still present, it points to an unintended reference.
  4. Data Structures: Are you using the most efficient data structures for your needs? A HashMap might be fine for a few thousand entries, but for millions, a more specialized or memory-tuned map might be required.

The most surprising thing about heap analysis is how often the culprit is not a complex algorithm but a simple, overlooked static collection or a cache that grew unchecked. For instance, a static List<MyObject> that’s never cleared, or a ConcurrentHashMap used as an unbounded cache, will lead to memory exhaustion if entries keep being added and the application runs long enough. The reference chain might be trivial (ApplicationContext -> StaticFields -> MyClass -> List -> MyObject), but it’s enough to keep those MyObject instances alive.
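One common remedy for the unbounded-cache case is to cap the number of entries. Here is a minimal sketch of a least-recently-used cache built on LinkedHashMap; the BoundedCache name and the capacity of 3 are hypothetical, and the class is not thread-safe, so treat it as an illustration of the idea rather than a drop-in replacement for a ConcurrentHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache that evicts the least recently used entry once full.
// Not thread-safe: wrap it or guard it with a lock if shared between threads.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, byte[]> cache = new BoundedCache<>(3);
        for (int i = 0; i < 5; i++) {
            cache.put("key-" + i, new byte[1024]);
        }
        System.out.println(cache.keySet()); // prints [key-2, key-3, key-4]
    }
}
```

Because evicted entries are no longer referenced by the map, they become eligible for collection, and the cache’s footprint stays bounded by its capacity.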

Once you’ve identified and fixed the problematic object allocations or reference chains, the next step is often to monitor the application’s heap usage over time to ensure the fix is effective and no new leaks have been introduced.
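For a quick check without external tooling, heap usage can be sampled from inside the application. This is a minimal sketch using the standard java.lang.management API; the HeapMonitor class, the ten samples, and the five-second interval are arbitrary choices for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper that reports how much heap is currently in use.
class HeapMonitor {
    // Returns the number of heap bytes currently used, including garbage not yet collected.
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print ten samples, five seconds apart.
        for (int i = 0; i < 10; i++) {
            System.out.printf("used heap: %d KiB%n", usedHeapBytes() / 1024);
            Thread.sleep(5_000);
        }
    }
}
```

Individual samples rise and fall with GC activity, so look at the trend of the low points after collections: if those keep climbing, objects are still being retained.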
