Diagnosing JVM Problems: OOM, GC Logs, Stack Traces, and Thread Dumps

JVM diagnostics are useful when each symptom is mapped back to the runtime structure that produced it: heap, stack, class metadata, threads, GC, JIT, locks, or native memory.

Learning Question

How should I read JVM failure signals without treating them as random errors?

JVM diagnostics become confusing when every problem is collapsed into “memory issue,” “thread issue,” or “JVM issue.” A useful diagnosis starts by locating the failing runtime area and collecting the artifact that can actually explain it.

The first mental model is:

A JVM symptom is a clue about a specific runtime boundary.

OutOfMemoryError

OutOfMemoryError is not one error with one cause.

Different messages point to different runtime areas:

Message	Likely Area
`Java heap space`	Ordinary object allocation on the heap
`GC overhead limit exceeded`	Heap pressure and excessive GC with little recovery
`Metaspace`	Class metadata memory
`Direct buffer memory`	Direct buffers outside the Java heap
`unable to create new native thread`	Native thread or operating system resource exhaustion

The next diagnostic step depends on the area.

For heap pressure, a heap dump and allocation or retention analysis may help. For metaspace, class loading and class loader retention matter. For native thread failures, thread count, stack size, process limits, and operating system limits matter.
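The mapping from message to runtime area can be sketched as a small lookup. This is only an illustration: `OomAreaClassifier` and its area labels are hypothetical names, and the fragments are matched case-insensitively because exact message text can vary between JDK versions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps an OutOfMemoryError message to the runtime area it suggests.
final class OomAreaClassifier {

    // Lower-cased message fragment -> likely area, mirroring the table above.
    private static final Map<String, String> AREAS = new LinkedHashMap<>();
    static {
        AREAS.put("java heap space", "heap: ordinary object allocation");
        AREAS.put("gc overhead limit exceeded", "heap: excessive GC with little recovery");
        AREAS.put("metaspace", "class metadata");
        AREAS.put("direct buffer memory", "direct buffers outside the Java heap");
        AREAS.put("native thread", "native threads or OS resource limits");
    }

    // Returns the likely area, or "unknown" when no known fragment matches.
    static String classify(String message) {
        if (message == null) {
            return "unknown";
        }
        String lower = message.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : AREAS.entrySet()) {
            if (lower.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "unknown";
    }
}
```

A caller could pass `error.getMessage()` from a caught or logged OutOfMemoryError and use the result to decide which artifact to collect next.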

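Increasing -Xmx is not a universal fix. It only changes the maximum Java heap size; metaspace and native thread failures are governed by other limits, such as -XX:MaxMetaspaceSize, thread stack size, and operating system limits.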

GC Logs

GC logs describe garbage collection activity.

They can answer questions such as:

How often is GC running?
How long are pauses?
How much heap is used before and after collection?
Is the old generation or long-lived region growing?
Are concurrent cycles keeping up?
Is allocation pressure too high?

GC logs do not directly tell you which line of code retains an object. They show collector behavior and memory pressure over time.

If the heap remains high after a full or major collection, the application may have a large live set or retained objects. If pauses are frequent but memory is recovered each time, allocation rate may be the main pressure.
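To compare heap usage before and after each collection, the size and pause figures can be pulled out of individual log lines. The sketch below assumes a unified-logging style line such as `[2.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 120M->35M(512M) 8.412ms`; the actual layout depends on the collector, JDK version, and logging options, so the pattern is an assumption to adapt, and `GcPauseLine` is a hypothetical name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for one GC pause line of the assumed form
// "... <before>M-><after>M(<capacity>M) <pause>ms".
final class GcPauseLine {

    private static final Pattern SIZES_AND_PAUSE = Pattern.compile(
        "(\\d+)([KMG])->(\\d+)([KMG])\\((\\d+)([KMG])\\)\\s+(\\d+(?:\\.\\d+)?)ms");

    final long beforeKb;
    final long afterKb;
    final long capacityKb;
    final double pauseMs;

    private GcPauseLine(long beforeKb, long afterKb, long capacityKb, double pauseMs) {
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.capacityKb = capacityKb;
        this.pauseMs = pauseMs;
    }

    // Returns empty when the line does not contain the expected size/pause pattern.
    static Optional<GcPauseLine> parse(String line) {
        Matcher m = SIZES_AND_PAUSE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new GcPauseLine(
            toKb(m.group(1), m.group(2)),
            toKb(m.group(3), m.group(4)),
            toKb(m.group(5), m.group(6)),
            Double.parseDouble(m.group(7))));
    }

    // Normalizes K/M/G sizes to kilobytes so lines can be compared.
    private static long toKb(String number, String unit) {
        long value = Long.parseLong(number);
        switch (unit) {
            case "G": return value * 1024 * 1024;
            case "M": return value * 1024;
            default:  return value; // "K"
        }
    }

    // Fraction of heap capacity still occupied after the collection.
    double occupancyAfter() {
        return (double) afterKb / capacityKb;
    }
}
```

Tracking `occupancyAfter()` across many lines gives a rough view of whether the post-collection floor is rising, which is the signal the paragraph above describes.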

Stack Traces

A stack trace shows how a thread reached a point in execution.

It is useful for:

locating the path to an exception
identifying which code was active when an error was thrown
seeing repeated recursive calls
separating application frames from framework and library frames

A stack trace is not a heap dump, a thread scheduling history, or proof of the original root cause. It is a call-path snapshot.

For exceptions, start at the innermost cause (the last `Caused by:` entry in the printed trace) and read outward. The top stack frame of each throwable shows where that throwable was created or thrown, but framework wrappers may add layers around the original cause.
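Walking the causal chain can also be done in code, which helps when logging only the innermost cause. The following is a minimal sketch; `Causes.rootCause` is a hypothetical helper, and the identity set is there only to guard against a cyclic cause chain.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper: follows getCause() until the innermost throwable is reached.
final class Causes {

    static Throwable rootCause(Throwable throwable) {
        Objects.requireNonNull(throwable, "throwable");
        // Identity-based set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = throwable;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }
}
```

The innermost cause is a starting point for reading, not proof of the root cause; the wrapper layers often carry the context (which request, which task) that explains why it mattered.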

Thread Dumps

A thread dump shows many threads at one point in time.

It is useful for:

finding deadlocks
identifying threads blocked on the same lock
seeing thread pool exhaustion
observing many threads waiting on I/O or queues
locating long-running CPU-bound call paths

Thread states must be read carefully.

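BLOCKED means the thread is waiting to acquire a monitor lock, for example to enter a synchronized block. WAITING or TIMED_WAITING can be normal for idle worker threads, and threads parked on java.util.concurrent locks also appear in these states rather than as BLOCKED. RUNNABLE does not always mean the thread is using CPU heavily. It also covers threads in native code or blocking I/O calls, depending on the runtime and operating system state.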

One thread dump is a snapshot. Multiple dumps over time are often more useful because they show whether the same threads remain stuck in the same places.
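A snapshot can also be taken from inside the process through the standard `java.lang.management` API. The sketch below counts threads per state and asks the JVM for deadlocked threads; `ThreadSnapshot` is a hypothetical name, and deadlock detection support is assumed to be available on the running JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: summarizes one in-process thread snapshot.
final class ThreadSnapshot {

    // Counts live threads by state at the moment of the call.
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip locked monitor and synchronizer details to keep the dump cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Number of threads the JVM reports as deadlocked; 0 when none are found.
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }
}
```

Calling `countByState()` several times a few seconds apart gives a coarse version of the "multiple dumps over time" comparison: a BLOCKED count that stays high across samples is more meaningful than a single reading.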

Class Loading and Linkage Errors

Class loading and linkage problems should be diagnosed through the classpath, module path, class loader hierarchy, dependency versions, and initialization path.

Common signals include:

ClassNotFoundException
NoClassDefFoundError
NoSuchMethodError
NoSuchFieldError
ClassCastException involving the same class name from different loaders
ExceptionInInitializerError

These are not memory errors by default. They often indicate runtime artifact mismatch, missing dependencies, class loader isolation, or static initialization failure.
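When a linkage error suggests that the wrong artifact or loader is involved, it can help to print where a class actually came from. This is a minimal sketch using standard reflection calls; `ClassOrigin` is a hypothetical name, and classes loaded by the bootstrap loader typically report no loader and no code source.

```java
import java.security.CodeSource;

// Hypothetical helper: reports which loader and which location supplied a class.
final class ClassOrigin {

    static String describe(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        CodeSource source = type.getProtectionDomain().getCodeSource();

        // A null loader means the class was loaded by the bootstrap class loader.
        String loaderName = loader == null ? "bootstrap" : loader.getClass().getName();
        String location = (source == null || source.getLocation() == null)
            ? "(no code source)"
            : source.getLocation().toString();

        return type.getName() + " loaded by " + loaderName + " from " + location;
    }
}
```

Comparing the output for two objects that fail a cast despite sharing a class name shows quickly whether they came from different loaders or different JAR files.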

A Practical Diagnosis Sequence

A useful sequence is:

Identify the exact symptom and message.
Map it to a runtime area.
Collect the artifact that explains that area.
Separate application behavior from JVM runtime behavior.
Check whether the failure is startup-only, warmup-related, steady-state, load-dependent, or environment-specific.
Change one hypothesis at a time.

Examples:

Heap OOM: collect a heap dump, inspect retained objects, compare heap usage after GC.
Long pauses: inspect GC logs, safepoint data, JFR events, and system resource pressure.
Stuck requests: collect thread dumps over time and inspect locks, waits, and pool usage.
Slow warmup: inspect class loading, JIT activity, caches, and first-use initialization.

Core Mental Model

Keep these boundaries separate:

OOM messages point to different memory areas.
GC logs explain collector behavior, not object ownership by themselves.
Stack traces show call paths.
Thread dumps show thread snapshots.
Linkage errors point to runtime class compatibility and loading boundaries.
Good diagnosis starts by mapping the symptom to the JVM structure that can produce it.

Final Summary

JVM diagnostics are not random text emitted by a black box.

They are runtime artifacts. Read each one by asking which JVM structure produced it and what additional evidence can confirm or reject the suspected cause.
