Java native memory leak detection
When we think about memory leaks in Java, we usually think of Java heap leaks, where objects allocated on the heap are never garbage collected. That was what I had in mind when I started working on a memory leak in one of our servers, but what I was about to experience was far beyond my imagination.

## The Symptoms

Production servers running a Vertx application (with no swap partition) were being killed by the Linux out-of-memory killer (an OS mechanism that frees memory by killing processes when the system runs critically low on memory).
Since it was a production server, I thought: O.K., let's take a heap dump and use MAT to see what is going on and find out who is eating so much memory.
The results were surprising: the Java heap was reasonable, much smaller than the process memory footprint. Something was eating memory and I had no idea what it was.
My starting point was the Java fatal error logs that were created in the home directory, so I started exploring them. You can read more about the Java fatal error log on the following page: Fatal Error Log. The fatal error log can supply a lot of valuable information and save time, so I suggest reading it carefully. It revealed that the heap size was less than 2 GB, but the process was growing to about 3-4 GB. What was going on?
A Java process contains the following memory spaces:
- The heap - where objects are allocated.
- Thread stacks - one native stack per thread.
- Metaspace - contains class metadata (it replaced PermGen, which was used in Java 7 and earlier).
- Code cache - The JIT compiler code cache.
- Buffer pools - Off-heap buffer pools.
- OS memory - Native OS memory.

For more, see the following page: Java Memory
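Some of these spaces can also be inspected from inside the running JVM. The following is a minimal sketch using the standard `java.lang.management` beans; the class name `JvmMemoryAreas` and the key prefixes are made up for illustration, and the pool names that come back (for example "Metaspace" or the code cache pools) depend on the JVM version and garbage collector. Memory obtained directly through malloc by native code is not visible through these beans, which is why they could not have revealed a leak like the one described here.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the sizes the JVM itself reports for its memory areas.
final class JvmMemoryAreas {

    /** Returns bytes per JVM-reported memory area, keyed by a readable name. */
    static Map<String, Long> snapshot() {
        Map<String, Long> areas = new LinkedHashMap<>();

        // Committed size of the whole Java heap.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        areas.put("heap (total)", heap.getCommitted());

        // Individual pools: heap generations, Metaspace, code cache, and so on.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage != null) { // null means the pool is no longer valid
                areas.put("pool: " + pool.getName(), usage.getCommitted());
            }
        }

        // NIO buffer pools (direct and mapped buffers) that live outside the heap.
        for (BufferPoolMXBean buf : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            areas.put("buffer pool: " + buf.getName(), buf.getMemoryUsed());
        }
        return areas;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, bytes) ->
                System.out.printf("%-50s %,d KB%n", name, bytes / 1024));
        // Each live thread also owns a native stack that is not counted above.
        System.out.println("live threads: "
                + ManagementFactory.getThreadMXBean().getThreadCount());
    }
}
```

Comparing the sum of these numbers with the resident size of the process gives a rough idea of how much memory is unaccounted for by the JVM's own bookkeeping.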
I used the command:
jmap -heap [pid]
which prints a summary of the JVM heap sizes; it also showed that the heap size was about 1.5 GB.
I checked the Metaspace size, but it was just a few megabytes. Maybe the code cache was the problem? I checked the fatal error log again and saw that it was only about 20 MB. I concluded that I might have a native memory leak and started checking this option.

I used the Java NMT (Native Memory Tracking) feature by starting the application with the native memory tracking argument:

-XX:NativeMemoryTracking=detail

Then I used the jcmd utility (included in the JDK) to check native memory. Read more about NMT on the following page: NMT. The result contained the following:
Internal (reserved=1031767KB, committed=1031767KB) (malloc=1031735KB #7619) (mmap: reserved=32KB, committed=32KB)
I found that the JVM had about 1 GB of Internal memory. Now I was sure that I had a native memory leak. This was a little intimidating: I had to leave my homeland of Java and explore a faraway and mysterious land, a land where developers actually have to handle releasing memory in the absence of the diligent and devoted garbage collector.
I found that there were huge malloc (the native memory allocation call) allocations, but I still had no clue what was causing them. I tried to process a memory dump using gdb, but it didn't go well, and I concluded that it probably wouldn't be useful, so I searched for Linux memory leak detection tools. I had a few candidates:
The first two didn't work for various reasons, so I tried jemalloc, which people recommended. I added the following lines to the application startup script:
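As a quick sanity check of the NMT figures quoted above, here is a small sketch that parses that one line and converts the kilobyte counts. The class `NmtSummaryLine` and its regular expressions are my own illustration, written against the line exactly as shown in this post; they are not an official NMT parser and may need adjusting for other JVM versions or multi-line output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for a single NMT category line as quoted in this post.
final class NmtSummaryLine {
    private static final Pattern CATEGORY =
            Pattern.compile("(\\w[\\w ]*?)\\s*\\(reserved=(\\d+)KB, committed=(\\d+)KB\\)");
    private static final Pattern MALLOC = Pattern.compile("\\(malloc=(\\d+)KB");

    final String category;
    final long reservedKb;
    final long committedKb;
    final long mallocKb; // -1 when the line has no malloc figure

    private NmtSummaryLine(String category, long reservedKb, long committedKb, long mallocKb) {
        this.category = category;
        this.reservedKb = reservedKb;
        this.committedKb = committedKb;
        this.mallocKb = mallocKb;
    }

    static NmtSummaryLine parse(String line) {
        Matcher c = CATEGORY.matcher(line);
        if (!c.find()) {
            throw new IllegalArgumentException("not an NMT category line: " + line);
        }
        Matcher m = MALLOC.matcher(line);
        long malloc = m.find() ? Long.parseLong(m.group(1)) : -1L;
        return new NmtSummaryLine(c.group(1),
                Long.parseLong(c.group(2)), Long.parseLong(c.group(3)), malloc);
    }

    /** Committed size converted from KB to GiB. */
    double committedGiB() {
        return committedKb / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        NmtSummaryLine internal = parse("Internal (reserved=1031767KB, committed=1031767KB) "
                + "(malloc=1031735KB #7619) (mmap: reserved=32KB, committed=32KB)");
        System.out.printf("%s: %.2f GiB committed, %d KB of it via malloc%n",
                internal.category, internal.committedGiB(), internal.mallocKb);
    }
}
```

The committed figure works out to roughly 0.98 GiB, and almost all of it comes from malloc rather than mmap, which is what pointed toward a malloc-level profiler.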
MALLOC_CONF=prof_leak:true,prof_final:true,lg_prof_interval:30,lg_prof_sample:17 \
LD_PRELOAD=/usr/local/lib/libjemalloc.so.2 [jre path]/jre/bin/java [app parameters]
At first jemalloc didn't work; I had to recompile it with the configure parameter --enable-prof, and then it started working. After I closed the application, jemalloc created a jeprof file in the working directory. I converted it to a PDF report with jeprof:
[jemalloc install dir]/bin/jeprof --show_bytes --pdf '[jre path]/jre/bin/java' [jeprof file] > [pdf output file name]
The jemalloc analysis indicated a major leak from "Unsafe_AllocateMemory", which allocates a lot of memory using malloc calls. I thought I would have the answer by now, but apparently I was wrong. I did some Googling and found that Unsafe_AllocateMemory is probably related to a class named sun.misc.Unsafe. Unsafe is a JDK-internal class that performs native memory allocation, and as its name indicates, it is not safe (Oracle planned to remove this class in Java 9, but eventually it remained, in the jdk.unsupported module).

I searched the application code and didn't find any use of it, so I assumed it was probably used in some of the application's libraries. The primary suspect was Netty, the network library used by Vertx. Netty supplies great performance but uses native memory allocation to achieve it. Digging into the Netty source code revealed that it uses sun.misc.Unsafe to allocate native memory pools. Netty includes a memory leak detection mechanism, so I gave it a try by using the Netty argument:

-Dio.netty.leakDetection.level=advanced

But that didn't supply any output.
I tried limiting Netty by using other Netty arguments (which are not well documented):
-Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled -Dio.netty.maxDirectMemory=0
But that didn’t work either.
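To make "direct" (off-heap) memory a bit more concrete, here is a minimal sketch that allocates a direct buffer through the standard NIO API and watches the JVM's "direct" buffer pool grow. The class `DirectBufferDemo` is invented for illustration and this is not how Netty allocates its pools. As far as I understand, memory that a library obtains straight from sun.misc.Unsafe may bypass this pool's accounting entirely, so treat the pool figure as a lower bound.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical demo: direct buffers live outside the Java heap.
final class DirectBufferDemo {

    /** Finds the JVM's bookkeeping bean for direct NIO buffers. */
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        throw new IllegalStateException("this JVM reports no 'direct' buffer pool");
    }

    /** Allocates one direct buffer and returns how much the pool's total capacity grew. */
    static long growthAfterAllocating(int bytes) {
        BufferPoolMXBean pool = directPool();
        long before = pool.getTotalCapacity();
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes); // native memory, not heap
        long after = pool.getTotalCapacity();
        buffer.put(0, (byte) 1); // touch the buffer so it stays reachable until here
        return after - before;
    }

    public static void main(String[] args) {
        long growth = growthAfterAllocating(16 * 1024 * 1024);
        System.out.printf("direct pool grew by %,d bytes%n", growth);
    }
}
```

A heap dump will show only the small `ByteBuffer` wrapper objects, not the native memory behind them, which matches the symptom of a modest heap next to a much larger process footprint.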
Eventually, after many tries, I found that direct memory allocations can be limited by the following JVM argument:

## Summary

After digging deep into the JVM, I finally found the solution. I learned a lot about the JVM mechanisms and memory spaces.