I have a Java service that has been running for about ten days, and I found that it was killed by Linux's OOM killer during a busy period. The heap was configured with -Xmx10G -Xms6G, but the OOM killer log shows an RSS of around 24.5G:
[6252128.258453]Out of memory: Kill process 113627 (java) score 350 or sacrifice child
[6252128.258533]Killed process 113627 (java) total-vm:50134728kB, anon-rss:25062168kB, file-rss:0kB, shmem-rss:0kB
I restarted the process, and after a few days its RSS in top had grown to 30G, so I suspect an off-heap (native) memory leak.
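For reference, I have been checking the resident size roughly like this (the PID is from my case):
ps -o pid,rss,vsz -p 113627    # RSS/VSZ in KB; RSS kept climbing toward ~30G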
I have run jmap (heap dump), jstack, and pmap. pmap shows that the process holds a lot of memory in regions of about 65508 KB each, and in the jmap dump I found many java.lang.ref.Finalizer objects, 7206 in total.
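The snapshots were taken with roughly these commands (the PID and output paths are just my choices):
jmap -dump:live,format=b,file=/tmp/java_113627.hprof 113627   # heap dump for MAT/OQL analysis
jstack 113627 > /tmp/java_113627.jstack                        # thread dump
pmap -x 113627 > /tmp/java_113627.pmap                         # per-region RSS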
Further OQL queries show that many of the Finalizer referents are:
org.apache.commons.dbcp2.DelegatingPreparedStatement (2124),
org.apache.commons.dbcp2.DelegatingStatement (1940),
sun.security.ssl.SSLSessionImpl (460).
A class-histogram cross-check is shown below.
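To cross-check those counts without reopening the dump, I can filter a live class histogram for the suspect classes (a sketch; note that :live forces a full GC first):
jmap -histo:live 113627 | egrep 'Finalizer|DelegatingPreparedStatement|DelegatingStatement|SSLSessionImpl'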
The Finalizer thread is parked in ReferenceQueue.remove(), which suggests the queue itself is empty, yet a large amount of garbage is still retained on the Finalizer linked list:
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f96183e2800 nid=0x174a7 in Object.wait() [0x00007f95e4ecd000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x0000000540020d40> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
I checked the smaps file and found 322 memory regions with an RSS of around 64MB each, which adds up to more than 20GB of memory.
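The count came from a rough filter over the pmap output (the RSS thresholds and PID are my own choices):
pmap -x 113627 | awk '$1 ~ /^0000/ && $3 >= 60000 && $3 <= 65540 {n++; sum += $3} END {printf "%d regions, %.1f GB RSS\n", n, sum/1024/1024}'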
I used gdb to dump one of the 64MB regions to a file, but I don't know the structure of the data, so I can't debug it further.
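For the record, the dump was taken roughly like this (the address range here is a placeholder; the real start/end addresses came from pmap/smaps):
gdb --batch --pid 113627 -ex 'dump memory /tmp/block.bin 0x7f9618000000 0x7f961c000000'   # dump one ~64MB region
strings /tmp/block.bin | sort | uniq -c | sort -rn | head   # look for recognizable text (SQL, class names, hostnames)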
The pattern of many ~64MB anonymous regions suggests that the leak may be related to glibc's malloc arenas (its per-thread memory pools).
ldd /usr/lib64/jdk8/bin/java
linux-vdso.so.1 => (0x00007ffd381f1000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc9c1743000)
libjli.so => /usr/lib64/jdk8/bin/../lib/amd64/jli/libjli.so (0x00007fc9c152d000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fc9c1329000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc9c0f5b000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc9c195f000)
-------
Installed Packages
glibc.x86_64 2.17-317.el
and my process environment does not set MALLOC_ARENA_MAX.
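This is roughly how I confirmed the variable is absent, and how it could be set before the next restart (the value 4 is only a commonly suggested starting point, not something I have tested):
tr '\0' '\n' < /proc/113627/environ | grep MALLOC   # no output: MALLOC_ARENA_MAX is not set
export MALLOC_ARENA_MAX=4                            # candidate setting for the next JVM restart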
I restarted the service manually to keep the OOM killer from killing it again and added the -XX:NativeMemoryTracking=detail parameter, but after it ran for a few days, I still cannot see anything abnormal in the output of jcmd <pid> VM.native_memory detail.
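For completeness, this is roughly how I have been reading NMT; a baseline plus a diff taken days apart should show whether any JVM-tracked category is actually growing (the PID is mine):
jcmd 113627 VM.native_memory baseline      # record a baseline snapshot
jcmd 113627 VM.native_memory detail.diff   # later: compare current usage against the baseline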
I have some other debugging information available if needed. What should I do next to troubleshoot this issue?