Java process killed by Linux OOM killer, looks like a native memory leak


I have a Java service that had been running for about ten days when I found it was killed by Linux's OOM killer during a busy period. The heap I configured for it was -Xmx10G -Xms6G, but the OOM killer log showed an RSS of 24.5G.

[6252128.258453]Out of memory: Kill process 113627 (java) score 350 or sacrifice child
[6252128.258533]Killed process 113627 (java) total-vm:50134728kB, anon-rss:25062168kB, file-rss:0kB, shmem-rss:0kB

I restarted the process, and after a few days the RSS shown in top had already reached 30G, so I suspect an off-heap (native) memory leak.

I have run jmap (heap dump), jstack, and pmap. pmap shows that the process holds many memory regions of about 65508KB each. In the heap histogram I also found many java.lang.ref.Finalizer objects, 7206 in total.
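
For reference, roughly the commands I used (the pid is a placeholder):

    jmap -dump:live,format=b,file=heap.hprof <pid>
    jmap -histo <pid> | grep java.lang.ref.Finalizer
    pmap -x <pid> | sort -n -r -k 3 | head -40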


Further OQL queries over the dump show that many of the referents held by the Finalizer objects are:

org.apache.commons.dbcp2.DelegatingPreparedStatement (2124 instances)
org.apache.commons.dbcp2.DelegatingStatement (1940 instances)
sun.security.ssl.SSLSessionImpl (460 instances)
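
The OQL was essentially the standard referent query (VisualVM/jhat OQL syntax, roughly as below); the per-class counts above come from grouping the results in the tool:

    select f.referent from java.lang.ref.Finalizer f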

The FinalizerThread is waiting in ReferenceQueue.remove(), so the reference queue itself appears to be empty, yet a lot of garbage is still retained in the Finalizer's linked list of unfinalized objects.


"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f96183e2800 nid=0x174a7 in Object.wait() [0x00007f95e4ecd000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x0000000540020d40> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
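
Since so many DBCP statement wrappers are apparently only being cleaned up via finalization, one thing I will re-check is that our DAO code closes statements explicitly. A minimal sketch of the pattern I expect (plain JDBC, hypothetical table and query), so that cleanup does not depend on the FinalizerThread:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;
    import javax.sql.DataSource;

    class ExampleDao {
        private final DataSource dataSource; // the DBCP2 BasicDataSource in our case

        ExampleDao(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        // Hypothetical query; try-with-resources closes ResultSet, PreparedStatement
        // and Connection deterministically instead of leaving them to the finalizer.
        List<Long> findIds(String name) throws SQLException {
            List<Long> ids = new ArrayList<>();
            try (Connection con = dataSource.getConnection();
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT id FROM example_table WHERE name = ?")) {
                ps.setString(1, name);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        ids.add(rs.getLong(1));
                    }
                }
            }
            return ids;
        }
    }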

Checking the smaps file, I found 322 memory regions with an RSS of around 64MB each, which together account for more than 20GB of memory.
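
I counted them roughly like this (pid is a placeholder):

    grep '^Rss:' /proc/<pid>/smaps | awk '$2 > 60000 && $2 < 66000' | wc -l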


I used gdb to dump one of the 64MB regions to a file, but I don't know the structure of the data inside it, so I can't tell what it contains.
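
The dump itself was done along these lines (the start/end addresses come from smaps and are placeholders here); strings at least gives a rough idea of what the block holds:

    gdb -p <pid>
    (gdb) dump memory /tmp/block.bin 0x7f9600000000 0x7f9604000000
    (gdb) detach
    (gdb) quit

    strings -n 8 /tmp/block.bin | sort | uniq -c | sort -rn | head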

So one possibility is that the memory growth is related to glibc's malloc arenas: on 64-bit glibc, each per-thread arena reserves heap segments of up to 64MB, and the default arena limit is 8 × the number of CPU cores, which would explain the many ~64MB regions in smaps.


ldd /usr/lib64/jdk8/bin/java  

linux-vdso.so.1 =>  (0x00007ffd381f1000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc9c1743000)
libjli.so => /usr/lib64/jdk8/bin/../lib/amd64/jli/libjli.so (0x00007fc9c152d000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fc9c1329000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc9c0f5b000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc9c195f000)



-------
Installed Packages
glibc.x86_64  2.17-317.el

and my process environment does not set MALLOC_ARENA_MAX.
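
One experiment I plan to try is limiting the arena count, or starting one instance against jemalloc, and comparing the RSS growth (the jemalloc path depends on what the distro installs, so it is a placeholder here):

    # limit glibc to 2 malloc arenas for this JVM
    export MALLOC_ARENA_MAX=2
    # ...start the java service as usual...

    # or: preload jemalloc instead of glibc malloc for comparison
    export LD_PRELOAD=/usr/lib64/libjemalloc.so
    # ...start the java service as usual...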

I restarted the service manually (to keep the OOM killer from killing it again) and added the -XX:NativeMemoryTracking=detail flag, but after running for a few days I still cannot see anything abnormal in the output of jcmd <pid> VM.native_memory detail.
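
What I still want to try is a baseline/diff over a few days, although as far as I understand NMT only covers the JVM's own allocations (heap, metaspace, threads, GC, internal), not memory that native libraries request directly from malloc:

    jcmd <pid> VM.native_memory baseline
    # ...let the service run for a while, then:
    jcmd <pid> VM.native_memory detail.diff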


I have some other debugging information as well. What should I do next to troubleshoot this issue?
