AWS EC2 Java program memory map issue

I have a Java program that memory-maps a file of around 800 MB via java.io.RandomAccessFile. I'm hosting it on an EC2 m5.8xlarge instance (32 CPUs, 128 GB RAM) with JVM_OPTS set to -Xms64g -Xmx64g. While starting the service, I got this error:

 [thread 3606 also had an error]
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0x7) at pc=0x00007f214f3c5e73, pid=3556, tid=3637
 #
 # JRE version: OpenJDK Runtime Environment Temurin-17.0.6+10 (17.0.6+10) (build 17.0.6+10)
 # Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (17.0.6+10, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-amd64)
 # Problematic frame:
 # V  [libjvm.so+0x602e73]  Copy::fill_to_memory_atomic(void*, unsigned long, unsigned char)+0x103
 #
 # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
 #
 # An error report file with more information is saved as:
 # /home/user/builds/current/hs_err_pid3556.log
 #
 # If you would like to submit a bug report, please visit:
 #   https://github.com/adoptium/adoptium-support/issues
 #
 /home/user/builds/current/start.sh: line 78:  3556 Aborted                 java ${JVM_OPTS} -cp 'lib/*' ${LAUNCH_CLASS} $@

The hs_err_pid3556.log mentioned above shows the following; sun.misc.Unsafe.setMemory failed while setting a block of memory to zeros:

Current thread (0x00007fcaf0274fd0):  JavaThread "ForkJoinPool-1-worker-7" daemon [_thread_in_vm, id=3199, stack(0x00007fcb15fda000,0x00007fcb160db000)]
Stack: [0x00007fcb15fda000,0x00007fcb160db000],  sp=0x00007fcb160d8ce8,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x602e73]  Copy::fill_to_memory_atomic(void*, unsigned long, unsigned char)+0x103
j  jdk.internal.misc.Unsafe.setMemory0(Ljava/lang/Object;JJB)V+0 java.base@17.0.6
j  jdk.internal.misc.Unsafe.setMemory(Ljava/lang/Object;JJB)V+25 java.base@17.0.6
j  jdk.internal.misc.Unsafe.setMemory(JJB)V+6 java.base@17.0.6
j  sun.misc.Unsafe.setMemory(JJB)V+7 jdk.unsupported@17.0.6
j  example.com.buffer.MemoryMappedBuffer.set(JJB)V+58
j  example.com.buffer.Buffer.zeroed()Lexample/com/buffer/Buffer;+9
j  example.com.collections.BufferSupplierMapped.supplyBuffers(JJ)Lorg/apache/commons/lang3/tuple/Pair;+37
j  example.com.collections.ConcurrentOffheapLongObjMap$MapImpl.<init>(Ljava/lang/String;Lexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;JJF)V+65
j  example.com.collections.ConcurrentOffheapLongObjMap.<init>(Ljava/lang/String;JJLexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;F)V+64
j  example.com.collections.ConcurrentOffheapLongObjMap.<init>(Ljava/lang/String;JJLexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;)V+11
j  example.com.collections.OffheapMapUtil.readToMapped(Ljava/lang/String;Lexample/com/collections/OffheapValueSerDe;Ljava/lang/String;Ljava/lang/String;)Lexample/com/collections/ConcurrentOffheapLongObjMap;+99
j  example.com.index.job.WritableSiteIndex.lambda$snapshotLoad$21(Lorg/apache/commons/lang3/mutable/MutableObject;Lexample/com/model/Site;Ljava/lang/String;Ljava/lang/String;)V+20
j  example.com.index.job.WritableSiteIndex$$Lambda$362+0x0000000801044f58.run()V+16
j  example.com.thread.AsyncTaskList.lambda$add$0(Ljava/lang/String;Lexample/com/function/ThrowingRunnable;)Ljava/lang/Void;+19
j  example.com.thread.AsyncTaskList$$Lambda$223+0x0000000800e31428.call()Ljava/lang/Object;+8
j  example.com.thread.AsyncTaskList.lambda$execute$1(Ljava/util/concurrent/Callable;)Ljava/lang/Boolean;+1
j  example.com.thread.AsyncTaskList$$Lambda$230+0x0000000800e30400.get()Ljava/lang/Object;+4
j  example.com.thread.WorkerService$$Lambda$231+0x0000000800e38000.call()Ljava/lang/Object;+4
j  java.util.concurrent.ForkJoinTask$AdaptedCallable.exec()Z+5 java.base@17.0.6
j  java.util.concurrent.ForkJoinTask.doExec()I+10 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Ljava/util/concurrent/ForkJoinTask;Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+13 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool.scan(Ljava/util/concurrent/ForkJoinPool$WorkQueue;II)I+193 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+53 java.base@17.0.6
j  java.util.concurrent.ForkJoinWorkerThread.run()V+31 java.base@17.0.6
v  ~StubRoutines::call_stub
V  [libjvm.so+0x822715]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x315
V  [libjvm.so+0x823f0b]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x1cb
V  [libjvm.so+0x8eda53]  thread_entry(JavaThread*, JavaThread*)+0xa3
V  [libjvm.so+0xe5e974]  JavaThread::thread_main_inner()+0x184
V  [libjvm.so+0xe62020]  Thread::call_run()+0xc0
V  [libjvm.so+0xc187e1]  thread_native_entry(Thread*)+0xe1
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  jdk.internal.misc.Unsafe.setMemory0(Ljava/lang/Object;JJB)V+0 java.base@17.0.6
j  jdk.internal.misc.Unsafe.setMemory(Ljava/lang/Object;JJB)V+25 java.base@17.0.6
j  jdk.internal.misc.Unsafe.setMemory(JJB)V+6 java.base@17.0.6
j  sun.misc.Unsafe.setMemory(JJB)V+7 jdk.unsupported@17.0.6
j  example.com.buffer.MemoryMappedBuffer.set(JJB)V+58
j  example.com.buffer.Buffer.zeroed()Lexample/com/buffer/Buffer;+9
j  example.com.collections.BufferSupplierMapped.supplyBuffers(JJ)Lorg/apache/commons/lang3/tuple/Pair;+37
j  example.com.collections.ConcurrentOffheapLongObjMap$MapImpl.<init>(Ljava/lang/String;Lexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;JJF)V+65
j  example.com.collections.ConcurrentOffheapLongObjMap.<init>(Ljava/lang/String;JJLexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;F)V+64
j  example.com.collections.ConcurrentOffheapLongObjMap.<init>(Ljava/lang/String;JJLexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;)V+11
j  example.com.collections.OffheapMapUtil.readToMapped(Ljava/lang/String;Lexample/com/collections/OffheapValueSerDe;Ljava/lang/String;Ljava/lang/String;)Lexample/com/collections/ConcurrentOffheapLongObjMap;+99
j  example.com.index.job.WritableSiteIndex.lambda$snapshotLoad$21(Lorg/apache/commons/lang3/mutable/MutableObject;Lexample/com/model/Site;Ljava/lang/String;Ljava/lang/String;)V+20
j  example.com.index.job.WritableSiteIndex$$Lambda$362+0x0000000801044f58.run()V+16
j  example.com.thread.AsyncTaskList.lambda$add$0(Ljava/lang/String;Lexample/com/function/ThrowingRunnable;)Ljava/lang/Void;+19
j  example.com.thread.AsyncTaskList$$Lambda$223+0x0000000800e31428.call()Ljava/lang/Object;+8
j  example.com.thread.AsyncTaskList.lambda$execute$1(Ljava/util/concurrent/Callable;)Ljava/lang/Boolean;+1
j  example.com.thread.AsyncTaskList$$Lambda$230+0x0000000800e30400.get()Ljava/lang/Object;+4
j  example.com.thread.WorkerService$$Lambda$231+0x0000000800e38000.call()Ljava/lang/Object;+4
j  java.util.concurrent.ForkJoinTask$AdaptedCallable.exec()Z+5 java.base@17.0.6
j  java.util.concurrent.ForkJoinTask.doExec()I+10 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Ljava/util/concurrent/ForkJoinTask;Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+13 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool.scan(Ljava/util/concurrent/ForkJoinPool$WorkQueue;II)I+193 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+53 java.base@17.0.6
j  java.util.concurrent.ForkJoinWorkerThread.run()V+31 java.base@17.0.6
v  ~StubRoutines::call_stub
siginfo: si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007fc979848000

What's interesting is that the same program maps the same file without any problem on my laptop (Ubuntu with 64 GB RAM), and that the AWS instance has no problem loading a very similar but smaller file (560 MB vs. 800 MB). So I'm pretty sure the Java program is working as expected and the file to be mapped is intact.

There is 1 answer

Mukulit Bhati

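A SIGBUS (bus error) on a memory-mapped file generally means the kernel could not provide the page being accessed. In the given log, the crash happens in the Copy::fill_to_memory_atomic function, called from Unsafe.setMemory0 while your code zeroes the mapped buffer. The siginfo line reports si_code 2 (BUS_ADRERR) rather than BUS_ADRALN, so this looks less like an alignment problem and more like a problem with the mapping itself: typically the mapped range extends past the end of the backing file, the file was truncated after being mapped, or the filesystem ran out of space while allocating blocks for a sparse file.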
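One way to rule out a block-allocation failure is to write real zero bytes through a FileChannel before mapping, so that a full disk would most likely surface as an IOException from write() instead of a SIGBUS on first touch. This is only a minimal sketch using the standard FileChannel API; your MemoryMappedBuffer is Unsafe-based and I haven't seen its code, so the class and method names below (PreallocatedMapping, fillWithZeros, mapPreallocated) are hypothetical and the idea would need adapting.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not taken from the question's code base.
public class PreallocatedMapping {

    /** Writes real zero bytes so the filesystem has to allocate blocks for the whole range. */
    static void fillWithZeros(FileChannel channel, long size) throws IOException {
        ByteBuffer zeros = ByteBuffer.allocate(1 << 20); // 1 MiB of zeros, reused for every chunk
        long position = 0;
        while (position < size) {
            zeros.clear();
            zeros.limit((int) Math.min(zeros.capacity(), size - position));
            while (zeros.hasRemaining()) {
                // Positional write: does not move the channel's own position.
                position += channel.write(zeros, position);
            }
        }
    }

    /** Opens or creates the file, fills it with zeros, then maps it read-write. */
    static MappedByteBuffer mapPreallocated(Path file, long size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            fillWithZeros(channel, size);
            channel.force(true); // flush data and metadata before mapping
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-", ".bin");
        file.toFile().deleteOnExit();
        MappedByteBuffer buffer = mapPreallocated(file, 4L << 20); // 4 MiB demo size
        buffer.put(0, (byte) 42);
        System.out.println("mapped " + buffer.capacity() + " bytes");
    }
}
```

If the write loop throws on the EC2 instance but not on your laptop, that points at the volume holding the file rather than at the JVM. Note that FileChannel.map is limited to 2 GB per mapping, which is fine for an 800 MB file.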

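Try experimenting with different JVM options. For example, you might use -XX:MaxDirectMemorySize to limit the amount of direct buffer memory, although that limit applies to direct buffers rather than file mappings, so it may not be relevant here. Also check the ulimit settings on your EC2 instance (ulimit -a), and review how you specify the memory map size: the mapped range must not extend past the end of the file. Since the crash happens while zeroing a freshly mapped buffer, it is also worth confirming (for example with df -h) that the filesystem holding the mapped file has enough free space.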
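The map-size review can be sketched as a small pre-flight check that compares the requested range with the actual file size and reports the usable space on the file's volume. The class and method names (MapPreflight, check, usableBytes) are made up for illustration; only Files.size, Files.getFileStore and FileStore.getUsableSpace are real JDK calls.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical diagnostic helper, not part of the question's code base.
public class MapPreflight {

    /** Returns null if the range fits inside the file, otherwise a description of the problem. */
    static String check(Path file, long offset, long length) throws IOException {
        if (offset < 0 || length < 0) {
            return "negative offset or length";
        }
        long fileSize = Files.size(file);
        // Written as a subtraction so that offset + length cannot overflow.
        if (offset > fileSize || length > fileSize - offset) {
            return "range starting at " + offset + " with length " + length
                    + " extends past file size " + fileSize;
        }
        return null;
    }

    /** Bytes available to this JVM on the filesystem that holds the file. */
    static long usableBytes(Path file) throws IOException {
        FileStore store = Files.getFileStore(file);
        return store.getUsableSpace();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java MapPreflight <file> <bytes-to-map>");
            return;
        }
        Path file = Path.of(args[0]);
        long length = Long.parseLong(args[1]);
        String problem = check(file, 0, length);
        System.out.println(problem == null ? "range fits inside the file" : problem);
        System.out.println("usable space on volume: " + usableBytes(file) + " bytes");
    }
}
```

Running it on both the laptop and the EC2 instance with the same file and map size should show quickly whether the two environments differ in file size or free space.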

If the issue persists, consider reaching out to AWS support for assistance. They might be able to provide insights specific to your AWS environment.