Recently I added OTEL java agent to our application and right away noticed increased latencies, sometimes doubled from 20 ms to 40 ms. Health check went from < 1 ms to ~10ms. GC pauses increased from 15ms to 130ms.
Also, now I see BLOCKED threads in a thread dump with wait on io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap. with o only one RUNNING thread
"ForkJoinPool.common pool-worker-3" #269 daemon prio=5 os_prio=0 cpu=1519375.00ms elapsed=36357.28s tid=0x00007fd9a8007cb0 nid=0x262 runnable [0x00007fd88c5ca000]
java.lang.Thread.State: RUNNABLE
at java.lang.ref.ReferenceQueue.poll([email protected]/ReferenceQueue.java:119)
- parking to wait for <0x0000000304753f40> (a java.util.concurrent.ForkJoinPool)
- locked <0x0000000303002610> (a java.lang.ref.ReferenceQueue$Lock)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.getIfPresent(WeakConcurrentMap.java:193)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.get(WeakLockFreeCache.java:26)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapGet(VirtualFieldImplementationsGenerator.java:297)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.realGet(VirtualFieldImplementationsGenerator.java)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.get(VirtualFieldImplementationsGenerator.java:279)
at io.opentelemetry.javaagent.bootstrap.executors.TaskAdviceHelper.makePropagatedContextCurrent(TaskAdviceHelper.java:23)
at java.util.concurrent.CountedCompleter.exec([email protected]/CountedCompleter.java:754)
at java.util.concurrent.ForkJoinTask.doExec([email protected]/ForkJoinTask.java:373)
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec([email protected]/ForkJoinPool.java:1182)
at java.util.concurrent.ForkJoinPool.scan([email protected]/ForkJoinPool.java:1655)
at java.util.concurrent.ForkJoinPool.runWorker([email protected]/ForkJoinPool.java:1622)
at java.util.concurrent.ForkJoinWorkerThread.run([email protected]/ForkJoinWorkerThread.java:165)
Others are blocked like
"ForkJoinPool.common pool-worker-10" #276 daemon prio=5 os_prio=0 cpu=1528899.78ms elapsed=36357.28s tid=0x00007fd8f00032a0 nid=0x269 waiting for monitor entry [0x00007fd88afb7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.ref.ReferenceQueue.poll([email protected]/ReferenceQueue.java:119)
- parking to wait for <0x0000000304753f40> (a java.util.concurrent.ForkJoinPool)
- waiting to lock <0x0000000303002610> (a java.lang.ref.ReferenceQueue$Lock)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.getIfPresent(WeakConcurrentMap.java:193)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.get(WeakLockFreeCache.java:26)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapGet(VirtualFieldImplementationsGenerator.java:297)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.realGet(VirtualFieldImplementationsGenerator.java)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.get(VirtualFieldImplementationsGenerator.java:279)
at io.opentelemetry.javaagent.bootstrap.executors.ExecutorAdviceHelper.attachContextToTask(ExecutorAdviceHelper.java:51)
at java.util.concurrent.ForkJoinTask.fork([email protected]/ForkJoinTask.java:649)
at java.util.stream.AbstractShortCircuitTask.compute([email protected]/AbstractShortCircuitTask.java:133)
at java.util.concurrent.CountedCompleter.exec([email protected]/CountedCompleter.java:754)
at java.util.concurrent.ForkJoinTask.doExec([email protected]/ForkJoinTask.java:373)
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec([email protected]/ForkJoinPool.java:1182)
at java.util.concurrent.ForkJoinPool.scan([email protected]/ForkJoinPool.java:1655)
at java.util.concurrent.ForkJoinPool.runWorker([email protected]/ForkJoinPool.java:1622)
at java.util.concurrent.ForkJoinWorkerThread.run([email protected]/ForkJoinWorkerThread.java:165)
Or from another thread dump
"ForkJoinPool.common pool-worker-5" #271 daemon prio=5 os_prio=0 cpu=7780467.78ms elapsed=219941.20s tid=0x00007f889c0019d0 nid=0x271 waiting for monitor entry [0x00007f882018c000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.ref.ReferenceQueue.poll([email protected]/ReferenceQueue.java:119)
- parking to wait for <0x0000000303003108> (a java.util.concurrent.ForkJoinPool)
- locked <0x00000003030017b0> (a java.lang.ref.ReferenceQueue$Lock)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.put(WeakConcurrentMap.java:205)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.put(WeakLockFreeCache.java:31)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapPut(VirtualFieldImplementationsGenerator.java:304)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.real put(VirtualFieldImplementationsGenerator.java)
Has anyone have any idea what's going on? Is there a deadlock situation between FJP and OTEL agent?