OTEL Java Agent high GC pauses, added latencies and blocked threads

171 views Asked by At

Recently I added OTEL java agent to our application and right away noticed increased latencies, sometimes doubled from 20 ms to 40 ms. Health check went from < 1 ms to ~10ms. GC pauses increased from 15ms to 130ms. Also, now I see BLOCKED threads in a thread dump with wait on io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap. with o only one RUNNING thread

"ForkJoinPool.common pool-worker-3" #269 daemon prio=5 os_prio=0 cpu=1519375.00ms elapsed=36357.28s tid=0x00007fd9a8007cb0 nid=0x262 runnable  [0x00007fd88c5ca000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.ref.ReferenceQueue.poll([email protected]/ReferenceQueue.java:119)
        - parking to wait for  <0x0000000304753f40> (a java.util.concurrent.ForkJoinPool)
        - locked <0x0000000303002610> (a java.lang.ref.ReferenceQueue$Lock)
        at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
        at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.getIfPresent(WeakConcurrentMap.java:193)
        at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.get(WeakLockFreeCache.java:26)
        at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapGet(VirtualFieldImplementationsGenerator.java:297)
        at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.realGet(VirtualFieldImplementationsGenerator.java)
        at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.get(VirtualFieldImplementationsGenerator.java:279)
        at io.opentelemetry.javaagent.bootstrap.executors.TaskAdviceHelper.makePropagatedContextCurrent(TaskAdviceHelper.java:23)
        at java.util.concurrent.CountedCompleter.exec([email protected]/CountedCompleter.java:754)
        at java.util.concurrent.ForkJoinTask.doExec([email protected]/ForkJoinTask.java:373)
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec([email protected]/ForkJoinPool.java:1182)
        at java.util.concurrent.ForkJoinPool.scan([email protected]/ForkJoinPool.java:1655)
        at java.util.concurrent.ForkJoinPool.runWorker([email protected]/ForkJoinPool.java:1622)
        at java.util.concurrent.ForkJoinWorkerThread.run([email protected]/ForkJoinWorkerThread.java:165)

Others are blocked like


"ForkJoinPool.common pool-worker-10" #276 daemon prio=5 os_prio=0 cpu=1528899.78ms elapsed=36357.28s tid=0x00007fd8f00032a0 nid=0x269 waiting for monitor entry  [0x00007fd88afb7000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.ref.ReferenceQueue.poll([email protected]/ReferenceQueue.java:119)
        - parking to wait for  <0x0000000304753f40> (a java.util.concurrent.ForkJoinPool)
        - waiting to lock <0x0000000303002610> (a java.lang.ref.ReferenceQueue$Lock)
        at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
        at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.getIfPresent(WeakConcurrentMap.java:193)
        at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.get(WeakLockFreeCache.java:26)
        at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapGet(VirtualFieldImplementationsGenerator.java:297)
        at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.realGet(VirtualFieldImplementationsGenerator.java)
        at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.get(VirtualFieldImplementationsGenerator.java:279)
        at io.opentelemetry.javaagent.bootstrap.executors.ExecutorAdviceHelper.attachContextToTask(ExecutorAdviceHelper.java:51)
        at java.util.concurrent.ForkJoinTask.fork([email protected]/ForkJoinTask.java:649)
        at java.util.stream.AbstractShortCircuitTask.compute([email protected]/AbstractShortCircuitTask.java:133)
        at java.util.concurrent.CountedCompleter.exec([email protected]/CountedCompleter.java:754)
        at java.util.concurrent.ForkJoinTask.doExec([email protected]/ForkJoinTask.java:373)
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec([email protected]/ForkJoinPool.java:1182)
        at java.util.concurrent.ForkJoinPool.scan([email protected]/ForkJoinPool.java:1655)
        at java.util.concurrent.ForkJoinPool.runWorker([email protected]/ForkJoinPool.java:1622)
        at java.util.concurrent.ForkJoinWorkerThread.run([email protected]/ForkJoinWorkerThread.java:165)

Or from another thread dump

"ForkJoinPool.common pool-worker-5" #271 daemon prio=5 os_prio=0 cpu=7780467.78ms elapsed=219941.20s tid=0x00007f889c0019d0 nid=0x271 waiting for monitor entry  [0x00007f882018c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.ref.ReferenceQueue.poll([email protected]/ReferenceQueue.java:119)
    - parking to wait for  <0x0000000303003108> (a java.util.concurrent.ForkJoinPool)
    - locked <0x00000003030017b0> (a java.lang.ref.ReferenceQueue$Lock)
    at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
    at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.put(WeakConcurrentMap.java:205)
    at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.put(WeakLockFreeCache.java:31)
    at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapPut(VirtualFieldImplementationsGenerator.java:304)
    at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$util$concurrent$ForkJoinTask$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.real put(VirtualFieldImplementationsGenerator.java)

Has anyone have any idea what's going on? Is there a deadlock situation between FJP and OTEL agent?

0

There are 0 answers