A seemingly straightforward question: why does G1 GC spend so much time in the 'termination' phase? A non-full GC takes ~2 seconds, and from that, 1.5 sec is the termination (work stealing) phase.
Context: - our app has apprx. 10MB/sec allocation rate - most of those are short-lived objects - termination time is increasing linearly with app uptime - we have a large heap (24G) - full GC never occurs, if I force it manually then GC times are back to normal - the app itself does a lot of classloading (JSP) - we're using G1 GC with the following parameters:
-XX:+AggressiveOpts -XX:CICompilerCount=4 -XX:ConcGCThreads=2 -XX:G1HeapRegionSize=8388608 -XX:G1NewSizePercent=35 -XX:InitialHeapSize=17179869184 -XX:+ManagementServer -XX:MarkStackSize=4194304 -XX:MaxGCPauseMillis=1500 -XX:MaxHeapSize=23622320128 -XX:MaxNewSize=14168358912 -XX:MinHeapDeltaBytes=8388608 -XX:+ParallelRefProcEnabled -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintStringDeduplicationStatistics -XX:+PrintTenuringDistribution -XX:+UnlockExperimentalVMOptions -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseStringDeduplication
122618.508: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 645922816 bytes, new threshold 15 (max 15)
- age 1: 11770408 bytes, 11770408 total
- age 2: 4473200 bytes, 16243608 total
- age 3: 7342504 bytes, 23586112 total
- age 4: 2872344 bytes, 26458456 total
- age 5: 3291816 bytes, 29750272 total
- age 6: 3174480 bytes, 32924752 total
- age 7: 3538032 bytes, 36462784 total
- age 8: 3236984 bytes, 39699768 total
- age 9: 3369896 bytes, 43069664 total
- age 10: 4028056 bytes, 47097720 total
- age 11: 3598376 bytes, 50696096 total
- age 12: 3829808 bytes, 54525904 total
- age 13: 3966344 bytes, 58492248 total
- age 14: 3347280 bytes, 61839528 total
- age 15: 3887528 bytes, 65727056 total
122618.508: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 7510, predicted base time: 297.79 ms, remaining time: 1202.21 ms, target pause time: 1500.00 ms]
122618.508: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1208 regions, survivors: 20 regions, predicted young region time: 157.92 ms]
122618.508: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1208 regions, survivors: 20 regions, old: 0 regions, predicted pause time: 455.71 ms, target pause time: 1500.00 ms]
, 2.1182594 secs]
[Parallel Time: 2094.7 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 122618508.2, Avg: 122618508.3, Max: 122618508.4, Diff: 0.3]
[Ext Root Scanning (ms): Min: 3.0, Avg: 283.6, Max: 2094.1, Diff: 2091.1, Sum: 2269.0]
[Update RS (ms): Min: 0.0, Avg: 2.2, Max: 4.3, Diff: 4.3, Sum: 17.2]
[Processed Buffers: Min: 0, Avg: 7.4, Max: 24, Diff: 24, Sum: 59]
[Scan RS (ms): Min: 0.1, Avg: 0.3, Max: 0.5, Diff: 0.4, Sum: 2.4]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Object Copy (ms): Min: 0.0, Avg: 283.7, Max: 341.0, Diff: 340.9, Sum: 2269.5]
[Termination (ms): Min: 0.0, Avg: 1524.5, Max: 1754.8, Diff: 1754.8, Sum: 12196.2]
[Termination Attempts: Min: 1, Avg: 10989.2, Max: 18096, Diff: 18095, Sum: 87914]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.6]
[GC Worker Total (ms): Min: 2094.3, Avg: 2094.4, Max: 2094.6, Diff: 0.3, Sum: 16755.2]
[GC Worker End (ms): Min: 122620602.7, Avg: 122620602.7, Max: 122620602.8, Diff: 0.1]
[Code Root Fixup: 0.2 ms]
[Code Root Purge: 0.0 ms]
[String Dedup Fixup: 4.5 ms, GC Workers: 8]
[Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Table Fixup (ms): Min: 0.0, Avg: 3.7, Max: 4.3, Diff: 4.3, Sum: 29.7]
[Clear CT: 1.7 ms]
[Other: 17.1 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 14.6 ms]
[Ref Enq: 0.4 ms]
[Redirty Cards: 0.2 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 1.5 ms]
[Eden: 9664.0M(9664.0M)->0.0B(9688.0M) Survivors: 160.0M->136.0M Heap: 10.5G(16.0G)->1101.1M(16.0G)]
[Times: user=14.38 sys=0.07, real=2.12 secs]
122620.627: [GC concurrent-string-deduplication, 142.3K->26.4K(115.9K), avg 60.3%, 0.0010556 secs]
[Last Exec: 0.0010556 secs, Idle: 450.5150579 secs, Blocked: 0/0.0000000 secs]
[Inspected: 2027]
[Skipped: 0( 0.0%)]
[Hashed: 1304( 64.3%)]
[Known: 602( 29.7%)]
[New: 1425( 70.3%) 142.3K]
[Deduplicated: 1240( 87.0%) 115.9K( 81.4%)]
[Young: 1240(100.0%) 115.9K(100.0%)]
[Old: 0( 0.0%) 0.0B( 0.0%)]
[Total Exec: 277/0.4346528 secs, Idle: 277/122619.8157354 secs, Blocked: 62/0.0449930 secs]
[Inspected: 801984]
[Skipped: 0( 0.0%)]
[Hashed: 411526( 51.3%)]
[Known: 245334( 30.6%)]
[New: 556650( 69.4%) 53.2M]
[Deduplicated: 403870( 72.6%) 32.1M( 60.3%)]
[Young: 392678( 97.2%) 31.1M( 96.8%)]
[Old: 11192( 2.8%) 1066.4K( 3.2%)]
[Table]
[Memory Usage: 8812.9K]
[Size: 262144, Min: 1024, Max: 16777216]
[Entries: 288637, Load: 110.1%, Cached: 0, Added: 358165, Removed: 69528]
[Resize Count: 8, Shrink Threshold: 174762(66.7%), Grow Threshold: 524288(200.0%)]
[Rehash Count: 0, Rehash Threshold: 120, Hash Seed: 0x0]
[Age Threshold: 3]
[Queue]
[Dropped: 0]
After looking at the logs again, the problem is not the termination time -- it's the
Ext Root Scanning
-- the termination time is long because those threads are waiting for the other workers to finish RS. This is because use the code cache is getting progressively larger.