When we switched over to Java 21 from Java 17, we experienced a change in how to code cache is behaving.
We have small services running the serial GC collector (selected by the JVM), typically a heap of around 400 MB and a non-segmented code cache of around 40 MB, that is low I guess in today's standard.
Here are the configured JVM options that we typically use.
java
-Xmx400m
-Xss256k
-XX:MaxMetaspaceSize=120m
-XX:MaxDirectMemorySize=64m
-XX:CompressedClassSpaceSize=10m
-Xlog:gc*=info,codecache*=info // for debugging
-XX:ReservedCodeCacheSize=40m
-XX:MinHeapFreeRatio=50
-XX:MaxHeapFreeRatio=50
-XX:+HeapDumpOnOutOfMemoryError
-XX:-ShrinkHeapInSteps
-XX:MaxGCPauseMillis=10
With Java 17, the code cache was typically garbage collected quite often. Now, if a service does not have any need to perform a full GC (I see that the heap occupancy of tenured/old is very low), the code cache is never flushed. The code cache flushing seems to be a part of the full GC cycle.
What is really weird is that even if the code cache gets filled up, it does not perform any aggressive sweeping, even if I set the StartAggressiveSweepingAt to a high number. I tried to add the following to trigger code cache GC on allocation, but without any luck.
-XX:SweeperThresholdStartAggressiveSweepingAt=20
-XX:SweeperThreshold=0.5
-XX:NmethodSweepActivity=500
The thing is that when the service starts up, I do get some code cache sweeps so the "feature" is enabled. It will do some sweeps, but this is the last occurrence of a GC log that contains the phrase code cache. Observe "Triggering threshold (2.125%) GC due to allocating 2.135% since" in the log.
[1891.824s][info][gc ] GC(125) Pause Young (Allocation Failure) 125M->69M(202M) 3.616ms
[1891.824s][info][gc,cpu ] GC(125) User=0.00s Sys=0.00s Real=0.00s
[1891.824s][info][gc,heap ] GC(125) Tenured: 71312K(142628K)->71312K(142628K)
[1938.484s][info][codecache ] Triggering threshold (2.125%) GC due to allocating 2.135% since last unloading (44.729% used -> 46.864% used)
[1938.485s][info][gc,start ] GC(126) Pause Young (GCLocker Initiated GC)
[1938.489s][info][gc,cpu ] GC(126) User=0.00s Sys=0.00s Real=0.01s
[1938.488s][info][gc,heap ] GC(126) DefNew: 44957K(64320K)->923K(64320K) Eden: 44605K(57216K)->0K(57216K) From: 351K(7104K)->923K(7104K)
[1938.489s][info][gc ] GC(126) Pause Young (GCLocker Initiated GC) 113M->70M(202M) 3.711ms
[1938.489s][info][gc,heap ] GC(126) Tenured: 71312K(142628K)->71312K(142628K)
[1938.489s][info][gc,metaspace ] GC(126) Metaspace: 92917K(93504K)->92917K(93504K) NonClass: 81847K(82176K)->81847K(82176K) Class: 11069K(11328K)->11069K(11328K)
[1994.918s][info][gc,start ] GC(127) Pause Young (Allocation Failure)
To me, it sounds that it should trigger a GC (Full) at this point, but it does not happen. No full GC for hours. I looked in the source code of the JDK and would expect full GC to be scheduled. https://github.com/openjdk/jdk/blob/369bbecc0dab389b523c09bc332fe1cf6394cb26/src/hotspot/share/code/codeCache.cpp#L799-L804
I also found this one https://bugs.openjdk.org/browse/JDK-8290025 that I think is the reason for the changes in behaviour.
So my question is, why is no full GC triggered when the code cache is filled? Any parameters I should try? It's only cleared on full GC, invoked by other reasons.