I'm experimenting with Coordinated Restore at Checkpoint (CRaC) and a Spring Boot 3.2.2 application with Spring WebFlux that contains a reactive web client (all Netty), running on Azul's JDK 21.0.2.
When I start the app with

java -Dspring.context.checkpoint=onRefresh -XX:CRaCCheckpointTo=./crac -jar my-app.jar

the checkpoint fails because several file descriptors opened by native code are still open: timerfd, eventfd and eventpoll.
I suspect that Netty is holding these files open, but I'm also using a number of other libraries (mainly Spring Data Redis Reactive with Lettuce, Spring Kafka and Spring GraphQL), as well as Kotlin Coroutines.
Any advice? Is this a bug in Spring WebFlux or in Spring Data Redis with Lettuce (both use Netty), or something completely different?
I expect a checkpoint to be written to the crac directory, but instead I get the following exception:
17:01:11.144 7742 ERROR main o.s.boot.SpringApplication - Application run failed
org.springframework.context.ApplicationContextException: Failed to take CRaC checkpoint on refresh
at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:534) ~[spring-context-6.1.3.jar!/:6.1.3]
...
Caused by: org.crac.CheckpointException: null
at org.crac.Core$Compat.checkpointRestore(Core.java:144) ~[crac-1.4.0.jar!/:na]
at org.crac.Core.checkpointRestore(Core.java:237) ~[crac-1.4.0.jar!/:na]
at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:528) ~[spring-context-6.1.3.jar!/:6.1.3]
... 15 common frames omitted
Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=6 type=unknown path=anon_inode:[eventpoll]
... 17 common frames omitted
Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=7 type=unknown path=anon_inode:[eventfd]
... 17 common frames omitted
Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=10 type=unknown path=anon_inode:[timerfd]
... 17 common frames omitted
...
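The descriptors in the trace are Linux anonymous inodes (anon_inode:[eventpoll], anon_inode:[eventfd], anon_inode:[timerfd]). To see which of them are open at a given moment outside of a checkpoint attempt, a small Linux-only helper can read the symlinks under /proc/self/fd. This is a minimal diagnostic sketch, not part of CRaC or Spring; the class name OpenFdLister and its methods are hypothetical, and it assumes a Linux /proc filesystem.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (Linux only): lists this JVM's open file
// descriptors via /proc/self/fd and filters out the anonymous-inode ones
// (eventpoll, eventfd, timerfd, ...) like those reported by the CRaC exception.
public class OpenFdLister {

    /** Returns one "fd -> target" entry per currently open descriptor. */
    public static List<String> listOpenFds() throws IOException {
        List<String> result = new ArrayList<>();
        Path fdDir = Paths.get("/proc/self/fd");
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(fdDir)) {
            for (Path fd : stream) {
                try {
                    // Each entry is a symlink; anonymous inodes resolve to
                    // targets such as "anon_inode:[eventpoll]".
                    Path target = Files.readSymbolicLink(fd);
                    result.add(fd.getFileName() + " -> " + target);
                } catch (IOException e) {
                    // Another thread may close a descriptor between listing
                    // and reading it; skip such entries.
                }
            }
        }
        return result;
    }

    /** Keeps only descriptors backed by anonymous inodes. */
    public static List<String> anonInodeFds() throws IOException {
        List<String> anon = new ArrayList<>();
        for (String entry : listOpenFds()) {
            if (entry.contains("anon_inode:")) {
                anon.add(entry);
            }
        }
        return anon;
    }

    public static void main(String[] args) throws IOException {
        for (String entry : anonInodeFds()) {
            System.out.println(entry);
        }
    }
}
```

Calling OpenFdLister.anonInodeFds() from application code (for example, shortly before the context refresh that triggers the checkpoint) prints the same anon_inode:[...] targets that CRaC complains about, which may help narrow down which component opened them.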
Edit: I can now confirm that the problem is related to Netty: after removing all Netty code, the checkpoint works.
Edit: The problem seems to occur when WebFlux is used as the server and WebClient is used in the same application.