I'm trying to deploy the entire Spring Cloud Data Flow platform to a MicroK8s cluster running on one of our servers, a VM with Ubuntu 20.04. Before touching the target server, I deployed it on my local computer (same OS), where I even succeeded in creating and running one stream. Nevertheless, I am currently experiencing an error both on my local computer and on the VM, and I can't pinpoint the root cause.
My current situation:
I'm following the official guide for deploying SCDF using kubectl, the only difference being that I'm using tag v2.9.4, the latest at the time of writing, instead of v2.9.1. I also skipped the configuration of the monitoring frameworks, and hence commented out the relevant lines in the configuration of the SCDF server, as suggested in the docs. The Kafka message broker and the MySQL database are deployed without issues.
However, after executing the kubectl commands to create the config map, service and deployment for Skipper, the Skipper pod goes into status "CrashLoopBackOff". Checking the logs of the pod, the only thing I see is that the application is terminated right after it seems to have started:
[...]
2022-04-11 15:00:11.713 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:00:11.907 INFO 1 --- [ main] o.s.c.s.s.app.SkipperServerApplication : Started SkipperServerApplication in 78.901 seconds (JVM running for 82.435)
2022-04-11 15:00:12.531 INFO 1 --- [ionShutdownHook] o.s.s.s.DefaultStateMachineService : Entering stop sequence, stopping all managed machines
2022-04-11 15:00:12.617 INFO 1 --- [ionShutdownHook] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2022-04-11 15:00:12.703 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2022-04-11 15:00:12.799 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
Native Memory Tracking:
Total: reserved=961864767, committed=325411903
- Java Heap (reserved=356515840, committed=138334208)
(mmap: reserved=356515840, committed=138334208)
- Class (reserved=269444100, committed=94409732)
(classes #17623)
( instance classes #16455, array classes #1168)
(malloc=3355652 #45645)
(mmap: reserved=266088448, committed=91054080)
( Metadata: )
( reserved=79691776, committed=78340096)
( used=76414680)
( free=1925416)
( waste=0 =0.00%)
( Class space:)
( reserved=186396672, committed=12713984)
( used=11544696)
( free=1169288)
( waste=0 =0.00%)
- Thread (reserved=14794856, committed=1323112)
(thread #14)
(stack: reserved=14729216, committed=1257472)
(malloc=51792 #86)
(arena=13848 #25)
- Code (reserved=255686068, committed=26629556)
(malloc=2053556 #8654)
(mmap: reserved=253632512, committed=24576000)
- GC (reserved=1728178, committed=1019570)
(malloc=560818 #2163)
(mmap: reserved=1167360, committed=458752)
- Compiler (reserved=35543622, committed=35543622)
(malloc=71174 #1162)
(arena=35472448 #19)
- Internal (reserved=432627, committed=432627)
(malloc=399859 #1104)
(mmap: reserved=32768, committed=32768)
- Other (reserved=10248, committed=10248)
(malloc=10248 #3)
- Symbol (reserved=22101496, committed=22101496)
(malloc=19867360 #240000)
(arena=2234136 #1)
- Native Memory Tracking (reserved=4899928, committed=4899928)
(malloc=9656 #122)
(tracking overhead=4890272)
- Arena Chunk (reserved=81808, committed=81808)
(malloc=81808)
- Tracing (reserved=1, committed=1)
(malloc=1 #1)
- Logging (reserved=4572, committed=4572)
(malloc=4572 #192)
- Arguments (reserved=19063, committed=19063)
(malloc=19063 #495)
- Module (reserved=310496, committed=310496)
(malloc=310496 #2710)
- Synchronizer (reserved=283672, committed=283672)
(malloc=283672 #2348)
- Safepoint (reserved=8192, committed=8192)
(mmap: reserved=8192, committed=8192)
No matter how many times the pod is restarted, it always exits at this phase. This is the output of kubectl get all:
NAME READY STATUS RESTARTS AGE
pod/kafka-zk-6b6f4976cf-9hjzn 1/1 Running 0 69m
pod/kafka-broker-0 1/1 Running 0 58m
pod/mysql-7c57b4cfdf-njb97 1/1 Running 0 39m
pod/skipper-b46bfd5fd-wrnqv 0/1 CrashLoopBackOff 13 (57s ago) 38m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 148m
service/kafka-zk ClusterIP 10.152.183.62 <none> 2181/TCP,2888/TCP,3888/TCP 69m
service/kafka-broker ClusterIP None <none> 9092/TCP 69m
service/mysql ClusterIP 10.152.183.139 <none> 3306/TCP 40m
service/skipper LoadBalancer 10.152.183.250 <pending> 80:31955/TCP 38m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kafka-zk 1/1 1 1 69m
deployment.apps/mysql 1/1 1 1 39m
deployment.apps/skipper 0/1 1 0 38m
NAME DESIRED CURRENT READY AGE
replicaset.apps/kafka-zk-6b6f4976cf 1 1 1 69m
replicaset.apps/mysql-7c57b4cfdf 1 1 1 39m
replicaset.apps/skipper-b46bfd5fd 1 1 0 38m
NAME READY AGE
statefulset.apps/kafka-broker 1/1 69m
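For anyone trying to reproduce this, the reason the container was terminated can usually be narrowed down with a few standard kubectl subcommands. The following is only a minimal sketch: the function name diagnose_pod is hypothetical, and it assumes kubectl is already pointed at the MicroK8s cluster and that the pod has a single container.

```shell
#!/bin/sh
# Hypothetical helper: collect the usual evidence for a pod stuck in CrashLoopBackOff.
# Assumes kubectl is configured for the target cluster (e.g. via an alias to "microk8s kubectl").
diagnose_pod() {
    pod="$1"

    # Container state, last exit code and the events recorded for the pod
    # (failed probes, OOM kills and image problems show up here if they occurred).
    kubectl describe pod "$pod"

    # Logs of the previous, already terminated container instance.
    kubectl logs "$pod" --previous

    # Only the last termination record (reason, exit code, timestamps)
    # of the first container in the pod.
    kubectl get pod "$pod" \
        -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
    echo

    # Events that reference this pod, oldest first.
    kubectl get events \
        --field-selector "involvedObject.name=$pod" \
        --sort-by=.lastTimestamp
}
```

With the pod name from the output above, it would be called as diagnose_pod skipper-b46bfd5fd-wrnqv.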
What I tried:
- Changing the Skipper service type from LoadBalancer to NodePort (I have not enabled metallb, so load balancing is not provided), but it didn't work;
- Changing the port exposed by the container from the default 80 to 7577 (also in the service configuration), but the error still occurs;
- Downgrading Skipper to version 2.8.2, the one used in the documentation above, but the behaviour was exactly the same.
Increasing the logging level by setting logging.level.org.springframework to DEBUG and then to TRACE didn't surface anything useful in the logs, except a cryptic line that I could not find anywhere on Google:
[...]
2022-04-11 15:22:38.818 DEBUG 1 --- [ main] o.s.c.c.CompositeCompatibilityVerifier : All conditions are passing
2022-04-11 15:22:39.098 DEBUG 1 --- [ main] ocalVariableTableParameterNameDiscoverer : Cannot find '.class' file for class [class org.springframework.statemachine.boot.autoconfigure.StateMachineAutoConfiguration$StateMachineMonitoringConfiguration$$EnhancerBySpringCGLIB$$b266f314] - unable to determine constructor/method parameter names
2022-04-11 15:22:39.925 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:22:40.244 INFO 1 --- [ main] o.s.c.s.s.app.SkipperServerApplication : Started SkipperServerApplication in 76.267 seconds (JVM running for 79.716)
[...]
Can anyone suggest what to try next, or point me to a way to diagnose this issue further?