Spring Cloud Skipper errors out immediately after start on local MicroK8s


I'm trying to deploy the entire Spring Cloud Data Flow platform to a MicroK8s cluster running on one of our servers, a VM with Ubuntu 20.04. Before touching the target server, I deployed it on my local computer (same OS), where I initially succeeded and even created and ran one stream. However, I am now getting the same error both on my local computer and on the VM, and I can't pinpoint the root cause.

My current situation:

I'm following the official guide for deploying SCDF using kubectl, the only difference being that I'm using tag v2.9.4, the latest at the time of writing, instead of v2.9.1. I also skipped the configuration of the monitoring frameworks and therefore commented out the relevant lines in the SCDF server configuration, as suggested in the docs. The Kafka message broker and the MySQL database deploy without issues.

However, after running the kubectl commands that create the config map, service and deployment for Skipper, the Skipper pod goes into status "CrashLoopBackOff". The pod logs only show that the application shuts down right after it appears to have started:

[...]
2022-04-11 15:00:11.713  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:00:11.907  INFO 1 --- [           main] o.s.c.s.s.app.SkipperServerApplication   : Started SkipperServerApplication in 78.901 seconds (JVM running for 82.435)
2022-04-11 15:00:12.531  INFO 1 --- [ionShutdownHook] o.s.s.s.DefaultStateMachineService       : Entering stop sequence, stopping all managed machines
2022-04-11 15:00:12.617  INFO 1 --- [ionShutdownHook] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2022-04-11 15:00:12.703  INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown initiated...
2022-04-11 15:00:12.799  INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown completed.

Native Memory Tracking:

Total: reserved=961864767, committed=325411903
-                 Java Heap (reserved=356515840, committed=138334208)
                            (mmap: reserved=356515840, committed=138334208) 
 
-                     Class (reserved=269444100, committed=94409732)
                            (classes #17623)
                            (  instance classes #16455, array classes #1168)
                            (malloc=3355652 #45645) 
                            (mmap: reserved=266088448, committed=91054080) 
                            (  Metadata:   )
                            (    reserved=79691776, committed=78340096)
                            (    used=76414680)
                            (    free=1925416)
                            (    waste=0 =0.00%)
                            (  Class space:)
                            (    reserved=186396672, committed=12713984)
                            (    used=11544696)
                            (    free=1169288)
                            (    waste=0 =0.00%)
 
-                    Thread (reserved=14794856, committed=1323112)
                            (thread #14)
                            (stack: reserved=14729216, committed=1257472)
                            (malloc=51792 #86) 
                            (arena=13848 #25)
 
-                      Code (reserved=255686068, committed=26629556)
                            (malloc=2053556 #8654) 
                            (mmap: reserved=253632512, committed=24576000) 
 
-                        GC (reserved=1728178, committed=1019570)
                            (malloc=560818 #2163) 
                            (mmap: reserved=1167360, committed=458752) 
 
-                  Compiler (reserved=35543622, committed=35543622)
                            (malloc=71174 #1162) 
                            (arena=35472448 #19)
 
-                  Internal (reserved=432627, committed=432627)
                            (malloc=399859 #1104) 
                            (mmap: reserved=32768, committed=32768) 
 
-                     Other (reserved=10248, committed=10248)
                            (malloc=10248 #3) 
 
-                    Symbol (reserved=22101496, committed=22101496)
                            (malloc=19867360 #240000) 
                            (arena=2234136 #1)
 
-    Native Memory Tracking (reserved=4899928, committed=4899928)
                            (malloc=9656 #122) 
                            (tracking overhead=4890272)
 
-               Arena Chunk (reserved=81808, committed=81808)
                            (malloc=81808) 
 
-                   Tracing (reserved=1, committed=1)
                            (malloc=1 #1) 
 
-                   Logging (reserved=4572, committed=4572)
                            (malloc=4572 #192) 
 
-                 Arguments (reserved=19063, committed=19063)
                            (malloc=19063 #495) 
 
-                    Module (reserved=310496, committed=310496)
                            (malloc=310496 #2710) 
 
-              Synchronizer (reserved=283672, committed=283672)
                            (malloc=283672 #2348) 
 
-                 Safepoint (reserved=8192, committed=8192)
                            (mmap: reserved=8192, committed=8192)

No matter how many times the pod is restarted, it always exits at this stage. This is the output of kubectl get all:

NAME                            READY   STATUS             RESTARTS       AGE
pod/kafka-zk-6b6f4976cf-9hjzn   1/1     Running            0              69m
pod/kafka-broker-0              1/1     Running            0              58m
pod/mysql-7c57b4cfdf-njb97      1/1     Running            0              39m
pod/skipper-b46bfd5fd-wrnqv     0/1     CrashLoopBackOff   13 (57s ago)   38m

NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/kubernetes     ClusterIP      10.152.183.1     <none>        443/TCP                      148m
service/kafka-zk       ClusterIP      10.152.183.62    <none>        2181/TCP,2888/TCP,3888/TCP   69m
service/kafka-broker   ClusterIP      None             <none>        9092/TCP                     69m
service/mysql          ClusterIP      10.152.183.139   <none>        3306/TCP                     40m
service/skipper        LoadBalancer   10.152.183.250   <pending>     80:31955/TCP                 38m

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kafka-zk   1/1     1            1           69m
deployment.apps/mysql      1/1     1            1           39m
deployment.apps/skipper    0/1     1            0           38m

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/kafka-zk-6b6f4976cf   1         1         1       69m
replicaset.apps/mysql-7c57b4cfdf      1         1         1       39m
replicaset.apps/skipper-b46bfd5fd     1         1         0       38m

NAME                            READY   AGE
statefulset.apps/kafka-broker   1/1     69m
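
For reference, here is a minimal sketch of how further details about the crashing pod could be collected. The helper name skipper_diag and the KUBECTL variable are hypothetical and only there for illustration. The pod name in the usage line comes from the listing above and changes on every rollout. The kubectl subcommands themselves (describe, logs --previous, get -o jsonpath) are standard.

```shell
# Hypothetical helper for illustration: gathers the usual crash-loop details.
# KUBECTL can be overridden (for example "microk8s kubectl"-style wrappers
# would need an alias or script); it defaults to plain kubectl.
KUBECTL="${KUBECTL:-kubectl}"

skipper_diag() {
  pod="$1"
  # Events at the bottom of the output show why the container was restarted.
  "$KUBECTL" describe pod "$pod"
  # Logs of the previously terminated container, not the one currently starting.
  "$KUBECTL" logs "$pod" --previous --tail=50
  # Exit code and reason recorded for the last termination.
  "$KUBECTL" get pod "$pod" -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
  echo
}

# Usage (pod name taken from the listing above):
#   skipper_diag skipper-b46bfd5fd-wrnqv
```

The last command prints the recorded exit code and reason for the previous container instance, which the application log alone does not show.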

What I tried:

  • Changing the Skipper service type from LoadBalancer to NodePort (I have not enabled MetalLB, so no load balancer is available), but that didn't help;
  • Changing the port exposed by the container from 80 (the default configuration) to 7577, also in the service configuration, but the error still occurs;
  • Downgrading Skipper to version 2.8.2, the one used in the documentation above, but the behaviour was exactly the same.

Increasing the logging level by setting logging.level.org.springframework to DEBUG and then to TRACE didn't reveal anything useful in the logs, except for one cryptic line that I could not find anywhere on Google:

[...]
2022-04-11 15:22:38.818 DEBUG 1 --- [           main] o.s.c.c.CompositeCompatibilityVerifier   : All conditions are passing
2022-04-11 15:22:39.098 DEBUG 1 --- [           main] ocalVariableTableParameterNameDiscoverer : Cannot find '.class' file for class [class org.springframework.statemachine.boot.autoconfigure.StateMachineAutoConfiguration$StateMachineMonitoringConfiguration$$EnhancerBySpringCGLIB$$b266f314] - unable to determine constructor/method parameter names
2022-04-11 15:22:39.925  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:22:40.244  INFO 1 --- [           main] o.s.c.s.s.app.SkipperServerApplication   : Started SkipperServerApplication in 76.267 seconds (JVM running for 79.716)
[...]
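
One way to apply such a logging level, sketched here under the assumption that it is passed as an environment variable on the deployment rather than through the Skipper config map: Spring Boot maps LOGGING_LEVEL_ORG_SPRINGFRAMEWORK to logging.level.org.springframework. The helper name set_skipper_log_level and the KUBECTL variable are hypothetical. The deployment name skipper is the one shown in the kubectl get all output.

```shell
# Hypothetical helper for illustration: sets the Spring log level on the
# Skipper deployment via an environment variable, which triggers a new rollout.
KUBECTL="${KUBECTL:-kubectl}"

set_skipper_log_level() {
  level="$1"   # e.g. DEBUG or TRACE
  "$KUBECTL" set env deployment/skipper "LOGGING_LEVEL_ORG_SPRINGFRAMEWORK=$level"
}

# Usage:
#   set_skipper_log_level DEBUG
```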

Can anyone suggest what to try next, or how to diagnose this issue further?
