Apache Camel 4.1.0 Kubernetes Clustering - Leader election hanging after upgrade


I am upgrading an application that uses Camel leader election in Kubernetes: https://camel.apache.org/manual/clustering.html

The upgrade is from Spring Boot 2 and Camel 3.0 to Spring Boot 3 and Camel 4.1.0.

Leader election was working successfully prior to the library upgrade. As part of the upgrade I had to change some settings. The new values in use are:

  • name: camel.cluster.kubernetes.enabled value: "true"

  • name: camel.cluster.kubernetes.config-map-name value: "my-leaders"

  • name: camel.cluster.kubernetes.retry-period-millis value: "15000"

  • name: camel.cluster.kubernetes.lease-duration-millis value: "21000"

  • name: camel.cluster.kubernetes.renew-deadline-millis value: "19000"

New permissions were also added to allow access to leases, and I also tested with all permissions granted.

Local testing using a file works; when run on Kubernetes, however, the leader election appears to hang:

{"@timestamp":"2023-11-09T12:12:23.725588521Z","@version":"1","message":"Pod[mycomp-my-connector-5c68db96c5-pn5zw] Trying to acquire the leadership...","logger_name":"org.apache.camel.component.kubernetes.cluster.lock.KubernetesLeadershipController","thread_name":"Camel (camel-1) thread #1 - CamelKubernetesLeadershipController","level":"DEBUG","level_value":10000,"appname":"mycomp-my-connector"}
{"@timestamp":"2023-11-09T12:12:23.72578713Z","@version":"1","message":"Pod[mycomp-my-connector-5c68db96c5-pn5zw] Lock lease resource is not present in the Kubernetes namespace. A new lease resource will be created","logger_name":"org.apache.camel.component.kubernetes.cluster.lock.KubernetesLeadershipController","thread_name":"Camel (camel-1) thread #1 - CamelKubernetesLeadershipController","level":"DEBUG","level_value":10000,"appname":"mycomp-my-connector"}

Adding extra logging to the Camel Kubernetes component shows that it appears to hang while creating the lease object. Splitting the lease-building code into multiple lines with added logging shows it hanging at the newLeaseBuilder1.withName(name) line:

@Override
public Lease createNewLeaseResource(KubernetesClient client, String namespace, String prefix, LeaderInfo leaderInfo) {
    LOG.info("LIAM createNewLeaseResource: client:{} namespace:{} prefix:{} leaderInfo:{}", client, namespace, prefix, leaderInfo);
    ZonedDateTime now = ZonedDateTime.now();
    String name = leaseResourceName(prefix, leaderInfo.getGroupName());
    String holderIdentity = leaderInfo.getLeader();
    Integer leaseDurationSeconds = leaderInfo.getLeaseDurationSeconds();
    LOG.info("LIAM createNewLeaseResource: now:{} name:{} holderIdentity:{} leaseDurationSeconds:{}", now, name, holderIdentity, leaseDurationSeconds);

    LOG.info("LIAM createNewLeaseResource:1");
    var newLeaseBuilder1 = new LeaseBuilder().withNewMetadata();
    LOG.info("LIAM createNewLeaseResource:3");
    // hangs here with current code as is
    var newLeaseBuilder3 = newLeaseBuilder1.withName(name);
    LOG.info("LIAM createNewLeaseResource:4");
    var newLeaseBuilder4 = newLeaseBuilder3.addToLabels("provider", "camel");
    LOG.info("LIAM createNewLeaseResource:5");
    var newLeaseBuilder5 = newLeaseBuilder4.endMetadata();
    LOG.info("LIAM createNewLeaseResource:6");
    var newLeaseBuilder6 = newLeaseBuilder5.withNewSpec();
    LOG.info("LIAM createNewLeaseResource:7");
    var newLeaseBuilder7 = newLeaseBuilder6.withHolderIdentity(holderIdentity);
    LOG.info("LIAM createNewLeaseResource:8");
    var newLeaseBuilder8 = newLeaseBuilder7.withAcquireTime(now);
    LOG.info("LIAM createNewLeaseResource:9");
    var newLeaseBuilder9 = newLeaseBuilder8.withLeaseDurationSeconds(leaseDurationSeconds);
    LOG.info("LIAM createNewLeaseResource:10");
    var newLeaseBuilder10 = newLeaseBuilder9.withRenewTime(now);
    LOG.info("LIAM createNewLeaseResource:11");
    var newLeaseBuilder11 = newLeaseBuilder10.endSpec();

    LOG.info("LIAM newLeaseBuilder: created,calling newLeaseBuilder.build();");
    Lease newLease = newLeaseBuilder11.build();
    LOG.info("LIAM newLease:{}", newLease);
    return client.leases()
            .inNamespace(namespace)
            .resource(newLease)
            .create();
}

{"@timestamp":"2023-11-10T14:27:45.553432553Z","@version":"1","message":"LIAM createNewLeaseResource: client:io.fabric8.kubernetes.client.impl.KubernetesClientImpl@449ef704 namespace:default prefix:my-leaders leaderInfo:LeaderInfo{groupName='MYGROUP', leader='mycomp-my-connector-5c68db96c5-pn5zw', localTimestamp=Fri Nov 10 14:27:45 UTC 2023, members=[mycomp-otherpod-84cb8f7476-d58qv,  mycomp-my-connector-5c68db96c5-pn5zw], leaseDurationSeconds=60}","logger_name":"org.apache.camel.component.kubernetes.cluster.lock.impl.NativeLeaseResourceManager","thread_name":"Camel (camel-1) thread #1 - CamelKubernetesLeadershipController","level":"INFO","level_value":20000,"appname":"mycomp-my-connector"}
{"@timestamp":"2023-11-10T14:27:45.554658633Z","@version":"1","message":"LIAM createNewLeaseResource: now:2023-11-10T14:27:45.554508823Z[Etc/UTC] name:my-leaders-user-service-presence holderIdentity:mycomp-my-connector-5c68db96c5-pn5zw leaseDurationSeconds:60","logger_name":"org.apache.camel.component.kubernetes.cluster.lock.impl.NativeLeaseResourceManager","thread_name":"Camel (camel-1) thread #1 - CamelKubernetesLeadershipController","level":"INFO","level_value":20000,"appname":"mycomp-my-connector"}
{"@timestamp":"2023-11-10T14:27:45.554859826Z","@version":"1","message":"LIAM createNewLeaseResource:1","logger_name":"org.apache.camel.component.kubernetes.cluster.lock.impl.NativeLeaseResourceManager","thread_name":"Camel (camel-1) thread #1 - CamelKubernetesLeadershipController","level":"INFO","level_value":20000,"appname":"mycomp-my-connector"}
{"@timestamp":"2023-11-10T14:27:45.563145956Z","@version":"1","message":"LIAM createNewLeaseResource:2","logger_name":"org.apache.camel.component.kubernetes.cluster.lock.impl.NativeLeaseResourceManager","thread_name":"Camel (camel-1) thread #1 - CamelKubernetesLeadershipController","level":"INFO","level_value":20000,"appname":"mycomp-my-connector"}
{"@timestamp":"2023-11-10T14:27:45.570490974Z","@version":"1","message":"LIAM createNewLeaseResource:3","logger_name":"org.apache.camel.component.kubernetes.cluster.lock.impl.NativeLeaseResourceManager","thread_name":"Camel (camel-1) thread #1 - CamelKubernetesLeadershipController","level":"INFO","level_value":20000,"appname":"mycomp-my-connector"}

No more Camel logs are output after the above; it just appears to hang.
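One way to tell a real hang from a silently swallowed failure is to wrap the suspect step and log any Throwable before rethrowing it. The sketch below is only an assumption about how that could be done: StepProbe and its members are hypothetical names, not part of Camel or fabric8, and it has not been run against the Camel component itself. It may help if, for example, an Error such as NoSuchMethodError is thrown on the leadership thread and never reaches the log.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper: runs one step and reports any Throwable (including Errors)
// before rethrowing it, so a failure on a background thread leaves a trace.
final class StepProbe {
    // Last failure seen by any probed step; null if none has failed.
    static final AtomicReference<Throwable> LAST_FAILURE = new AtomicReference<>();

    static <T> T run(String step, Supplier<T> body) {
        try {
            return body.get();
        } catch (Throwable t) {
            // Catching Throwable (not just Exception) also covers linkage Errors.
            LAST_FAILURE.set(t);
            System.err.println("step " + step + " failed: " + t);
            throw t; // rethrow unchanged so the caller's behaviour is preserved
        }
    }
}
```

In the method above it could be applied as StepProbe.run("withName", () -> newLeaseBuilder1.withName(name)), replacing System.err with the existing LOG if preferred.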

Is there some setting I am missing, or has anyone come across something like this before?

I tried changing the setting values and granting all lease permissions, but observed no difference.
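Since settings and permissions made no difference, looking at what the leadership thread is actually doing may narrow things down. The following is a minimal sketch using only the JDK; LeadershipThreadDump is a hypothetical helper name, and the thread name fragment "CamelKubernetesLeadershipController" is taken from the thread_name field in the logs above.

```java
import java.util.Map;

// Hypothetical helper: formats the state and stack trace of every live thread
// whose name contains the given fragment.
final class LeadershipThreadDump {
    static String dump(String nameFragment) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().contains(nameFragment)) {
                continue; // not one of the threads we are interested in
            }
            sb.append(thread.getName()).append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Calling LeadershipThreadDump.dump("CamelKubernetesLeadershipController") from another thread a few seconds after the last log line would show whether that thread is BLOCKED or WAITING inside the builder call, or whether it no longer exists at all. An empty result would suggest the thread ended rather than hung.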
