We are running into an issue on AWS EKS where our application, which uses Hazelcast (5.0.2) with the Kubernetes Discovery plugin (2.2.3), fails to discover any members, neither itself nor the other pod of the same 2-pod deployment. Based on the log, the Hazelcast Kubernetes plugin cannot even connect to the Kubernetes API. We followed the instructions at https://github.com/hazelcast/hazelcast-kubernetes.

I was not able to find a lot of guidance on what type of Discovery Plugin to use with AWS EKS/EC2 type infrastructure other than what is mentioned here: https://docs.hazelcast.com/imdg/4.2/plugins/cloud-discovery#hazelcast-cloud-discovery-plugins-aws

We tried the AWS plugin, but according to that page it is intended only for AWS ECS/EC2 or pure EC2 setups. With that plugin each pod did start its own node, but the nodes could not detect each other, so we reverted to the Kubernetes plugin, which is the one indicated for Kubernetes.

The implementation we are migrating to AWS works as expected on bare-metal Kubernetes (v1.18) but fails on AWS EKS (Kubernetes v1.19).

We use the service-name setting to discover members in a specific namespace, together with a service account that is assigned to both pods. The service account has full access to the API, and from inside a pod we can use wget or curl with its certificate and token to get a REST response from the API. However, an SSL handshake error occurs at some point, and discovery/connection to the service fails:

com.hazelcast.spi.exception.RestClientException: Failure in executing REST call
Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure

The service yaml looks like this:

kind: Service
apiVersion: v1
metadata:
  name: my-service-name
  namespace: my-namespace
spec:
  ports:
    - protocol: TCP
      port: 5701
      targetPort: 5701
  selector:
    app: my-app
  type: ClusterIP

Enabling additional logging did not provide any hints beyond what is shown below. Initially we thought that additional AWS settings (IAM role/policy and security group settings) might be in play; however, a separate system for a different application component, implemented with slightly different versions (Hazelcast 4.2.4 and Kubernetes Discovery plugin 2.2.2), works as expected within a StatefulSet.

The log shows the following:

[  ] 05-May-2022 06:16:59.918 INFO  o.s.b.w.e.tomcat.TomcatWebServer.initialize 90 - Tomcat initialized with port(s): 8080 (http)
[  ] 05-May-2022 06:16:59.947 INFO  org.apache.juli.logging.DirectJDKLog.log 173 - Starting service [Tomcat]
[  ] 05-May-2022 06:16:59.947 INFO  org.apache.juli.logging.DirectJDKLog.log 173 - Starting Servlet Engine: Apache Tomcat/9.0.13
[  ] 05-May-2022 06:16:59.959 INFO  org.apache.juli.logging.DirectJDKLog.log 173 - The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: [/opt/jdk/lib/server:/opt/jdk/lib:/opt/jdk/../lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib]
[  ] 05-May-2022 06:17:00.046 INFO  org.apache.juli.logging.DirectJDKLog.log 173 - Initializing Spring embedded WebApplicationContext
[  ] 05-May-2022 06:17:00.046 INFO  o.s.b.w.s.c.ServletWebServerApplicationContext.prepareWebApplicationContext 296 - Root WebApplicationContext: initialization completed in 3391 ms
[  ] 05-May-2022 06:17:01.085 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Hazelcast is starting in a Java modular environment (Java 9 and newer) but without proper access to required Java packages. Use additional Java arguments to provide Hazelcast access to Java internal API. The internal API access is used to get the best performance results. Arguments to be used:
 --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED
[  ] 05-May-2022 06:17:01.291 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] 
    +       +  o    o     o     o---o o----o o      o---o     o     o----o o--o--o
    + +   + +  |    |    / \       /  |      |     /         / \    |         |   
    + + + + +  o----o   o   o     o   o----o |    o         o   o   o----o    |   
    + +   + +  |    |  /     \   /    |      |     \       /     \       |    |   
    +       +  o    o o       o o---o o----o o----o o---o o       o o----o    o   
[  ] 05-May-2022 06:17:01.291 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Copyright (c) 2008-2021, Hazelcast, Inc. All Rights Reserved.
[  ] 05-May-2022 06:17:01.291 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Hazelcast Platform 5.0.2 (20211221 - 18eec9f) starting at [192.168.50.110]:5701
[  ] 05-May-2022 06:17:01.291 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Cluster name: dev
[  ] 05-May-2022 06:17:01.291 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] The Jet engine is disabled.
To enable the Jet engine on the members, please do one of the following:
  - Change member config using Java API: config.getJetConfig().setEnabled(true);
  - Change XML/YAML configuration property: Set hazelcast.jet.enabled to true
  - Add system property: -Dhz.jet.enabled=true
  - Add environment variable: HZ_JET_ENABLED=true
[  ] 05-May-2022 06:17:01.687 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Kubernetes Discovery properties: { service-dns: null, service-dns-timeout: 5, service-name: my-service-name, service-port: 0, service-label: null, service-label-value: true, namespace: my-namespace, pod-label: null, pod-label-value: null, resolve-not-ready-addresses: true, expose-externally-mode: AUTO, use-node-name-as-external-address: false, service-per-pod-label: null, service-per-pod-label-value: null, kubernetes-api-retries: 3, kubernetes-master: https://kubernetes.default.svc}
[  ] 05-May-2022 06:17:01.690 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Kubernetes Discovery activated with mode: KUBERNETES_API
[  ] 05-May-2022 06:17:01.692 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Enable DEBUG/FINE log level for log category com.hazelcast.system.security  or use -Dhazelcast.security.recommendations system property to see ?? security recommendations and the status of current config.
[  ] 05-May-2022 06:17:01.764 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Using Discovery SPI
[  ] 05-May-2022 06:17:01.768 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
[  ] 05-May-2022 06:17:02.010 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
[  ] 05-May-2022 06:17:02.016 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] [192.168.50.110]:5701 is STARTING
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.hazelcast.internal.networking.nio.SelectorOptimizer (jar:file:/service.jar!/BOOT-INF/lib/hazelcast-5.0.2.jar!/) to field sun.nio.ch.SelectorImpl.selectedKeys
WARNING: Please consider reporting this to the maintainers of com.hazelcast.internal.networking.nio.SelectorOptimizer
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[  ] 05-May-2022 06:17:02.209 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Couldn't connect to the service, [1] retrying in 1 seconds...
[  ] 05-May-2022 06:17:03.715 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Couldn't connect to the service, [2] retrying in 2 seconds...
[  ] 05-May-2022 06:17:05.969 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Couldn't connect to the service, [3] retrying in 3 seconds...
[  ] 05-May-2022 06:17:09.350 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Cannot fetch the current zone, ZONE_AWARE feature is disabled
[  ] 05-May-2022 06:17:09.356 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Couldn't connect to the service, [1] retrying in 1 seconds...
[  ] 05-May-2022 06:17:10.861 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Couldn't connect to the service, [2] retrying in 2 seconds...
[  ] 05-May-2022 06:17:13.117 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Couldn't connect to the service, [3] retrying in 3 seconds...
[  ] 05-May-2022 06:17:16.496 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Cannot fetch name of the node, NODE_AWARE feature is disabled
[  ] 05-May-2022 06:17:16.499 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Couldn't connect to the service, [1] retrying in 1 seconds...
[  ] 05-May-2022 06:17:18.004 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Couldn't connect to the service, [2] retrying in 2 seconds...
[  ] 05-May-2022 06:17:20.258 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - Couldn't connect to the service, [3] retrying in 3 seconds...
[  ] 05-May-2022 06:17:23.641 ERROR c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Failure in executing REST call
com.hazelcast.spi.exception.RestClientException: Failure in executing REST call
    at com.hazelcast.spi.utils.RestClient.call(RestClient.java:163)
    at com.hazelcast.spi.utils.RestClient.lambda$callWithRetries$0(RestClient.java:130)
    at com.hazelcast.spi.utils.RetryUtils.retry(RetryUtils.java:65)
    at com.hazelcast.spi.utils.RetryUtils.retry(RetryUtils.java:51)
    at com.hazelcast.spi.utils.RestClient.callWithRetries(RestClient.java:130)
    at com.hazelcast.spi.utils.RestClient.get(RestClient.java:122)
    at com.hazelcast.kubernetes.KubernetesClient.lambda$callGet$4(KubernetesClient.java:557)
    at com.hazelcast.spi.utils.RetryUtils.retry(RetryUtils.java:65)
    at com.hazelcast.kubernetes.KubernetesClient.callGet(KubernetesClient.java:554)
    at com.hazelcast.kubernetes.KubernetesClient.endpointsByName(KubernetesClient.java:129)
    at com.hazelcast.kubernetes.KubernetesApiEndpointResolver.resolve(KubernetesApiEndpointResolver.java:62)
    at com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy.discoverNodes(HazelcastKubernetesDiscoveryStrategy.java:136)
    at com.hazelcast.spi.discovery.impl.DefaultDiscoveryService.discoverNodes(DefaultDiscoveryService.java:72)
    at com.hazelcast.internal.cluster.impl.DiscoveryJoiner.getPossibleAddresses(DiscoveryJoiner.java:71)
    at com.hazelcast.internal.cluster.impl.DiscoveryJoiner.getPossibleAddressesForInitialJoin(DiscoveryJoiner.java:60)
    at com.hazelcast.internal.cluster.impl.TcpIpJoiner.joinViaPossibleMembers(TcpIpJoiner.java:135)
    at com.hazelcast.internal.cluster.impl.TcpIpJoiner.doJoin(TcpIpJoiner.java:96)
    at com.hazelcast.internal.cluster.impl.AbstractJoiner.join(AbstractJoiner.java:137)
    at com.hazelcast.instance.impl.Node.join(Node.java:808)
    at com.hazelcast.instance.impl.Node.start(Node.java:470)
    at com.hazelcast.instance.impl.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:124)
    at com.hazelcast.instance.impl.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:211)
    at com.hazelcast.instance.impl.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:190)
    at com.hazelcast.instance.impl.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:128)
    at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:61)
    at at.company.product.config.HazelCastConfiguration.hazelcastInstance(HazelCastConfiguration.java:44)
    at at.company.product.config.HazelCastConfiguration$$EnhancerBySpringCGLIB$$8dff12a6.CGLIB$hazelcastInstance$1(<generated>)
    at at.company.product.config.HazelCastConfiguration$$EnhancerBySpringCGLIB$$8dff12a6$$FastClassBySpringCGLIB$$62bdb2d8.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:244)
    at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:363)
    at at.company.product.config.HazelCastConfiguration$$EnhancerBySpringCGLIB$$8dff12a6.hazelcastInstance(<generated>)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:622)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:607)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1305)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1144)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:555)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:307)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
    at org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1105)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:867)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:549)
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:142)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:775)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:397)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:316)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1260)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1248)
    at at.company.product.ClassServiceApplication.main(ClassServiceApplication.java:28)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
    at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:340)
    at java.base/sun.security.ssl.Alert$AlertConsumer.consume(Alert.java:293)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:186)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1506)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1416)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:427)
    at java.base/sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:572)
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:197)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334)
    at com.hazelcast.spi.utils.RestClient.checkResponseCode(RestClient.java:173)
    at com.hazelcast.spi.utils.RestClient.call(RestClient.java:160)
    ... 65 common frames omitted
[  ] 05-May-2022 06:17:23.642 ERROR c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Could not join cluster. Shutting down now!
[  ] 05-May-2022 06:17:23.642 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] [192.168.50.110]:5701 is SHUTTING_DOWN
[  ] 05-May-2022 06:17:23.645 WARN  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Terminating forcefully...
[  ] 05-May-2022 06:17:23.645 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Shutting down connection manager...
[  ] 05-May-2022 06:17:23.647 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Shutting down node engine...
[  ] 05-May-2022 06:17:23.654 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Destroying node NodeExtension.
[  ] 05-May-2022 06:17:23.655 INFO  c.h.l.StandardLoggerFactory$StandardLogger.log 56 - [192.168.50.110]:5701 [dev] [5.0.2] Hazelcast Shutdown is completed in 10 ms.

Answer by Daniel Mühlbachler-P.:

The hazelcast-kubernetes plugin is deprecated for Hazelcast 5.x; its functionality has been merged into Hazelcast itself (https://docs.hazelcast.com/hazelcast/5.0/deploy/deploying-in-kubernetes.html). That is why your other application, which uses Hazelcast 4.x, works with the Kubernetes plugin, as you have discovered. With Hazelcast 5.x you must not include the plugin.

From the configuration you describe, it sounds like you are mixing up the two discovery approaches: through a headless service (via DNS) and through the Kubernetes API.

Please try following only one approach - I recommend the DNS discovery one after checking your posted Service YAML.
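
To check what the DNS approach would see from inside a pod, a small lookup like the one below may help. It is only a sketch: the class name DnsLookupProbe and the method resolveAll are made up for illustration, and the default host name assumes the usual <service>.<namespace>.svc.cluster.local form built from the names in your Service YAML. As far as I know, the DNS mode is selected with the service-dns property (shown as null in your log) and expects a headless service, which returns one address per pod; a regular ClusterIP service returns a single virtual IP instead.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hazelcast: lists every address a DNS name resolves to.
class DnsLookupProbe {

    // Returns all addresses for the host, or an empty list if it cannot be resolved.
    static List<String> resolveAll(String host) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Unresolvable name: report nothing rather than fail.
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Assumed service DNS name; pass your own as the first argument.
        String serviceDns = args.length > 0
                ? args[0]
                : "my-service-name.my-namespace.svc.cluster.local";
        System.out.println(serviceDns + " -> " + resolveAll(serviceDns));
    }
}
```

If this prints only one address for a 2-pod deployment, the service is most likely not headless and DNS discovery will not find the individual pods.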

Additionally, you might need to tweak and double-check the Hazelcast configuration, if applicable, as per the client documentation when running on Kubernetes. Can you post the Hazelcast configuration you use, if it deviates from the example in the documentation?
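
If you stay with the Kubernetes API approach, it may also help to reproduce the failing call outside Hazelcast but inside the same JVM, because curl and wget use a different TLS stack than Java. Your stack trace shows the failure in endpointsByName over an HttpsURLConnection, so the sketch below makes a comparable request. The class name K8sApiProbe and its helper methods are made up for illustration and this is not Hazelcast's own code; the mount path is the standard in-pod service account location, and the master URL, namespace and service name are the ones from your log.

```java
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical diagnostic, not part of Hazelcast.
class K8sApiProbe {

    // Standard location of the service account files inside a pod.
    static final String SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount";

    // Builds the Kubernetes API URL of the Endpoints object for a service.
    static String endpointsUrl(String master, String namespace, String service) {
        return String.format("%s/api/v1/namespaces/%s/endpoints/%s", master, namespace, service);
    }

    // Creates an SSLContext that trusts only the certificates in the given PEM file.
    static SSLContext trustOnly(Path caFile) throws Exception {
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        keyStore.load(null, null);
        try (InputStream in = Files.newInputStream(caFile)) {
            int i = 0;
            for (Certificate cert : CertificateFactory.getInstance("X.509").generateCertificates(in)) {
                keyStore.setCertificateEntry("ca-" + i++, cert);
            }
        }
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(keyStore);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        String token = new String(
                Files.readAllBytes(Paths.get(SA_DIR, "token")), StandardCharsets.UTF_8).trim();
        URL url = new URL(endpointsUrl(
                "https://kubernetes.default.svc", "my-namespace", "my-service-name"));

        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.setSSLSocketFactory(trustOnly(Paths.get(SA_DIR, "ca.crt")).getSocketFactory());
        conn.setRequestProperty("Authorization", "Bearer " + token);

        // A handshake problem surfaces here as an SSLHandshakeException.
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```

Running it with the same JRE as the application and with -Djavax.net.debug=ssl:handshake prints the handshake details, which should show whether the JVM and the API server fail to agree on a protocol version or cipher suite.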