Elasticsearch multi-node cluster using Podman containers: NoRouteToHostException error


I have installed two Elasticsearch Podman containers on two different servers (primary and secondary).

Primary elasticsearch.yml config:

network.host: 0.0.0.0
cluster.name: xyz-es
node.name: node-primary  # On the primary, use 'node-primary', and 'node-secondary' on the secondary
path.data: /usr/share/elasticsearch/data
discovery.seed_hosts: ["172.16.211.99", "172.18.205.99"] # ["ip_of_primary", "ip_of_secondary"]
cluster.initial_master_nodes: ["node-primary", "node-secondary"]
xpack.security.enabled: false

Secondary elasticsearch.yml config:

network.host: 0.0.0.0
cluster.name: xyz-es
node.name: node-secondary  # On the primary, use 'node-primary', and 'node-secondary' on the secondary
path.data: /usr/share/elasticsearch/data
discovery.seed_hosts: ["172.16.211.99", "172.18.205.99"] # ["ip_of_primary", "ip_of_secondary"]
cluster.initial_master_nodes: ["node-primary", "node-secondary"]
xpack.security.enabled: false

However, the secondary node is unable to replicate data from the primary server. Checking the logs, the secondary Elasticsearch node completes the initial handshake with the primary node, but the follow-up connection fails with a NoRouteToHostException:

{"@timestamp":"2023-10-26T08:36:24.295Z", "log.level": "WARN", "message":"address [172.16.211.99:9300], node [null], requesting [false] discovery result: [node-primary][10.89.0.9:9300] connect_exception: Failed execution: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: 10.89.0.9/10.89.0.9:9300: No route to host: 10.89.0.9/10.89.0.9:9300: No route to host", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[node-secondary][generic][T#3]","log.logger":"org.elasticsearch.discovery.PeerFinder","elasticsearch.node.name":"node-secondary","elasticsearch.cluster.name":"xyz-es"}
{"@timestamp":"2023-10-26T09:19:32.071Z", "log.level": "WARN", "message":"completed handshake with [{node-primary}{BGp7fx9kTR6Fh6m1pCan2g}{JfFpk5WNSgG0ayvMP8uB8g}{node-primary}{10.89.0.9}{10.89.0.9:9300}{cdfhilmrstw}{8.10.2}{7000099-8100299}] at [172.16.211.99:9300] but followup connection to [10.89.0.9:9300] failed", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[node-secondary][generic][T#3]","log.logger":"org.elasticsearch.discovery.HandshakingTransportAddressConnector","elasticsearch.node.name":"node-secondary","elasticsearch.cluster.name":"xyz-es","error.type":"org.elasticsearch.transport.ConnectTransportException","error.message":"[node-primary][10.89.0.9:9300] connect_exception","error.stack_trace":"org.elasticsearch.transport.ConnectTransportException: [node-primary][10.89.0.9:9300] connect_exception\n\t... (stack trace truncated) ...\nCaused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: 10.89.0.9/10.89.0.9:9300\nCaused by: java.net.NoRouteToHostException: No route to host\n\t..."}

If you look at the logs, you can see that the secondary node contacts the primary, and the primary responds with its container-internal IP (10.89.0.9), which is not reachable over the network.
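You can confirm what address a node advertises to its peers by asking the `_nodes` API for its transport `publish_address`. Below is a sketch using a trimmed stand-in for the real response (the JSON here is illustrative, not from the original post); against a live node you would run `curl -s http://172.16.211.99:9200/_nodes/transport` instead:

```shell
# Trimmed stand-in for the real _nodes/transport response; on a live node:
#   curl -s http://172.16.211.99:9200/_nodes/transport
response='{"nodes":{"BGp7fx9kTR6Fh6m1pCan2g":{"name":"node-primary","transport":{"publish_address":"10.89.0.9:9300"}}}}'

# publish_address is the address peers are told to dial; here it is the
# container-internal IP, which explains the NoRouteToHostException.
echo "$response" | grep -o '"publish_address":"[^"]*"'
```

If the `publish_address` shown is a container-internal IP rather than the host IP, other nodes cannot reach it.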

The Kube.yml:

# Save the output of this file and use kubectl create -f to import
# it into Kubernetes.
#
# Created with podman-4.4.1
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-10-09T06:58:43Z"
  labels:
    app: xyz-elascitsearch
  name: xyz-elascitsearch
spec:
  containers:
  - args:
    - eswrapper
    image: docker.elastic.co/elasticsearch/elasticsearch:8.10.2
    name: engine
    ports:
    - containerPort: 9200
      hostPort: 9200
    - containerPort: 9300
      hostPort: 9300
    resources: {}
    securityContext: {}
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
      name: home-xyz-podman-projects-xyz-elascitsearch-config-elasticsearch.yml-host-0
    - mountPath: /usr/share/elasticsearch/data
      name: home-xyz-podman-projects-xyz-elascitsearch-data-host-1
  hostname: xyz-elascitsearch
  restartPolicy: Never
  volumes:
  - hostPath:
      path: /home/xyz/podman/projects/xyz-elascitsearch/config/elasticsearch.yml
      type: File
    name: home-xyz-podman-projects-xyz-elascitsearch-config-elasticsearch.yml-host-0
  - hostPath:
      path: /home/xyz/podman/projects/xyz-elascitsearch/data
      type: Directory
    name: home-xyz-podman-projects-xyz-elascitsearch-data-host-1
status: {}

How can I force the secondary node to use the seed_hosts IP instead of the IP returned in the handshake response? Or is there another configuration I am missing? I appreciate any help.

There is 1 answer.

user1968211 (best answer):

Found the solution:

Add network.publish_host to the config so other nodes know which IP to call; without it, the node publishes its locally detected address, which in this case was the container-internal IP.

network.host: 0.0.0.0
network.publish_host: 172.16.211.99
cluster.name: xyz-es
node.name: node-primary  # On the primary, use 'node-primary', and 'node-secondary' on the secondary
path.data: /usr/share/elasticsearch/data
discovery.seed_hosts: ["172.16.211.99", "172.18.205.99"] # ["ip_of_primary", "ip_of_secondary"]
cluster.initial_master_nodes: ["node-primary", "node-secondary"]
xpack.security.enabled: false
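On each server, network.publish_host must be that server's own routable IP (172.16.211.99 on the primary, 172.18.205.99 on the secondary). A minimal sketch of rendering the per-node file from two variables, so only node.name and publish_host differ between servers (the NODE_NAME/PUBLISH_IP variables are illustrative helpers, not part of the original setup):

```shell
# Illustrative: render a node-specific elasticsearch.yml from two variables.
# On the secondary, set NODE_NAME=node-secondary and PUBLISH_IP=172.18.205.99.
NODE_NAME=node-primary
PUBLISH_IP=172.16.211.99

cat > elasticsearch.yml <<EOF
network.host: 0.0.0.0
network.publish_host: ${PUBLISH_IP}
cluster.name: xyz-es
node.name: ${NODE_NAME}
path.data: /usr/share/elasticsearch/data
discovery.seed_hosts: ["172.16.211.99", "172.18.205.99"]
cluster.initial_master_nodes: ["node-primary", "node-secondary"]
xpack.security.enabled: false
EOF

# Confirm the node will advertise the host IP, not the container IP
grep publish_host elasticsearch.yml
```

After restarting both containers, the `_nodes/transport` API should report the host IPs as `publish_address`, and the follow-up connections should succeed.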