Upgrading Elasticsearch 2.1.1 to 2.2.0 - missing authentication token?


I decided to try upgrading the current cluster, a mirror pair, from ES 2.1.1 to ES 2.2.0. The cluster runs in AWS, so I'm using the cloud-aws plugin for EC2 discovery.

I successfully upgraded the first node, and it has assumed master status, but I have encountered a strange communication/authentication issue when upgrading the second node.

I followed the upgrade guidelines here, but the problem persists.

From the main cluster log on the second node:

[2016-02-03 12:29:41,241][INFO ][discovery.ec2            ] [Sharon Ventura] failed to send join request to master [{Space Phantom}{NzN7b7ZHT8uPu6oXJAORMg}{}{}], reason [RemoteTransportException[[Space Phantom][][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Sharon Ventura][][internal:discovery/zen/join/validate]]; nested: ElasticsearchSecurityException[missing authentication token for action [internal:discovery/zen/join/validate]]; ]
[2016-02-03 12:29:42,455][DEBUG][action.admin.cluster.health] [Sharon Ventura] no known master node, scheduling a retry
[2016-02-03 12:29:44,255][INFO ][discovery.ec2            ] [Sharon Ventura] failed to send join request to master [{Space Phantom}{NzN7b7ZHT8uPu6oXJAORMg}{}{}], reason [RemoteTransportException[[Space Phantom][][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Sharon Ventura][][internal:discovery/zen/join/validate]]; nested: ElasticsearchSecurityException[missing authentication token for action [internal:discovery/zen/join/validate]]; ]
[2016-02-03 12:29:47,269][INFO ][discovery.ec2            ] [Sharon Ventura] failed to send join request to master [{Space Phantom}{NzN7b7ZHT8uPu6oXJAORMg}{}{}], reason [RemoteTransportException[[Space Phantom][][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Sharon Ventura][][internal:discovery/zen/join/validate]]; nested: ElasticsearchSecurityException[missing authentication token for action [internal:discovery/zen/join/validate]]; ]
[2016-02-03 12:29:49,472][DEBUG][action.admin.cluster.state] [Sharon Ventura] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[2016-02-03 12:29:49,473][INFO ][rest.suppressed          ] /_cluster/settings Params: {}
        at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:205)
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
        at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:794)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
[2016-02-03 12:29:50,283][INFO ][discovery.ec2            ] [Sharon Ventura] failed to send join request to master [{Space Phantom}{NzN7b7ZHT8uPu6oXJAORMg}{}{}], reason [RemoteTransportException[[Space Phantom][][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Sharon Ventura][][internal:discovery/zen/join/validate]]; nested: ElasticsearchSecurityException[missing authentication token for action [internal:discovery/zen/join/validate]]; ]
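The useful part of each repeated join failure is the chain of nested exceptions. The following is a minimal Python sketch that pulls the exception chain, the node named inside each RemoteTransportException, and the refused action out of one such line. diagnose is a hypothetical helper, and the regular expressions assume the Elasticsearch 2.x log layout shown above.

```python
import re
from typing import Dict, Optional

# Header of an Elasticsearch 2.x log line: [timestamp][LEVEL][logger] [node] message
HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\] \[(?P<node>[^\]]+)\] (?P<msg>.*)$"
)


def diagnose(line: str) -> Optional[Dict[str, object]]:
    """Summarise one 'failed to send join request' log line, or return None."""
    match = HEADER.match(line)
    if match is None:
        return None
    msg = match.group("msg")
    return {
        "node": match.group("node"),
        # Exception class names in the order they appear (outermost first).
        "chain": re.findall(r"(\w+Exception)\[", msg),
        # Node named inside each RemoteTransportException, outermost first.
        "remote_nodes": re.findall(r"RemoteTransportException\[\[([^\]]+)\]", msg),
        # Actions the security layer refused, e.g. because a token was missing.
        "refused_actions": re.findall(r"for action \[([^\]]+)\]", msg),
    }
```

Applied to the first line of the log, it reports a chain ending in ElasticsearchSecurityException, remote nodes Space Phantom then Sharon Ventura, and the refused action internal:discovery/zen/join/validate. Assuming a RemoteTransportException names the node on which the wrapped error was raised, this suggests that it was Sharon Ventura (the node being upgraded) that rejected the master's validation request.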

My elasticsearch.yml file:

cluster.name: cluster01
http.cors.enabled: true
discovery.type: ec2
discovery.ec2.tag.project_code_info: "cluster01"
cloud.aws.region: eu-central-1
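Since the configuration is supposed to be identical on both nodes, one quick check is to diff the flat settings from each node's elasticsearch.yml. This is a minimal Python sketch, assuming the files contain only flat key: value lines like the ones above; parse_flat_yml and diff_settings are hypothetical names, and a real YAML parser would be more robust.

```python
from typing import Dict, Optional, Tuple


def parse_flat_yml(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines such as the elasticsearch.yml above.

    Not a real YAML parser: nested blocks, lists and values containing '#'
    are not handled.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip().strip("\"'")  # drop quotes
    return settings


def diff_settings(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_on_a, value_on_b)} for every key that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

An empty result from diff_settings means the two files carry the same flat settings; a key missing on one node shows up with None on that side.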

The logs show that the second node found the first node ([Space Phantom]) on its own, without any intervention, but apparently it cannot authenticate.

I suspect this is related to the Shield plugin, which is also installed, but the permissions are set up exactly as they were before. Nothing else has changed.

I'm using username/password authentication in Shield, with no SSL configured.

Can anyone assist?




As @user3458016 requested, here is how I solved it.

I resolved the issue by doing the following on all nodes: resetting all settings and configuration, removing the license and shield plugins, removing all users, and then re-adding everything exactly as before. The configuration was identical to begin with, so this is odd.

First, stop Elasticsearch on all nodes. If Kibana is running locally, stop it too.

If you have any custom roles, check their configuration in /etc/elasticsearch/shield/roles.yml and, if possible, refresh it from a single recorded copy of that configuration.
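Refreshing roles.yml from one recorded copy can be scripted so that it only touches the file when it actually differs. This is a minimal Python sketch, assuming you keep a reference copy somewhere on each node (the reference path is hypothetical); refresh_roles is an illustrative name, and it defaults to a dry run that only reports whether a change would be made.

```python
import filecmp
import shutil
from pathlib import Path


def refresh_roles(reference: Path, target: Path, dry_run: bool = True) -> bool:
    """Overwrite target with reference if their contents differ.

    Returns True if the files differed (or target was missing). When not a
    dry run, the old target is first saved next to it as <name>.bak.
    """
    if target.exists() and filecmp.cmp(reference, target, shallow=False):
        return False  # already identical, nothing to do
    if not dry_run:
        if target.exists():
            shutil.copy2(target, target.with_name(target.name + ".bak"))
        shutil.copy2(reference, target)
    return True
```

For example, refresh_roles(Path("/root/roles.yml.reference"), Path("/etc/elasticsearch/shield/roles.yml"), dry_run=False) would restore the recorded copy while keeping the previous file as roles.yml.bak.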

Remove plugins:

/usr/share/elasticsearch/bin/plugin remove elasticsearch/license/latest
/usr/share/elasticsearch/bin/plugin remove elasticsearch/shield/latest

Remove users:

/usr/share/elasticsearch/bin/shield/esusers userdel admin
/usr/share/elasticsearch/bin/shield/esusers userdel logstash

Re-add plugins:

/usr/share/elasticsearch/bin/plugin install elasticsearch/license/latest -b
/usr/share/elasticsearch/bin/plugin install elasticsearch/shield/latest -b

Re-add users:

/usr/share/elasticsearch/bin/shield/esusers useradd admin -p adminuserpw -r admin
/usr/share/elasticsearch/bin/shield/esusers useradd logstash -p logstashuserpw -r logstash
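Because these commands have to be repeated identically on every node, it can help to generate them from one list of users. This is a minimal Python sketch that reproduces the commands above in the same order; reset_commands and run are hypothetical helpers, the user map and passwords are placeholders, and run only prints the commands unless dry_run=False is passed.

```python
import shlex
import subprocess
from typing import Dict, List, Tuple

ES_HOME = "/usr/share/elasticsearch"  # install prefix used in the steps above
PLUGIN = f"{ES_HOME}/bin/plugin"
ESUSERS = f"{ES_HOME}/bin/shield/esusers"
PLUGINS = ["elasticsearch/license/latest", "elasticsearch/shield/latest"]


def reset_commands(users: Dict[str, Tuple[str, str]]) -> List[List[str]]:
    """Build the command sequence above; users maps name -> (password, role)."""
    cmds: List[List[str]] = [[PLUGIN, "remove", p] for p in PLUGINS]
    cmds += [[ESUSERS, "userdel", name] for name in users]
    cmds += [[PLUGIN, "install", p, "-b"] for p in PLUGINS]
    cmds += [
        [ESUSERS, "useradd", name, "-p", pw, "-r", role]
        for name, (pw, role) in users.items()
    ]
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False.

    With check=True, execution stops at the first command that fails.
    """
    lines = []
    for cmd in cmds:
        line = " ".join(shlex.quote(arg) for arg in cmd)
        print(line)
        lines.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return lines
```

Passing passwords with -p makes them visible in the process list and shell history, so the placeholders above should not be reused as real credentials.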

If you have any custom roles, double-check their configuration in /etc/elasticsearch/shield/roles.yml to verify that it has not been modified or overwritten.
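To confirm that no node ended up with a modified roles.yml, you can compare content digests against the recorded copy. This is a minimal Python sketch, assuming each node's roles.yml has already been copied to the machine running it (how they are collected is up to you); sha256_of and nodes_with_drift are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def nodes_with_drift(reference: Path, copies: Dict[str, Path]) -> List[str]:
    """Return, sorted, the nodes whose roles.yml differs from the reference copy."""
    expected = sha256_of(reference)
    return sorted(node for node, path in copies.items() if sha256_of(path) != expected)
```

An empty list means every collected copy matches the reference byte for byte; comparing digests rather than parsed YAML also catches whitespace or comment changes.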

Start Elasticsearch on the first node, and start Kibana if it runs locally.

Check that the indices have come up correctly and verify the master node status.
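With Shield enabled, REST calls such as the cluster health check need the user's credentials as an HTTP Basic Authorization header. This is a minimal Python sketch using only the standard library, assuming the node answers on http://localhost:9200 and that you query as the admin user created above; basic_auth, get_json and summarize are hypothetical helpers. It reads GET /_cluster/health for status and node count, and GET /_cluster/state/master_node for the elected master's node ID.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

ES_URL = "http://localhost:9200"  # assumption: local node on the default HTTP port


def basic_auth(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def get_json(path: str, user: str, password: str, base: str = ES_URL) -> Dict[str, Any]:
    """GET a JSON endpoint with Basic auth and return the decoded body."""
    req = urllib.request.Request(
        base + path, headers={"Authorization": basic_auth(user, password)}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(health: Dict[str, Any], state: Dict[str, Any]) -> str:
    """One-line summary of cluster health plus the elected master's node ID."""
    return (
        f"status={health['status']} nodes={health['number_of_nodes']} "
        f"unassigned_shards={health['unassigned_shards']} master={state.get('master_node')}"
    )


# Usage against a live node (not executed here):
#   health = get_json("/_cluster/health", "admin", "adminuserpw")
#   state = get_json("/_cluster/state/master_node", "admin", "adminuserpw")
#   print(summarize(health, state))
```

If the request fails with an authentication error, the Shield user or password on that node is the first thing to check.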

Repeat all of the above steps on each of the other nodes.

Start Elasticsearch on the remaining nodes one at a time, verifying that the cluster is healthy and replication has completed before starting the next node.
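Starting nodes one at a time means waiting after each start until the new node has joined and every shard is allocated again. This is a minimal Python sketch of that wait loop; wait_for_cluster is a hypothetical helper, fetch_health is any zero-argument callable you supply that returns the parsed JSON of GET /_cluster/health, and "green with at least N nodes" is an assumed definition of healthy for a mirror pair.

```python
import time
from typing import Any, Callable, Dict


def wait_for_cluster(
    fetch_health: Callable[[], Dict[str, Any]],
    expected_nodes: int,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the cluster is green with at least expected_nodes nodes."""
    deadline = clock() + timeout_s
    while True:
        try:
            health = fetch_health()
        except OSError:  # e.g. connection refused while the node is still starting
            health = {}
        if health.get("status") == "green" and health.get("number_of_nodes", 0) >= expected_nodes:
            return health
        if clock() >= deadline:
            raise TimeoutError(f"not green with {expected_nodes} nodes yet: {health}")
        sleep(interval_s)
```

Elasticsearch can also do this waiting server-side through the wait_for_status and wait_for_nodes parameters of the same endpoint, for example /_cluster/health?wait_for_status=green&wait_for_nodes=2&timeout=60s; the client-side loop mainly adds tolerance for a node that is not accepting connections yet.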

I hope someone finds this useful.