I decided to try upgrade the current cluster from ES2.1.1 to ES2.2.0.
A mirror pair. The cluster is running within AWS, so I'm using the cloud-aws
plugin for communication.
I successfully upgraded the first node, and it has assumed master status, but I have encountered a strange communication/authentication issue when upgrading the second node.
I paid attention to the guidelines here, but I still seem to be experiencing a strange issue.
From main cluster log on 2nd node:
[2016-02-03 12:29:41,241][INFO ][discovery.ec2 ] [Sharon Ventura] failed to send join request to master [{Space Phantom}{NzN7b7ZHT8uPu6oXJAORMg}{10.60.164.147}{10.60.164.147:9300}], reason [RemoteTransportException[[Space Phantom][10.60.164.147:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Sharon Ventura][10.60.163.74:9300][internal:discovery/zen/join/validate]]; nested: ElasticsearchSecurityException[missing authentication token for action [internal:discovery/zen/join/validate]]; ]
[2016-02-03 12:29:42,455][DEBUG][action.admin.cluster.health] [Sharon Ventura] no known master node, scheduling a retry
[2016-02-03 12:29:44,255][INFO ][discovery.ec2 ] [Sharon Ventura] failed to send join request to master [{Space Phantom}{NzN7b7ZHT8uPu6oXJAORMg}{10.60.164.147}{10.60.164.147:9300}], reason [RemoteTransportException[[Space Phantom][10.60.164.147:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Sharon Ventura][10.60.163.74:9300][internal:discovery/zen/join/validate]]; nested: ElasticsearchSecurityException[missing authentication token for action [internal:discovery/zen/join/validate]]; ]
[2016-02-03 12:29:47,269][INFO ][discovery.ec2 ] [Sharon Ventura] failed to send join request to master [{Space Phantom}{NzN7b7ZHT8uPu6oXJAORMg}{10.60.164.147}{10.60.164.147:9300}], reason [RemoteTransportException[[Space Phantom][10.60.164.147:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Sharon Ventura][10.60.163.74:9300][internal:discovery/zen/join/validate]]; nested: ElasticsearchSecurityException[missing authentication token for action [internal:discovery/zen/join/validate]]; ]
[2016-02-03 12:29:49,472][DEBUG][action.admin.cluster.state] [Sharon Ventura] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[2016-02-03 12:29:49,473][INFO ][rest.suppressed ] /_cluster/settings Params: {}
MasterNotDiscoveredException[null]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:205)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:794)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2016-02-03 12:29:50,283][INFO ][discovery.ec2 ] [Sharon Ventura] failed to send join request to master [{Space Phantom}{NzN7b7ZHT8uPu6oXJAORMg}{10.60.164.147}{10.60.164.147:9300}], reason [RemoteTransportException[[Space Phantom][10.60.164.147:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Sharon Ventura][10.60.163.74:9300][internal:discovery/zen/join/validate]]; nested: ElasticsearchSecurityException[missing authentication token for action [internal:discovery/zen/join/validate]]; ]
My elasticsearch.yml file:
cluster.name: cluster01
http.cors.enabled: true
network.host: 0.0.0.0
discovery.type: ec2
discovery.ec2.tag.project_code_info: "cluster01"
cloud.aws.region: eu-central-1
I can see in the logs that it has detected the 1st node: [Space Phantom][10.60.164.147:9300]
It has detected it without any intervention, but it apparently cannot authenticate.
I suspect this may be related to the Shield
plugin, which is installed also, but the correct and identical permissions are setup the same as before. Nothing else has changed.
I'm using a username and password in shield, no SSL configured.
Can anyone assist?
I managed to figure it out, as @user3458016 requested.
I managed to resolve this issue, by (on all nodes) resetting all settings and configurations, removing plugins
license
,shield
, removing all users and re-adding all of them as before. These configurations were identical to begin with, so this is odd.First, stop elasticsearch on all nodes. Stop kibana if running locally.
If you have any custom roles, check the configuration of this in
/etc/elasticsearch/shield/roles.yml
refresh this from a single recorded configuration if possible.remove plugins:
/usr/share/elasticsearch/bin/plugin remove elasticsearch/license/latest
/usr/share/elasticsearch/bin/plugin remove elasticsearch/shield/latest
remove users:
/usr/share/elasticsearch/bin/shield/esusers userdel admin
/usr/share/elasticsearch/bin/shield/esusers userdel logstash
re-add plugins:
/usr/share/elasticsearch/bin/plugin install elasticsearch/license/latest -b
/usr/share/elasticsearch/bin/plugin install elasticsearch/shield/latest -b
re-add users:
/usr/share/elasticsearch/bin/shield/esusers useradd admin -p adminuserpw -r admin
/usr/share/elasticsearch/bin/shield/esusers useradd logstash -p logstashuserpw -r logstash
If you have any custom roles, double-check the configuration of this in
/etc/elasticsearch/shield/roles.yml
to verify the configuration is not been modified or over-written.Start elasticsearch on first node. Start kibana if running locally.
Check indices have come up correctly and verify master node status.
Do all the above steps on all other nodes.
Start elasticsearch on remaining nodes, one at a time. Verify healthy cluster replication before starting next node.
I hope someone finds this useful.