Hi all,
I am having an issue with an Erlang cluster. After the cluster had been running for a long time, one day I could no longer make any new connection to a specific node ([email protected]) in the cluster: net_adm:ping([email protected]) returns pang. Even using:
erlang -name [email protected] -setcookie MYCOOKIE -remsh [email protected]
fails as well.
The strange thing is that [email protected] still works fine with the other nodes already in the cluster. The problem only happens when a new node joins the cluster and pings SickNode.
There isn't any firewall involved, because all the other nodes work fine within the cluster. Has anybody else run into this situation? Is Erlang not stable for clustering?
PS: I am using Erlang/OTP 20 on CentOS 6.8.
Many Thanks!!!
Not a straight up answer, but a theory and a way to reproduce your issue. It's complicated because it involves multiple nodes, but let's see if you can follow me.
TL;DR: [email protected] changed its cookie after it was connected to the cluster.
So, this is what I did…
First, on a terminal I started node1 with cookie x…
Then, on another terminal I started node2 with cookie x, connected it to node1 and changed its cookie to y…
Then, in yet another terminal I started node3 with cookie x and pinged node1 (which resulted in a connection attempt to node2 as well, as you will see below) and then explicitly tried to connect to node2…
What happened so far? Well, since node1's cookie was x and node3's cookie was x as well, they could connect. node2 was still connected to node1 but, since the cookie there was y, node3 could not connect to it.
Erlang tries to establish a fully connected mesh of nodes, so when you connect to one of them, it automatically tries to connect you to all the others.
But I wanted to be thorough so I pinged node2 from node3 and, as expected, I got a pang. Also, these messages popped up on node2:
And, of course, when I tried to ping node3 from node2…
But… if I try to ping node1…
That's because they're already connected and Erlang only validates the sharing of the cookie on the initial handshake.
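If this theory applies to your cluster, one way to check it is to compare cookies over a connection that still works. The sketch below assumes you run it in the shell of a node that is still connected to the sick node; 'sicknode@myhost' is a placeholder for the real node name, and MYCOOKIE is the cookie from the question.

```erlang
%% Run in the shell of a node that is still connected to the sick node.
%% 'sicknode@myhost' is a placeholder; use the real node name.
Sick = 'sicknode@myhost'.

erlang:get_cookie().                       % cookie of this (healthy) node
rpc:call(Sick, erlang, get_cookie, []).    % cookie the sick node uses now

%% If the two differ, setting the sick node's cookie back to the cluster
%% cookie should let new nodes complete the handshake again:
rpc:call(Sick, erlang, set_cookie, [Sick, 'MYCOOKIE']).
```

This works only because the existing connection is not re-validated; if the two cookies turn out to be identical, the cause is something else and this theory does not apply.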
Finally, if I try to ping nodes from node1, I get the expected results…
Hope this helps.