I'm attempting to set up a production-ready cluster on AWS that uses Jeff Lindsay's progrium/docker-consul image to install Consul on each host, but I can't get the secondary and tertiary servers to `-join` the initial server.
I've followed the "Running a real consul cluster in production" instructions, but I'm getting an `i/o timeout` error when my `consul2` and `consul3` nodes attempt to `-join` the `consul1` private IP.
The Instances
I spun up three `t2.micro` instances on AWS and got the following private IPs assigned in my VPC:
172.31.4.194 (Intended to be `consul1`, leader)
172.31.4.195 (Intended to be `consul2`)
172.31.4.193 (Intended to be `consul3`)
Starting Up The Initial Consul Server Instance
My `consul1` node gets itself up and waiting for the other two just fine:
sudo docker run -d -h consul1 --name consul1 -v /mnt:/data \
-p 172.31.4.194:8300:8300 \
-p 172.31.4.194:8301:8301 \
-p 172.31.4.194:8301:8301/udp \
-p 172.31.4.194:8302:8302 \
-p 172.31.4.194:8302:8302/udp \
-p 172.31.4.194:8400:8400 \
-p 172.31.4.194:8500:8500 \
-p 172.17.42.1:53:53/udp \
progrium/consul -server -advertise 172.31.4.194 -bootstrap-expect 3
Attempting To Run The Second Server Instance
But then I attempt to start my `consul2` node using the following, and it fails to join:
sudo docker run -d -h consul2 --name consul2 -v /mnt:/data \
-p 172.31.4.195:8300:8300 \
-p 172.31.4.195:8301:8301 \
-p 172.31.4.195:8301:8301/udp \
-p 172.31.4.195:8302:8302 \
-p 172.31.4.195:8302:8302/udp \
-p 172.31.4.195:8400:8400 \
-p 172.31.4.195:8500:8500 \
-p 172.17.42.1:53:53/udp \
progrium/consul -server -advertise 172.31.4.195 -join 172.31.4.194
The Error
Here's the error that I'm getting:
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
==> dial tcp 172.31.4.194:8301: i/o timeout
Any idea of what could be causing this? I've re-attempted about nine times and still no luck. It did spur me to do some more learning about networking (which is a broad, deep, fascinating subject) but I can't figure out if there's an issue in my config, or if this is an actual bug.
Thanks in advance for any help.
Problem Solved!
Turns out that I had forgotten to open up the ports that Consul needs in the Security Group that governs access to the instances. I opened up `8300`, `8301`, `8302`, `8400`, and `8500`, and everything installed just fine.
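For anyone hitting the same timeout, here is a rough sketch of what opening those ports could look like with the AWS CLI. It is a dry run: it only prints the `aws ec2 authorize-security-group-ingress` commands and sends nothing to AWS. The group ID `sg-EXAMPLE` and the `172.31.0.0/16` range are assumptions for illustration, not values from my setup; the protocol for each port is taken from the `-p` mappings in the `docker run` commands above.

```shell
#!/bin/sh
# Dry-run sketch: prints the AWS CLI commands that would open Consul's ports
# in a security group. Nothing is sent to AWS; review the output first.

# Assumptions (hypothetical, not from the original post):
SG_ID="${SG_ID:-sg-EXAMPLE}"     # placeholder security group ID
CIDR="${CIDR:-172.31.0.0/16}"    # assumed VPC range covering the 172.31.4.x hosts

# Protocol/port pairs, matching the -p mappings used in the docker run commands.
consul_rules() {
  printf '%s\n' \
    "tcp 8300" \
    "tcp 8301" "udp 8301" \
    "tcp 8302" "udp 8302" \
    "tcp 8400" \
    "tcp 8500"
}

# Emit one authorize-security-group-ingress command per protocol/port pair.
print_ingress_commands() {
  consul_rules | while read -r proto port; do
    echo "aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol $proto --port $port --cidr $CIDR"
  done
}

print_ingress_commands
```

Restricting the source to the VPC range rather than `0.0.0.0/0` keeps the Consul ports reachable between the three hosts without exposing them publicly. Running the script with `SG_ID` and `CIDR` set in the environment substitutes real values into the printed commands.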