i/o timeout error using progrium/docker-consul


I'm attempting to set up a production-ready cluster on AWS that uses Jeff Lindsay's progrium/docker-consul image to install Consul on each host but can't get the secondary and tertiary servers to -join the initial server.

I've followed the "Running a real consul cluster in production" instructions, but I'm getting an i/o timeout error when my consul2 and consul3 nodes attempt to -join the consul1 private IP.

The Instances

I spun up three t2.micros on AWS and got the following private IPs assigned in my VPC:

172.31.4.194 (Intended to be `consul1`, leader)
172.31.4.195 (Intended to be `consul2`)
172.31.4.193 (Intended to be `consul3`)



Starting Up The Initial Consul Server Instance

My consul1 node gets itself up and waiting for the other two just fine:

sudo docker run -d -h consul1 --name consul1 -v /mnt:/data \
    -p 172.31.4.194:8300:8300 \
    -p 172.31.4.194:8301:8301 \
    -p 172.31.4.194:8301:8301/udp \
    -p 172.31.4.194:8302:8302 \
    -p 172.31.4.194:8302:8302/udp \
    -p 172.31.4.194:8400:8400 \
    -p 172.31.4.194:8500:8500 \
    -p 172.17.42.1:53:53/udp \
    progrium/consul -server -advertise 172.31.4.194 -bootstrap-expect 3



Attempting To Run The Second Server Instance

But when I attempt to start my consul2 node with the following command, it fails to join:

sudo docker run -d -h consul2 --name consul2 -v /mnt:/data \
    -p 172.31.4.195:8300:8300 \
    -p 172.31.4.195:8301:8301 \
    -p 172.31.4.195:8301:8301/udp \
    -p 172.31.4.195:8302:8302 \
    -p 172.31.4.195:8302:8302/udp \
    -p 172.31.4.195:8400:8400 \
    -p 172.31.4.195:8500:8500 \
    -p 172.17.42.1:53:53/udp \
    progrium/consul -server -advertise 172.31.4.195 -join 172.31.4.194



The Error

Here's the error that I'm getting:

==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
==> dial tcp 172.31.4.194:8301: i/o timeout



Any idea of what could be causing this? I've re-attempted about nine times and still no luck. It did spur me to do some more learning about networking (which is a broad, deep, fascinating subject) but I can't figure out if there's an issue in my config, or if this is an actual bug.

Thanks in advance for any help.

There is 1 answer

AJB On

Problem Solved!

Turns out that I had forgotten to open up the ports that Consul needs to use in the Security Group that governs access to the instances.
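
A quick way to spot this kind of problem is to test, from the joining host, whether the port on consul1 is reachable at all. Traffic blocked by a Security Group is usually dropped silently, which tends to show up as a timeout rather than a refused connection. Here is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils `timeout` command are available on the host. `check_port` is a hypothetical helper name, not part of Consul or Docker, and it only tests TCP, not the UDP side of 8301 and 8302.

```shell
#!/usr/bin/env bash
# check_port HOST PORT
# Hypothetical helper: prints "open" if a TCP connection to HOST:PORT
# succeeds within 3 seconds, and "unreachable" otherwise (connection
# refused, packets dropped, or timed out).
check_port() {
    host="$1"
    port="$2"
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # timeout kills the attempt if nothing answers within 3 seconds.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open"
    else
        echo "unreachable"
    fi
}

# Example, run on the consul2 host against consul1's private IP:
# check_port 172.31.4.194 8301
```

If this reports "unreachable" while the container on consul1 is running with the port published, the network path (Security Group, network ACL, or host firewall) is a likely place to look.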

I opened up ports 8300, 8301, 8302, 8400, and 8500, and everything installed just fine.
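
The same rules can also be added from the command line. This is a sketch, assuming the AWS CLI is installed and configured. The group ID `sg-EXAMPLE` and the source CIDR `172.31.0.0/16` are placeholders, not values from this setup, and `print_consul_sg_rules` is a hypothetical helper that only prints the commands so they can be reviewed before being run. It includes UDP rules for 8301 and 8302 to match the /udp port mappings in the docker run commands.

```shell
#!/usr/bin/env bash
# print_consul_sg_rules GROUP_ID CIDR
# Hypothetical helper: prints (does not execute) one
# "aws ec2 authorize-security-group-ingress" command per Consul port.
print_consul_sg_rules() {
    group_id="$1"
    cidr="$2"
    # TCP: server RPC (8300), Serf LAN (8301), Serf WAN (8302),
    # CLI RPC (8400), HTTP API (8500).
    for port in 8300 8301 8302 8400 8500; do
        echo "aws ec2 authorize-security-group-ingress --group-id ${group_id} --protocol tcp --port ${port} --cidr ${cidr}"
    done
    # UDP: the Serf gossip ports also use UDP.
    for port in 8301 8302; do
        echo "aws ec2 authorize-security-group-ingress --group-id ${group_id} --protocol udp --port ${port} --cidr ${cidr}"
    done
}

# Example (placeholder values; review the output before running it):
# print_consul_sg_rules sg-EXAMPLE 172.31.0.0/16
```

Restricting the source to the VPC's own address range, rather than opening the ports to everyone, keeps the Consul ports off the public internet.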