I have 3 containers running on 3 machines. One is called graphite, one is called back and one is called front. The front container needs both of the others to run, so I link them separately like this:
[Unit]
Description=front hystrix
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill front
ExecStartPre=-/usr/bin/docker rm -v front
ExecStartPre=/usr/bin/docker pull blurio/hystrixfront
ExecStart=/usr/bin/docker run --name front --link graphite:graphite --link back:back -p 8080:8080 blurio/hystrixfront
ExecStop=/usr/bin/docker stop front
I start both the other containers, wait till they're up and running, then start this one with fleetctl and it just instantly fails with this message:
fleetctl status front.service
● front.service - front hystrix
Loaded: loaded (/run/fleet/units/front.service; linked-runtime; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2015-05-12 13:46:08 UTC; 24s ago
Process: 922 ExecStop=/usr/bin/docker stop front (code=exited, status=0/SUCCESS)
Process: 912 ExecStart=/usr/bin/docker run --name front --link graphite:graphite --link back:back -p 8080:8080 blurio/hystrixfront (code=exited, status=1/FAILURE)
Process: 902 ExecStartPre=/usr/bin/docker pull blurio/hystrixfront (code=exited, status=0/SUCCESS)
Process: 892 ExecStartPre=/usr/bin/docker rm -v front (code=exited, status=1/FAILURE)
Process: 885 ExecStartPre=/usr/bin/docker kill front (code=exited, status=1/FAILURE)
Main PID: 912 (code=exited, status=1/FAILURE)
May 12 13:46:08 core-04 docker[902]: 8b9853c10955: Download complete
May 12 13:46:08 core-04 docker[902]: 0dc7a355f916: Download complete
May 12 13:46:08 core-04 docker[902]: 0dc7a355f916: Download complete
May 12 13:46:08 core-04 docker[902]: Status: Image is up to date for blurio/hystrixfront:latest
May 12 13:46:08 core-04 systemd[1]: Started front hystrix.
May 12 13:46:08 core-04 docker[912]: time="2015-05-12T13:46:08Z" level="fatal" msg="Error response from daemon: Could not get container for graphite"
May 12 13:46:08 core-04 systemd[1]: front.service: main process exited, code=exited, status=1/FAILURE
May 12 13:46:08 core-04 docker[922]: front
May 12 13:46:08 core-04 systemd[1]: Unit front.service entered failed state.
May 12 13:46:08 core-04 systemd[1]: front.service failed.
I also want to include the fleetctl list-units output, where you can see that the other two are running without problems.
fleetctl list-units
UNIT MACHINE ACTIVE SUB
back.service 0ff08b11.../172.17.8.103 active running
front.service 69ab2600.../172.17.8.104 failed failed
graphite.service 2886cedd.../172.17.8.101 active running
There are a couple of issues here. First, you can't use docker's --link argument across machines. It is a Docker-specific instruction for linking one container to another on the same Docker engine. In your example you have multiple engines, so this technique won't work. If you want to use it, you will need to employ the ambassador pattern: coreos ambassador. Alternatively, you can use the X-Fleet option MachineOf to make all of the Docker containers run on the same machine; however, I think that would defeat your goals.
Often with cloud services one service needs another, as in your case. If the other service is not running (yet), the services that need it should be well behaved and either exit or wait for the needed service to be ready. So the needed service must be discovered. There are many techniques for the discovery phase and the waiting phase. For example, you can write a 'wrapper' script in each of your containers that performs these duties. In your case, you could have a script in back.service and graphite.service that writes status information to the etcd database, for example under a key such as /graphite/status.
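A minimal sketch of what the announcing half of such a wrapper could look like, in shell. The function name announce_ready, the key layout (/graphite/address next to /graphite/status) and the value ready are assumptions made for illustration; only etcdctl set is an actual etcdctl (v2) command.

```shell
#!/bin/sh
# Hypothetical helper: publish a service's address and status to etcd.
# Key names and the "ready" value are illustrative assumptions, not a
# fixed convention.
announce_ready() {
    service="$1"   # e.g. graphite
    address="$2"   # e.g. 172.17.8.101:2003 (the port is an assumed value)
    # Write the address first so a reader that sees "ready" can rely on it.
    etcdctl set "/${service}/address" "$address" >/dev/null &&
    etcdctl set "/${service}/status" ready >/dev/null
}
```

The wrapper would call something like announce_ready graphite 172.17.8.101:2003 once the service inside the container is actually accepting connections, not merely when the container has started.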
Then in the startup script in front you can do an etcdctl get /graphite/status to see when the container becomes ready (and not continue until it is). If you like, you can also store the IP address and port in the graphite script so that the front script can pick up the place to connect to.
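The waiting side could be sketched roughly as follows. The function name wait_for_key, the expected value ready and the one-second polling interval are assumptions; etcdctl get is the real command, and it exits non-zero while the key does not exist yet.

```shell
#!/bin/sh
# Hypothetical helper: block until an etcd key holds the expected value.
# Returns 0 once it does, 1 if the timeout (in seconds) runs out first.
wait_for_key() {
    key="$1"
    want="$2"
    timeout="${3:-60}"
    waited=0
    # A missing key makes etcdctl print nothing on stdout, so the
    # comparison simply fails and we keep polling.
    until [ "$(etcdctl get "$key" 2>/dev/null)" = "$want" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the front startup script this might be used as wait_for_key /graphite/status ready 120 && wait_for_key /back/status ready 120 before exec'ing the real front process, so the container exits with an error instead of hanging forever if a dependency never shows up.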
Another technique for discovery is to use registrator. This is a super handy docker container that updates a directory structure in etcd every time a container comes and goes. This makes it easier to use a discovery technique like the one I listed above without each container having to announce itself; it becomes automatic. You still need the 'front' container to have a startup script that waits for the service to appear in the etcd database. I usually start registrator on CoreOS boot. In fact, I start two copies, one for discovering internal addresses (flannel ones) and one for external ones (services that are available outside my containers). Here is an example of the database registrator manages on my machines:
core@fo1 ~/prs $ etcdctl ls --recursive /skydns
/skydns/net
/skydns/net/tacodata
/skydns/net/tacodata/services
/skydns/net/tacodata/services/cadvisor-4194
/skydns/net/tacodata/services/cadvisor-4194/fo2:cadvisor:4194
/skydns/net/tacodata/services/cadvisor-4194/fo1:cadvisor:4194
/skydns/net/tacodata/services/cadvisor-4194/fo3:cadvisor:4194
/skydns/net/tacodata/services/internal
/skydns/net/tacodata/services/internal/cadvisor-4194
/skydns/net/tacodata/services/internal/cadvisor-4194/fo2:cadvisor:4194
/skydns/net/tacodata/services/internal/cadvisor-4194/fo1:cadvisor:4194
/skydns/net/tacodata/services/internal/cadvisor-4194/fo3:cadvisor:4194
/skydns/net/tacodata/services/internal/cadvisor-8080
/skydns/net/tacodata/services/internal/cadvisor-8080/fo2:cadvisor:8080
/skydns/net/tacodata/services/internal/cadvisor-8080/fo1:cadvisor:8080
/skydns/net/tacodata/services/internal/cadvisor-8080/fo3:cadvisor:8080
You can see the internal and external ports available for cadvisor. If you get one of the records with etcdctl get, you get everything you need to connect to that container internally.
This technique really starts to shine when coupled with skydns. Skydns presents a DNS service using the information published by registrator. So, long story short, I can simply make my application use the hostname (the hostname defaults to the name of the docker image, but it can be changed). In this example my application can connect to cadvisor-8080, and DNS will give it one of the 3 IP addresses it has (it is on 3 machines). The DNS also supports SRV records, so if you aren't using a well-known port, the SRV record can give you the port number.
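For the case where a startup script reads one of those etcd records directly instead of going through DNS, here is a small sketch. It assumes the record value is flat JSON of the form {"host":"...","port":N}, which is the shape SkyDNS-style records normally have; the contents of the records on my machines are not reproduced here, and the function names record_host and record_port are made up for the example.

```shell
#!/bin/sh
# Hypothetical helpers: pull host and port out of a SkyDNS-style record.
# Assumes the flat JSON shape {"host":"...","port":N}; a real script might
# prefer a JSON tool such as jq over sed.
record_host() {
    printf '%s\n' "$1" |
        sed -n 's/.*"host"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

record_port() {
    printf '%s\n' "$1" |
        sed -n 's/.*"port"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

A front startup script could then do record=$(etcdctl get /skydns/net/tacodata/services/internal/cadvisor-8080/fo1:cadvisor:8080) and build its connection target from record_host "$record" and record_port "$record". If a field is missing, the helper prints nothing, so the script should check for an empty result before connecting.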
Using CoreOS and fleet, it is difficult not to get the containers themselves involved in the publish/discovery/wait game. At least that's been my experience.
-g