Docker swarm node unable to detect service from another host in swarm

My goal is to set up a docker swarm on a group of three Linux (Ubuntu) physical workstations and run a Dask cluster on it.

$ docker --version
Docker version 17.06.0-ce, build 02c1d87

I am able to init the docker swarm and add all of the machines to the swarm.
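
For reference, the swarm was created with the standard init/join sequence (token and address shown here as placeholders):

$ docker swarm init --advertise-addr <manager-ip>                # on the machine that becomes the manager/leader
$ docker swarm join --token <worker-token> <manager-ip>:2377     # on each of the other two boxes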

cordoba$ docker node ls
ID                            HOSTNAME    STATUS    AVAILABILITY MANAGER STATUS
j8k3hm87w1vxizfv7f1bu3nfg     box1        Ready     Active              
twg112y4m5tkeyi5s5vtlgrap     box2        Ready     Active              
upkr459m75au0vnq64v5k5euh *   box3        Ready     Active              Leader

I then run docker stack deploy -c docker-compose.yml dask-cluster on the Leader box.
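
For reference, where the replicas land after the deploy can be checked with:

$ docker stack services dask-cluster
$ docker service ps dask-cluster_dworker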

Here is docker-compose.yml:

version: "3"

services:

  dscheduler:
    image: richardbrks/dask-cluster
    ports:
     - "8786:8786"
     - "9786:9786"
     - "8787:8787"
    command: dask-scheduler
    networks:
      - distributed
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == manager]

  dworker:
    image: richardbrks/dask-cluster
    command: dask-worker dscheduler:8786
    environment:
      - "affinity:container!=dworker*"
    networks:
      - distributed
    depends_on:
      - dscheduler
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure

networks:
  distributed:

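As far as I understand, a network declared this way in a v3 compose file gets the overlay driver by default when deployed to a swarm; spelling it out explicitly would look like the sketch below, and should not change the behaviour:

networks:
  distributed:
    driver: overlay
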
and here is richardbrks/dask-cluster:

# Official python base image
FROM python:2.7    
# update apt-repository
RUN apt-get update
# only install enough library to run dask on a cluster (with monitoring)
RUN pip install --no-cache-dir \
    psutil \
    dask[complete]==0.15.2 \
    bokeh
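
To sanity-check the image locally (a hypothetical smoke test, not part of the original setup; the dask-scheduler and dask-worker entry points come from the distributed package pulled in by dask[complete]):

$ docker build -t richardbrks/dask-cluster .
$ docker run --rm richardbrks/dask-cluster dask-scheduler --help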

When I deploy the stack, the dworker containers that are not on the same machine as dscheduler do not know what dscheduler is. I ssh'd into one of those nodes and looked in env, and dscheduler was not there. I also tried to ping dscheduler and got "ping: unknown host".

I thought docker was supposed to provide internal DNS-based service discovery, so that referring to dscheduler would resolve to the address of the dscheduler service.
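
One way to check that directly (a diagnostic sketch; the container id will differ on each machine) is to exec into a worker container on one of the other boxes and try to resolve the service name and its task addresses:

$ docker ps --filter name=dask-cluster_dworker                   # find a worker container on this box
$ docker exec -it <container-id> getent hosts dscheduler         # the service's virtual IP
$ docker exec -it <container-id> getent hosts tasks.dscheduler   # the individual task IPs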

Is there some setup on my computers that I am missing, or is something missing from my files?

All of this code is also located at https://github.com/MentalMasochist/dask-swarm.

There are 2 answers

Rich (accepted answer):

There was nothing wrong with dask or docker swarm. The problem was bad router firmware. After I went back to a prior version of the router firmware, the cluster worked fine.

herm:

According to this issue in swarm:

Because of some networking limitations (I think related to virtual IPs), the ping tool will not work with overlay networking. Are your service names resolvable with other tools like dig?

Personally I could always connect from one service to the other using curl. Your setup seems correct and your services should be able to communicate.
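
For example, something like this from inside a worker container should reach the scheduler's Bokeh diagnostics page (assuming curl is present in the image, which the full python:2.7 base normally provides):

$ docker exec -it <worker-container-id> curl -sS http://dscheduler:8787/status   # path may differ across dask versions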


FYI: depends_on is not supported in swarm mode.
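
In practice that means a worker may start before the scheduler and has to retry on its own. A hypothetical entrypoint wrapper (not part of the original repo) that waits for the scheduler port before starting the worker could look like:

#!/bin/sh
# wait-for-scheduler.sh -- hypothetical helper, not in richardbrks/dask-cluster
until python -c "import socket; socket.create_connection(('dscheduler', 8786), 2)"; do
    echo "waiting for dscheduler..."; sleep 2
done
exec dask-worker dscheduler:8786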


Update 2: I think you are not using the port. The service name is not a replacement for the port; you need to use the port as the container knows it internally.
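
For service-to-service traffic inside the overlay network, the service name has to be combined with the container's internal port (e.g. dscheduler:8786); the ports: mapping only publishes ports on the hosts for clients outside the swarm. The distinction is easier to see in the long port syntax (compose file format 3.2+, shown only as an illustration):

ports:
  - target: 8786      # port inside the container; other services use dscheduler:8786
    published: 8786   # port exposed on the swarm nodes for external clients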