Error when pulling private docker image from GitLab container registry using DockerOperator in Airflow 2.0


I've been struggling with pulling a private image from my GitLab container registry when running a DockerOperator in Airflow 2.0.

My DockerOperator looks as follows:

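from airflow.providers.docker.operators.docker import DockerOperator

# `dag` is the DAG object defined elsewhere in this file.
python_mailer = DockerOperator(
    task_id='mailer',
    image='registry.gitlab.com/private422/mailer/image',
    docker_conn_id='gitlab-registry',  # Airflow connection with the registry credentials
    api_version='auto',
    dag=dag,
)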

The gitlab-registry connection is defined in Airflow's connections with the username and password from a deploy token that I created in GitLab.

However, when I try to run my DAG, I get the following error:

[2022-04-07 15:27:38,562] {base.py:74} INFO - Using connection to: id: gitlab-registry. Host: registry.gitlab.com, Port: None, Schema: , Login: gitlab+deploy-token-938603, Password: XXXXXXXX, extra: None
[2022-04-07 15:27:38,574] {taskinstance.py:1455} ERROR - Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.6/http/client.py", line 1291, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1337, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1286, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1046, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.6/http/client.py", line 984, in send
    self.connect()
  File "/home/airflow/.local/lib/python3.6/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

Does anyone have a clue what this could be about?

Note: I run Airflow locally.

1 answer

Answered by sytech:

Note the error in the docker library:

  File "/home/airflow/.local/lib/python3.6/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)

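This means that the docker client is unable to connect to the docker daemon (by default, on the unix socket /var/run/docker.sock). The FileNotFoundError tells you that no socket file exists at that path in the environment where Airflow is running.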

The root cause of this issue is that you are running Airflow inside a docker container. In order for Airflow to invoke docker properly, it needs to communicate with a docker daemon, which won't be available/usable inside the container by default, even if you install docker in the container.

You'll notice that docker info fails inside the container:

docker exec -it airflow docker info
Client:
 Context:    default
...
Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
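If the docker CLI is not installed in the Airflow image, the same failure can be reproduced from Python with nothing but the standard library. This is only a rough sketch: the function name try_unix_connect is made up for illustration, and the socket path is the default mentioned above. It opens a raw connection to the socket, which is the step that fails in the traceback.

```python
import socket

# Default daemon socket path, as mentioned above.
DEFAULT_DOCKER_SOCKET = "/var/run/docker.sock"


def try_unix_connect(path: str = DEFAULT_DOCKER_SOCKET) -> str:
    """Attempt a raw connection to a Unix socket and describe the outcome.

    Hypothetical helper for illustration; not part of Airflow or docker-py.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except FileNotFoundError:
        # Same exception as in the traceback: nothing exists at this path.
        return "missing"
    except PermissionError:
        # The socket exists, but the current user may not use it.
        return "permission denied"
    except ConnectionRefusedError:
        # Something exists at the path, but no daemon is listening on it.
        return "refused"
    finally:
        sock.close()
    return "connected"


if __name__ == "__main__":
    print(try_unix_connect())
```

Run inside the Airflow container, a result of "missing" would match the FileNotFoundError in the question.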

There are a couple of approaches you can take to solve this:

Use the host docker daemon

In order for the docker daemon on the host to be usable from inside another container, you need (at least) two things:

  1. Mount /var/run/docker.sock into the container (-v /var/run/docker.sock:/var/run/docker.sock)
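  2. Give the container permission to use the socket. Running the container in privileged mode (--privileged) is one way to do this; on some hosts it may be enough that the user inside the container is allowed to read and write the socket.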

After doing this, docker info should correctly report the server information as the host daemon.
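If you'd rather verify this from Python (for example from a quick test task), something like the following should work. Treat it as a sketch under a few assumptions: UnixHTTPConnection and daemon_responds are names I made up, and it relies on the Docker Engine API's /_ping endpoint answering with HTTP 200 when the daemon is healthy. It only uses the standard library, so it does not depend on the docker package.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a Unix socket instead of TCP.

    Hypothetical helper for illustration only.
    """

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def daemon_responds(socket_path: str = "/var/run/docker.sock") -> bool:
    """Return True if a daemon answers GET /_ping on the given socket."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/_ping")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Missing socket, permission problem, or no valid HTTP reply.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("daemon reachable:", daemon_responds())
```

If this prints False inside the container, the socket is either not mounted or not usable by the Airflow user.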

Use a remote docker daemon

Use docker's remote API so that Airflow connects to a daemon over the network. For example, you can run docker on a remote system and have Airflow connect to that daemon. You'll want to do this in a secure manner, such as connecting to the daemon over SSH.

Set up a "remote" daemon locally in docker-compose

One way to do this entirely locally is to add a docker:dind service to your compose file and then set DOCKER_HOST in the airflow container to point to the dind container. The DOCKER_HOST environment variable tells the docker client to use that daemon instead of the default unix socket.

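This is not necessarily the most secure setup (setting DOCKER_TLS_CERTDIR to an empty string disables TLS, so the daemon listens unencrypted on port 2375), but it should be the simplest to implement.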

version: "3.8"

services:
  docker:
    image: docker:dind
    privileged: true
    environment:
      DOCKER_TLS_CERTDIR: ""
  airflow:
    # ... docker client should be installed in this image
    environment:
      DOCKER_HOST: "tcp://docker:2375"
    depends_on: [docker]
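Note that depends_on only controls start order, so the dind daemon may not be accepting connections yet when Airflow starts. A small check like the one below, run from the airflow container, can tell you whether the daemon's TCP port is reachable. This is a sketch with made-up helper names (parse_docker_host, is_reachable), and the fallback to port 2375 when the URL has no port is my own assumption.

```python
import socket
from urllib.parse import urlparse


def parse_docker_host(docker_host: str) -> tuple:
    """Split a tcp:// DOCKER_HOST value into (host, port).

    Hypothetical helper; falls back to port 2375 if no port is given.
    """
    parsed = urlparse(docker_host)
    if parsed.scheme != "tcp" or parsed.hostname is None:
        raise ValueError(f"expected a tcp://host:port URL, got {docker_host!r}")
    return parsed.hostname, parsed.port or 2375


def is_reachable(docker_host: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the daemon address succeeds."""
    host, port = parse_docker_host(docker_host)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    print(is_reachable("tcp://docker:2375"))
```

This only confirms that something is listening on the port; it does not prove that the listener is a working docker daemon.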

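In your DockerOperator invocation, you also need to provide the docker_url argument and set mount_tmp_dir to False. As far as I understand, the temporary directory Airflow creates lives in the airflow container, so a daemon running elsewhere cannot bind-mount it: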

python_mailer = DockerOperator(
   docker_url="tcp://docker:2375",
   mount_tmp_dir=False,
   # ... other options
)
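If you don't want to hard-code the daemon address in every DAG, one option is to derive both arguments from the DOCKER_HOST variable that the compose file already sets. The helper below is only a sketch: docker_daemon_kwargs is a made-up name, the list of "remote" URL schemes is my own assumption, and the fallback URL is meant to mirror DockerOperator's default docker_url.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Fallback when DOCKER_HOST is unset (the local unix socket).
DEFAULT_DOCKER_URL = "unix://var/run/docker.sock"

# URL schemes treated as "daemon runs somewhere else" (an assumption).
REMOTE_SCHEMES = {"tcp", "ssh", "http", "https"}


def docker_daemon_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the daemon-related DockerOperator arguments from DOCKER_HOST.

    Hypothetical helper for illustration; not part of Airflow.
    """
    env = os.environ if env is None else env
    docker_url = env.get("DOCKER_HOST") or DEFAULT_DOCKER_URL
    is_remote = urlparse(docker_url).scheme in REMOTE_SCHEMES
    return {
        "docker_url": docker_url,
        # A remote daemon cannot bind-mount a temp dir created by Airflow.
        "mount_tmp_dir": not is_remote,
    }


if __name__ == "__main__":
    print(docker_daemon_kwargs({"DOCKER_HOST": "tcp://docker:2375"}))
```

The result can then be unpacked into the operator alongside your other options, for example DockerOperator(task_id='mailer', **docker_daemon_kwargs()).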