I've been struggling with pulling a private image from my GitLab container registry when running a DockerOperator in Airflow 2.0.
My DockerOperator looks as follows:
python_mailer = DockerOperator(
    task_id='mailer',
    image='registry.gitlab.com/private422/mailer/image',
    docker_conn_id='gitlab-registry',
    api_version='auto',
    dag=dag
)
The gitlab-registry connection is defined in Airflow's connections with the username and password from a token that I created in GitLab.
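For reference, an equivalent connection could be created via the Airflow CLI roughly as follows (a sketch; the exact flags depend on the Airflow version, and the token value is a placeholder):

airflow connections add 'gitlab-registry' \
    --conn-type 'docker' \
    --conn-host 'registry.gitlab.com' \
    --conn-login 'gitlab+deploy-token-938603' \
    --conn-password '<deploy-token-password>'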
However, when I try to run my DAG, I get the following error:
[2022-04-07 15:27:38,562] {base.py:74} INFO - Using connection to: id: gitlab-registry. Host: registry.gitlab.com, Port: None, Schema: , Login: gitlab+deploy-token-938603, Password: XXXXXXXX, extra: None
[2022-04-07 15:27:38,574] {taskinstance.py:1455} ERROR - Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.6/http/client.py", line 1291, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.6/http/client.py", line 1337, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.6/http/client.py", line 1286, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.6/http/client.py", line 1046, in _send_output
self.send(msg)
File "/usr/local/lib/python3.6/http/client.py", line 984, in send
self.connect()
File "/home/airflow/.local/lib/python3.6/site-packages/docker/transport/unixconn.py", line 43, in connect
sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory
Does anyone have a clue what this could be about?
Note: I run Airflow locally.
Note the error in the docker library:

FileNotFoundError: [Errno 2] No such file or directory

This means that the docker client is unable to connect to the docker daemon (by default, on the unix socket /var/run/docker.sock).

The root cause of this issue is that you are running Airflow inside a docker container. In order for Airflow to invoke docker properly, it needs to communicate with a docker daemon, which won't be available/usable inside the container by default, even if you install docker in the container. You'll notice that docker info fails inside the container.

There are a couple of approaches you can take to solve this:
Use the host docker daemon
In order for the docker daemon on the host to be usable from inside another container, you need (at least) two things:

- Mount /var/run/docker.sock into the container (-v /var/run/docker.sock:/var/run/docker.sock)
- Run the container with elevated permissions (--privileged)

After doing this, docker info should correctly report the server information as the host daemon.
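For example, if Airflow runs via docker-compose, the Airflow service could be adjusted roughly as follows (a sketch; the service name and the rest of its definition depend on your setup):

services:
  airflow-worker:
    # ... existing Airflow service definition ...
    privileged: true                                  # run with elevated permissions
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock    # expose the host daemon's socket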
Use a remote docker daemon
Use docker's remote APIs to have Airflow connect. For example, you can have docker running on a remote system available over the network and connect to that daemon remotely. You'll want to do this in a secure manner, like using SSH to connect to the daemon.
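With the DockerOperator, that boils down to pointing docker_url at the remote daemon, for example (the host name is hypothetical, and ssh:// URLs require the docker Python SDK's SSH support):

python_mailer = DockerOperator(
    task_id='mailer',
    image='registry.gitlab.com/private422/mailer/image',
    docker_conn_id='gitlab-registry',
    docker_url='ssh://airflow@docker-host.example.com',  # hypothetical remote daemon reached over SSH
    api_version='auto',
    dag=dag
)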
Set up a "remote" daemon locally in docker-compose
A way you can do this entirely locally would be by adding a docker:dind container to your compose file and then setting DOCKER_HOST in the airflow container to point to the dind container. The DOCKER_HOST environment variable tells docker to use a remote daemon instead of the default. This is not necessarily the most secure setup, but it should be the simplest to implement.
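A rough sketch of what that could look like in the compose file (the service name, port, and TLS settings are illustrative and may need adjusting):

services:
  docker-in-docker:
    image: docker:dind
    privileged: true            # dind needs elevated privileges to run its own daemon
    environment:
      DOCKER_TLS_CERTDIR: ""    # disable TLS so the daemon listens on plain tcp port 2375

  airflow-worker:
    # ... existing Airflow service definition ...
    environment:
      DOCKER_HOST: tcp://docker-in-docker:2375    # tell docker to use the dind daemon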
In your DockerOperator invocation, you also need to provide the docker_url argument and set mount_tmp_dir to False:
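Something along these lines (the docker_url assumes the dind service name from the sketch above; mount_tmp_dir requires a reasonably recent version of the docker provider):

python_mailer = DockerOperator(
    task_id='mailer',
    image='registry.gitlab.com/private422/mailer/image',
    docker_conn_id='gitlab-registry',
    docker_url='tcp://docker-in-docker:2375',  # the dind daemon from the compose sketch above
    mount_tmp_dir=False,       # a remote daemon cannot mount the operator's local temp dir
    api_version='auto',
    dag=dag
)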