Build docker image in gitlab ci with kaniko in local docker setup: suddenly cannot access container registry

Setup

I'm running Gitlab (Omnibus) on a local server with docker compose (as described here). Gitlab is globally reachable through a domain (for which a valid SSL certificate is provided) behind an Nginx reverse proxy, which is itself a docker container (nginxproxy/nginx-proxy).

I set up the Container Registry (CR) to store docker images. The CR is bound to a specific port of the host and is reachable from the internet as well. Inside the container, the CR listens on an internal port, which I mapped to the host port as usual. Thus, the CR is not behind the Nginx proxy. I therefore set the following in the docker-compose.yml file:

...
services:
  gitlab:
    ...
    environment:
      VIRTUAL_HOST: 'global.gitlab.domain'
      VIRTUAL_PORT: '80'
      VIRTUAL_NETWORK: 'nginx_proxy_network'
      ...
      GITLAB_OMNIBUS_CONFIG: |
        ...
        registry_external_url 'https://global.gitlab.domain:registry_host_port'
        registry_nginx['listen_port'] = registry_container_port
        ...
    ...
    ports:
      - "registry_host_port:registry_container_port"

This setup works well for pushing and pulling images to/from the CR.

But now...

The Error

I also set up a CI pipeline that builds docker images using kaniko (as described here). This worked fine for at least a year (I updated the gitlab and runner images once a week), but suddenly the jobs that build images and push them to the CR started failing with the following error:

$ /kaniko/executor --verbosity=debug --context "${CI_PROJECT_DIR}" --dockerfile "${CI_PROJECT_DIR}/Dockerfile" --destination "${CI_REGISTRY_IMAGE}:latest" --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
DEBU[0000] Copying file /builds/group/repo/Dockerfile to /kaniko/Dockerfile 
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "global.gitlab.domain:registry_host_port/group/repo:latest": creating push check transport for global.gitlab.domain:registry_host_port failed: Get "global.gitlab.domain:registry_host_port/v2/": dial tcp 172.22.0.3:registry_host_port: connect: connection refused
ERROR: Job failed: exit code 1

172.22.0.3 is the IP of the gitlab service/container.

The above error shows that kaniko tries to connect to the IP of the Gitlab container (172.22.0.3) on registry_host_port. As mentioned above, the container listens on a different port (registry_container_port), which is mapped to the host port. Thus, it's not surprising that this connection is refused.
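
To double-check this reading of the error, the name resolution and the port check can be reproduced separately from kaniko. Below is a minimal sketch, not part of my actual pipeline: the helper names (resolve_ipv4, can_connect, check) and the script name used afterwards are made up for illustration, and it only uses Python's standard socket module. It prints which IPv4 addresses a host name resolves to from wherever it runs, and whether each address accepts TCP connections on a given port.

```python
import socket
import sys


def resolve_ipv4(host: str) -> list[str]:
    """Return the distinct IPv4 addresses that `host` resolves to from here."""
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def can_connect(ip: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within `timeout`."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


def check(host: str, port: int) -> dict[str, bool]:
    """Map each resolved address of `host` to whether `port` accepts connections."""
    return {ip: can_connect(ip, port) for ip in resolve_ipv4(host)}


# Hypothetical invocation: python check_registry.py <host> <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    host, port = sys.argv[1], int(sys.argv[2])
    for ip, ok in check(host, port).items():
        print(f"{host} -> {ip}:{port} {'open' if ok else 'refused/unreachable'}")
```

Assuming it is saved as check_registry.py (a hypothetical name) and run once on the host and once in a CI job on the same runner (so inside the gitlab_intern network), with the registry domain and registry_host_port as arguments, comparing the two outputs should show whether the domain resolves to the container IP only from inside the docker network.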

The Questions

First, let me explain the "suddenly" above, i.e. when the jobs started failing ;) I tracked down the changes made to the server between the last successful job (27.11.) and the first failing job (04.12.). The most suspicious change was that I migrated from docker-compose (v1) to docker compose (v2).

  1. Why does kaniko try to access the IP of the Gitlab container directly instead of accessing the CR via the domain?
  2. Could this be due to the migration from docker-compose (v1) to docker compose (v2)?
  3. Has the local Docker DNS behaviour changed recently?
  4. Has anyone else experienced this error?

Additional Setup Information

The runner itself is a container as well (gitlab/gitlab-runner) and resides on the same docker network as the Gitlab container (gitlab_intern). This is its config.toml:

...
[[runners]]
  name = "docker-01"
  url = "http://gitlab"
  id = 0
  token = "XYZ"
  token_obtained_at = 0001-01-01T00:00:00Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  clone_url = "http://gitlab"
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    network_mode = "gitlab_intern"
    shm_size = 0