I set up a GitHub Action to build and run tests on my Docker image.
While building the Docker image, a rather heavy file (a 6 GB model from Hugging Face) is downloaded. (As a side note, I could probably bring that down to 2 GB, because the weights are currently downloaded three times for a silly reason.)
I thought I could speed things up by using the `gha` cache. The cache works, but the build is much, much slower.
Here's the gist of my setup.

Non-caching GitHub Action:

```yaml
- name: Build Image
  shell: bash
  run: |
    docker buildx build -t ${IMAGE_TAG} -f ./Dockerfile .
```

Takes 3m 23s.
Caching GitHub Action:

```yaml
- name: Build Image
  uses: docker/build-push-action@v5
  with:
    push: false  # do not push to a remote registry
    load: true   # load into the local Docker image store
    context: .
    file: ./Dockerfile
    tags: ${{ env.IMAGE_TAG }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

Takes 7m 58s (the initial build, on the first commit with the new setup, was 12m 53s).
Downloading 6 GB from Hugging Face takes about 30 s, while downloading a 4 GB image from GitHub itself takes 279 s.
Is this a GitHub problem? Is there any way to get around it?
This may be related to this question.
EDIT: Apparently I'm not the only one suffering from this; see the issue on Docker's GitHub.
From the issue you mention, it seems you would need to wait for a new docker buildx release to fix this. Check whether you are using buildx v0.12.0, which might help.

In the meantime (pending a new buildx release), you would need to avoid the `--load` option, since it is causing significant slowdowns. Also consider using the GitHub Container Registry for certain operations if it offers better performance.
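For instance, you could push the image to GHCR instead of loading it into the runner's local image store. This is only a sketch, not your exact workflow; the tag format and login step are assumptions:

```yaml
# Sketch: push to GHCR rather than using load: true.
- name: Log in to GHCR
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}

- name: Build Image
  uses: docker/build-push-action@v5
  with:
    push: true          # push instead of --load
    context: .
    file: ./Dockerfile
    tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

Later steps or jobs can then `docker pull` the image from GHCR instead of relying on a locally loaded copy.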
You might consider using a local registry for caching: that approach involves pushing the built image to a local registry in one job and pulling it from there in subsequent jobs. That can be faster than using GitHub's cache system and avoids the performance issues with `--load`.

In your first job, start a local Docker registry container. Then build your Docker image and push it to the local registry. In subsequent jobs, pull the image from the local registry.
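The steps above could be sketched roughly like this (a minimal illustration, not a drop-in workflow; the port, image name, and the use of the default builder are all assumptions):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    services:
      # Local registry container, reachable at localhost:5000 on the runner.
      registry:
        image: registry:2
        ports:
          - 5000:5000
    steps:
      - uses: actions/checkout@v4

      # Build and push to the local registry (assumes the default
      # docker driver; the docker-container driver would need host
      # networking to reach localhost).
      - name: Build and push to local registry
        run: |
          docker buildx build -t localhost:5000/myimage:latest --push -f ./Dockerfile .

      # A later step can pull the image back from the local registry.
      - name: Pull from local registry
        run: docker pull localhost:5000/myimage:latest
```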
Another approach: directly saving and loading images. Instead of using the `--load` option, you can save the Docker image as a tarball and cache the tarball with GitHub Actions' cache. In the subsequent job, you can restore the tarball from the cache and load it into Docker. That method might be more efficient than using the `docker-container` driver.

- Build your Docker image and save it as a tarball.
- Use GitHub Actions' cache to store the tarball.
- Restore the tarball from the cache and load it into Docker.
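A sketch of that flow (the cache key, tarball path, and image name are assumptions):

```yaml
steps:
  - uses: actions/checkout@v4

  # Try to restore a previously saved image tarball.
  - name: Restore image tarball
    id: image-cache
    uses: actions/cache@v4
    with:
      path: /tmp/image.tar
      key: image-${{ hashFiles('Dockerfile') }}

  # On a cache miss, build the image and save it as a tarball;
  # actions/cache uploads it at the end of the job.
  - name: Build and save image
    if: steps.image-cache.outputs.cache-hit != 'true'
    run: |
      docker buildx build -t myimage:latest -f ./Dockerfile .
      docker save myimage:latest -o /tmp/image.tar

  # On a cache hit, load the cached tarball into Docker.
  - name: Load cached image
    if: steps.image-cache.outputs.cache-hit == 'true'
    run: docker load -i /tmp/image.tar
```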
True, both methods essentially involve saving and loading a TAR file, which is similar in concept to what the `--load` option does in Docker. However, the key difference lies in how and where the TAR file is managed:

The local registry approach pushes and pulls the image to/from a local Docker registry running within the GitHub Actions environment. It avoids some of the I/O overhead associated with `--load` by interacting directly with a local registry.

Direct TAR file caching uses GitHub's own caching mechanism to store and retrieve the TAR file. While it still involves building and saving the image as a TAR file, it potentially offers more control over the caching process compared to using `--load`.

In both cases, the goal is to bypass some of the performance bottlenecks associated with the `docker-container` driver and `--load`, but they still fundamentally rely on a similar mechanism of saving and loading Docker images.