How to debug what is causing layers built from the same codebase on different workstations to be different?


In our development workflow we build images, push them to a registry, and then deploy services from them in a staging cluster. The workflow is severely bogged down by the huge size of image pushes, because layers built from the exact same codebase on different workstations tend to end up with different hashes. We do understand how Docker works (i.e. one bit changes, the layer changes; a layer changes, all subsequent layers change too), but we still believe there is a lot of layer invalidation going on that isn't explainable by anything we do to our codebase or dependencies, and is due exclusively to the builds being performed on different machines. Our builds aren't terribly platform-dependent in principle (we don't compile anything to machine code), and the machines are all x86_64 Linux boxes anyway.
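
For illustration, one quick way to see the divergence (the image name is just a placeholder) is to compare the layer digests of the same tag as built on two workstations:

    docker inspect --format '{{json .RootFS.Layers}}' ourapp:staging
    # run this on both workstations and diff the two outputs; the first digest
    # that differs marks the layer where the builds start to diverge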

What are the tools, strategies and best practices that would help us debug why this is happening and possibly alleviate the situation?

(Important: one known best practice that we currently absolutely cannot afford is moving the build process to a single dedicated machine, possibly in the cloud. Please don't suggest this solution).


There are 2 answers

theUndying

You can use a tool such as Dive (https://github.com/wagoodman/dive/) in order to inspect the layers.
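
For example (the image name is a placeholder), run it against the same tag on both machines and compare what each layer contains:

    # interactive, layer-by-layer inspection of contents and wasted space
    dive ourimage:staging

    # non-interactive alternative built into Docker: the command and size
    # behind every layer
    docker history --no-trunc ourimage:staging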

Apart from that, I can't help much without seeing the Dockerfiles.

A good practice for me is to use Docker inside Docker to build the images. A flow like the following usually suffices (a command-line sketch is given after the list):

  1. Create a docker:dind container
  2. Clone your repository in it
  3. Build your project
  4. Run docker build (in my Dockerfile I make sure that only the files needed at runtime are copied; look into multi-stage Dockerfiles as well)
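
A rough sketch of that flow on the command line (the image name and repository URL are placeholders):

    # 1. start a throwaway Docker-in-Docker daemon
    docker run -d --privileged --name clean-builder docker:dind
    #    (give dockerd a few seconds to come up before running builds)

    # 2. clone the repository inside it (the dind image is Alpine-based)
    docker exec clean-builder apk add --no-cache git
    docker exec clean-builder git clone https://example.com/our/repo.git /src

    # 3. + 4. build the project and the image from that clean checkout
    docker exec clean-builder docker build -t ourimage:staging /src

    # once done, destroy "the plate"
    docker rm -f clean-builder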

The general idea is to always start from a "clean plate", and once you're done, destroy "the plate" and repeat for the next build.

hakre

What are the tools, strategies and best practices that would help us debug why this is happening and possibly alleviate the situation?

You've already named a couple of things you're taking care of, and they point in the right direction:

What you want is a reproducible build: the same VCS revision creates the same image (given the base image is also the same).

  • Check that the base images in use are stable (e.g. pin them so they don't change on every build, but only when you actually intend to change them).
  • Timestamps. Check that the files you put in are not only byte-for-byte identical to the repo, but also identical in their metadata.
  • Timestamps inside the docker container. Freeze "now" for the build; how exactly depends on what the build does. (A sketch of the pinning and the timestamp normalization follows this list.)
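
A sketch of both ideas, assuming GNU tar; the image name, digest and paths are placeholders:

    # In the Dockerfile, pin the base image by digest instead of a floating tag:
    #   FROM python:3.11-slim@sha256:<digest>

    # Normalize file metadata when producing the build context, so mtimes and
    # owners cannot differ between workstations:
    tar --sort=name --mtime='UTC 2020-01-01' \
        --owner=0 --group=0 --numeric-owner \
        -cf context.tar Dockerfile src/

    # Build from the normalized tarball instead of the working directory:
    docker build -t ourimage:candidate - < context.tar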

You can verify things early by comparing tarballs, e.g. export a tarball from the VCS to build from, export one later from the image, and see what changed. Comparing lists of files with tar is easy, and since you normally keep the artifacts, it is also easy to compare multiple runs/revisions/builds.
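
For instance (image and file names are placeholders), export the file lists on both sides and diff them:

    # file list as committed in the VCS
    git archive HEAD | tar -tvf - | sort > vcs-files.txt

    # file list as it ended up inside a built image
    docker create --name export-tmp ourimage:staging
    docker export export-tmp | tar -tvf - | sort > image-files.txt
    docker rm export-tmp

    # the verbose listing includes permissions, owners, sizes and mtimes, so the
    # diff shows metadata drift as well as content drift; comparing the listings
    # of two builds of the same revision works the same way
    diff vcs-files.txt image-files.txt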

I don't know of any tooling for Docker / in Docker that is aimed specifically at reproducible builds, so I can't make any suggestions there.

I'd guess the build tool from Google should support reproducible builds, but I don't know for certain. I do know that they have published a build tool.