Why does docker-compose build run my steps twice?


I'm using a multi-stage build with a Dockerfile like this:

#####################################
## Build the client
#####################################

FROM node:12.19.0 as web-client-builder
WORKDIR /workspace
COPY web-client/package*.json ./

# Running npm install before we update our source allows us to take advantage
# of docker layer caching. We are excluding node_modules in .dockerignore
RUN npm ci

COPY web-client/ ./
RUN npm run test:ci
RUN npm run build


#####################################
## Host the client on a static server
#####################################

FROM nginx:1.19 as web-client
COPY --from=web-client-builder /workspace/nginx-templates /etc/nginx/templates/
COPY --from=web-client-builder /workspace/nginx.conf /etc/nginx/nginx.conf
COPY --from=web-client-builder /workspace/build /var/www/

#####################################
## Build the server
#####################################

FROM openjdk:11-jdk-slim as server-builder
WORKDIR /workspace
COPY build.gradle settings.gradle gradlew ./
COPY gradle ./gradle
COPY server/ ./server/
RUN ./gradlew --no-daemon :server:build

#####################################
## Start the server
#####################################

FROM openjdk:11-jdk-slim as server
WORKDIR /app
ARG JAR_FILE=build/libs/*.jar
COPY --from=server-builder /workspace/server/$JAR_FILE ./app.jar
ENTRYPOINT ["java","-jar","/app/app.jar"]

I also have a docker-compose.yml like this:

version: "3.8"
services:
  server:
    restart: always
    container_name: server
    build:
      context: .
      dockerfile: Dockerfile
      target: server
    image: server
    ports:
      - "8090:8080"
  web-client:
    restart: always
    container_name: web-client
    build:
      context: .
      dockerfile: Dockerfile
      target: web-client
    image: web-client
    environment:
      - LISTEN_PORT=80
    ports:
      - "8091:80"

The two images involved here, web-client and server, are completely independent. I'd like to take advantage of multi-stage build parallelization.

When I run docker-compose build (I'm on docker-compose 1.27.4), I get output like this:

λ docker-compose build
Building server
Step 1/24 : FROM node:12.19.0 as web-client-builder
 ---> 1f560ce4ce7e
... etc ...
Step 6/24 : RUN npm run test:ci
 ---> Running in e9189b2bff1d
... Runs tests ...
... etc ...
Step 24/24 : ENTRYPOINT ["java","-jar","/app/app.jar"]
 ---> Using cache
 ---> 2ebe48e3b06e

Successfully built 2ebe48e3b06e
Successfully tagged server:latest
Building web-client
Step 1/11 : FROM node:12.19.0 as web-client-builder
 ---> 1f560ce4ce7e
... etc ...
Step 6/11 : RUN npm run test:ci
 ---> Using cache
 ---> 0f205b9549e0
... etc ...
Step 11/11 : COPY --from=web-client-builder /workspace/build /var/www/
 ---> Using cache
 ---> 31c4eac8c06e

Successfully built 31c4eac8c06e
Successfully tagged web-client:latest

Notice that the test step (npm run test:ci) shows up twice: at Step 6/24 for the server target and again at Step 6/11 for the web-client target. I'd like to understand why this happens, but I guess it's not a huge problem, because the step is already cached by the time the second build reaches it.

Where this gets to be a bigger problem is when I try to run my build in parallel. Now I get output like this:

λ docker-compose build --parallel
Building server     ...
Building web-client ...
Building server
Building web-client
Step 1/11 : FROM node:12.19.0 as web-client-builderStep 1/24 : FROM node:12.19.0 as web-client-builder
 ---> 1f560ce4ce7e
... etc ...
Step 6/24 : RUN npm run test:ci
 ---> e96afb9c14bf
Step 6/11 : RUN npm run test:ci
 ---> Running in c17deba3c318
 ---> Running in 9b0faf487a7d

> [email protected] test:ci /workspace
> react-scripts test --ci --coverage --reporters=default --reporters=jest-junit --watchAll=false


> [email protected] test:ci /workspace
> react-scripts test --ci --coverage --reporters=default --reporters=jest-junit --watchAll=false
... Now my tests run in parallel twice, and the output is interleaved for both parallel runs ...

Now the tests clearly do run twice: with both builds running in parallel, neither one gets a chance to reuse the other's cache.

Can anyone help me understand this? I thought that one of the high points of docker multi-stage builds was that they were parallelizable, but this behavior doesn't make sense to me. What am I misunderstanding?

Note I also tried enabling BuildKit for docker-compose. I had a harder time making sense of the output. I don't believe it was running things twice, but I'm also not sure that it was parallelizing. I need to dig more into it, but my main question stands: I'm hoping to understand why multi-stage builds don't run in parallel in the way I expected without BuildKit.


There are 2 answers

Answer by David Maze (best answer)

You can split this into two separate Dockerfiles. I might write a web-client/Dockerfile containing the first two stages (changing the relative COPY paths to ./), and leave the root-directory Dockerfile to build the server application. Then your docker-compose.yml file can point at these separate directories:

services:
  server:
    build: . # equivalent to {context: ., dockerfile: Dockerfile}
  web-client:
    build: web-client
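
A minimal sketch of what the new web-client/Dockerfile might look like, adapted from the first two stages of the question's Dockerfile. It assumes the build context is the web-client directory itself, so the COPY sources lose their web-client/ prefix, and that a .dockerignore excluding node_modules is placed in that directory (Docker reads .dockerignore from the root of the build context).

```dockerfile
# web-client/Dockerfile (sketch; build context is the web-client directory)

FROM node:12.19.0 as web-client-builder
WORKDIR /workspace
# Paths are now relative to web-client/, so the prefix is dropped
COPY package*.json ./
RUN npm ci

COPY . ./
RUN npm run test:ci
RUN npm run build

FROM nginx:1.19 as web-client
COPY --from=web-client-builder /workspace/nginx-templates /etc/nginx/templates/
COPY --from=web-client-builder /workspace/nginx.conf /etc/nginx/nginx.conf
COPY --from=web-client-builder /workspace/build /var/www/
```

The root Dockerfile would then keep only the server-builder and server stages. Because each Dockerfile now ends in the stage you want, the target: entries in docker-compose.yml are no longer needed.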

As @Stefano notes in their answer, multi-stage builds are optimized for producing a single final image. The "classic" builder always runs every stage from the top of the Dockerfile through the named target stage, with no logic for skipping stages the target does not depend on.

Answer by Stefano

"why multi-stage builds don't run in parallel in the way I expected without BuildKit."

That's the high point of BuildKit.

The main purpose of multi-stage builds in Docker is to produce smaller images by keeping only what the application needs in order to run, e.g.:

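FROM node as builder
WORKDIR /app

# Copy only the manifests first so the npm ci layer can be cached
COPY package.json package-lock.json ./
RUN npm ci

COPY . .

RUN npm run build

FROM nginx
# --from must name a stage (here "builder"), not a path
COPY --from=builder --chown=nginx /app/dist /var/www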

All the development tools required for building the project are simply not copied into the final image. This translates into smaller final images.


EDIT:

From the BuildKit documentation:

BuildKit builds are based on a binary intermediate format called LLB that is used for defining the dependency graph for processes running part of your build. tl;dr: LLB is to Dockerfile what LLVM IR is to C.

In other words, BuildKit can work out which stages each stage depends on, which lets it run independent stages in parallel.
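
To make the dependency graph concrete, here is an abridged, annotated view of the question's Dockerfile (build steps elided, comments added purely for illustration). The only edges between stages are the COPY --from lines, which gives two independent chains. With BuildKit, building the server target should need only the second chain, and the two chains can proceed concurrently; the classic builder instead walks the file from the top down to the target.

```dockerfile
# Chain 1: web-client-builder -> web-client
FROM node:12.19.0 as web-client-builder
# ... npm ci, test:ci, build (no reference to any other stage) ...

FROM nginx:1.19 as web-client
# Edge: web-client depends on web-client-builder
COPY --from=web-client-builder /workspace/build /var/www/

# Chain 2: server-builder -> server
FROM openjdk:11-jdk-slim as server-builder
# ... gradle build (no reference to any other stage) ...

FROM openjdk:11-jdk-slim as server
ARG JAR_FILE=build/libs/*.jar
# Edge: server depends only on server-builder
COPY --from=server-builder /workspace/server/$JAR_FILE ./app.jar
```

This matches the classic-builder output in the question, where the server build starts at the node stage (Step 1/24) even though nothing in the server chain refers to it.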