Docker builds for a monorepo environment

Basically, both services foo and bar depend on a common library.

Let's assume that the common package has already been published to the npm registry.

|
├── packages
|    ├── common
|    |    ├── src
|    |    ├── package.json
|    |    ├── tsconfig.build.json
|    |    ├── tsconfig.json
|    ├── foo
|    |    ├── src
|    |    ├── Dockerfile
|    |    ├── package.json
|    |    ├── tsconfig.build.json
|    |    ├── tsconfig.json
|    ├── bar
|    |    ├── src
|    |    ├── Dockerfile
|    |    ├── package.json
|    |    ├── tsconfig.build.json
|    |    ├── tsconfig.json
├── tsconfig.json
├── package.json
├── yarn.lock
├── docker-compose.init.yml
├── docker-compose.yml
├── Dockerfile
├── Dockerfile.init
├── .dockerignore

I've added all devDependencies that are common to all of the packages in the root package.json, as follows:

"scripts": {
  "build": "lerna run build --stream",
  "setup": "yarn && yarn build",
  "docker:bootstrap": "docker-compose --file=docker-compose.init.yml build",
  "docker:up": "docker-compose up --build"
},
"devDependencies": {
  "@nestjs/cli": "^7.5.1",
  "@nestjs/common": "^7.4.4",
  "@nestjs/core": "^7.4.4",
  "@nestjs/platform-express": "^7.4.4",
  "@nestjs/schematics": "^7.1.2",
  "@nestjs/testing": "^7.4.4",
  "@types/express": "^4.17.8",
  "@types/jest": "^26.0.13",
  "@types/node": "^14.10.2",
  "@types/supertest": "^2.0.10",
  "@typescript-eslint/eslint-plugin": "^4.1.1",
  "@typescript-eslint/parser": "^4.1.1",
  "eslint": "^7.9.0",
  "eslint-config-prettier": "^6.11.0",
  "eslint-plugin-prettier": "^3.1.4",
  "express": "^4.17.1",
  "husky": "^4.3.0",
  "jest": "^26.4.2",
  "lerna": "^3.22.1",
  "lint-staged": "^10.4.0",
  "prettier": "^2.1.2",
  "reflect-metadata": "^0.1.13",
  "rimraf": "^3.0.2",
  "rxjs": "^6.6.3",
  "supertest": "^4.0.2",
  "ts-jest": "^26.3.0",
  "ts-loader": "^8.0.3",
  "ts-node": "^9.0.0",
  "typescript": "3.9.5"
}

The foo package needs to use a relational database, so I've installed the following packages independently.

$ yarn workspace foo add @nestjs/typeorm mysql typeorm

To resolve the error message "<package> has unmet peer dependency <package>", I ran the following command.

$ yarn workspace foo add @nestjs/common @nestjs/core @nestjs/platform-express rxjs

I'm kinda lost here. What's the point of organizing multiple applications in a monorepo if I keep repeating myself, installing the same packages in one package after another? It also makes writing a Dockerfile even more difficult.

My first question is: when working in a monorepo, is it normal to install libraries into a specific package as needed?

This is what my Docker setup looks like:

// docker-compose.init.yml
# This file triggers the initial build
version: "3.8"

services:
  pkg_builder:
    image: pkg-builder
    build:
      context: .
      dockerfile: Dockerfile.init

First, execute the command below.

$ yarn docker:bootstrap

Dockerfile.init creates an initial builder image from which the “real” builder image can copy the build directory.

// Dockerfile.init
FROM scratch

# Copy files from the root to build directory
COPY package.json lerna.json yarn.lock tsconfig.json /build/

# This line is required to install dependencies from foo's package.json
COPY ./packages/foo/package.json /build/packages/foo/package.json

From then on, build the images using the command:

$ yarn docker:up

// docker-compose.yml
version: "3.8"

services:
  pkg_builder:
    image: pkg-builder
    build: .
  mariadb:
    image: mariadb:10.3
    ports:
      - "3306:3306"
    environment:
      - MYSQL_USER=root
      - MYSQL_ROOT_PASSWORD=root
      - MYSQL_DATABASE=tutorial
    restart: always
  foo:
    container_name: foo
    build: ./packages/foo
    ports:
      - "8000:8000"
    depends_on:
      - mariadb
    restart: always

// Dockerfile
FROM node:12-alpine

COPY --from=pkg-builder /build /build

WORKDIR /build

RUN rm -rf node_modules
RUN yarn

CMD ["true"]

The problem is that the resulting image is far too big, because all the dev dependencies are copied over from pkg-builder.

// foo's Dockerfile

FROM node:12-alpine

WORKDIR /app/current

COPY --from=pkg-builder /build/node_modules /app/current/packages/foo/node_modules
COPY --from=pkg-builder /build/tsconfig.json ./tsconfig.json

WORKDIR /app/current/packages/foo

COPY . .

RUN yarn build

EXPOSE 8000

CMD [ "node", "./dist/main" ]

Lastly, how am I supposed to reduce the image size? In this case, I believe that a multi-stage build is not the right strategy for size reduction. What am I missing here?

Answer by David Maze:

Every package.json file needs to list the complete immediate dependencies of its application. I should be able to check out your source repository, run yarn install, and have a working application tree. Where your question says "and by the way these other dependencies are installed in the environment and I just assume them", that's a problem for anyone who isn't working on the exact system you are, and it's more specifically a problem for Docker and other automated-build tools.

Your library dependencies can have their own library dependencies. These will get listed out in the yarn.lock file, but they don't need to be directly listed in the package.json file.

Taking the database-access libraries as an example: if your main application uses them, they need to be included in your dependencies. But if all of the database access is encapsulated in your common shared library, your applications only need to refer to that library (in foo/package.json), and the library needs to include the database dependencies (in common/package.json).

You should split dependencies from devDependencies. The things you need to run your application (express) need to be listed in dependencies; the things you only need to build your application (eslint) should be devDependencies. You discuss image size; this gives you a way to install a much smaller group of packages in the container when you actually run it.

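(Note that Yarn 2+ has no yarn install option for skipping devDependencies. Yarn 1 does skip them with yarn install --production or when NODE_ENV=production is set, and npm does as well, though it's otherwise much slower to use.)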
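As a small illustration of a runtime-only install, the fragment below assumes Yarn 1, which is the version bundled with the official node images, and a single package.json with dependencies and devDependencies already split. The base image tag is borrowed from the question; treat the whole thing as a sketch rather than a drop-in file.

```dockerfile
FROM node:12-alpine
WORKDIR /app

# Copy only the manifests so this layer stays cached until they change
COPY package.json yarn.lock ./

# Yarn 1: install dependencies only, leaving out devDependencies,
# and fail instead of rewriting yarn.lock if it is out of date
RUN yarn install --frozen-lockfile --production

# npm alternative (npm 6 syntax); it needs a package-lock.json,
# which this repository does not have, so it is left commented out
# RUN npm ci --only=production
```

The compiled application still has to come from somewhere, since the TypeScript compiler is itself a devDependency; that is what the build stage of a multi-stage Dockerfile is for.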

Then a multi-stage build can produce a smaller image. The idea here is that the first stage installs all of the dependencies, devDependencies included, and builds the application; the second stage only includes the runtime dependencies and the built code. This more or less looks like:

ARG node_version=12-alpine
FROM node:${node_version} AS build
WORKDIR /app
# With several COPY sources the destination must end in a slash
COPY package.json yarn.lock ./
# --immutable is a Yarn 2+ flag; on Yarn 1 use --frozen-lockfile
RUN yarn install --immutable
COPY . .
RUN yarn build

FROM node:${node_version}
WORKDIR /app
ENV NODE_ENV=production
COPY package.json yarn.lock ./
RUN yarn install --immutable
# RUN npm ci  # in production mode, skips devDependencies
COPY --from=build /app/dist dist
CMD ["node", "/app/dist/main.js"]

You shouldn't ever need a "builder container"; the setup you show is basically identical to a multi-stage build, except spread across three separate Dockerfiles. In particular if you have an image that doesn't run a command but just contains files, it's a good candidate to be an early stage in a multi-stage build.
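As a rough sketch of that consolidation, Dockerfile.init, the root Dockerfile and foo's Dockerfile could collapse into one multi-stage file for foo, built from the repository root so that the workspace manifests are inside the build context. Several things here are assumptions rather than facts from the question: Yarn 1 is in use, the root package.json declares the workspaces, the shared library's workspace is named common, both packages have a build script, and each compiles into its own dist directory.

```dockerfile
# Build from the repository root, for example:
#   docker build -f packages/foo/Dockerfile -t foo .
FROM node:12-alpine AS build
WORKDIR /build

# Manifests first, so the install layer is reused until one of them changes.
# All three workspace manifests are copied so the tree matches yarn.lock.
COPY package.json yarn.lock tsconfig.json ./
COPY packages/common/package.json packages/common/
COPY packages/foo/package.json packages/foo/
COPY packages/bar/package.json packages/bar/
RUN yarn install --frozen-lockfile

# Sources for foo and the library it depends on
# (.dockerignore should exclude node_modules and dist)
COPY packages/common packages/common/
COPY packages/foo packages/foo/

# "common" is an assumed workspace name; build the library before its consumer
RUN yarn workspace common build && yarn workspace foo build

FROM node:12-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY package.json yarn.lock ./
COPY packages/common/package.json packages/common/
COPY packages/foo/package.json packages/foo/
COPY packages/bar/package.json packages/bar/
# Runtime dependencies only; devDependencies stay in the build stage
RUN yarn install --frozen-lockfile --production

# Only the compiled output crosses over from the build stage
COPY --from=build /build/packages/common/dist packages/common/dist
COPY --from=build /build/packages/foo/dist packages/foo/dist

WORKDIR /app/packages/foo
EXPOSE 8000
CMD ["node", "dist/main"]
```

In docker-compose.yml this would mean pointing foo's build at the root context (context: . with dockerfile: packages/foo/Dockerfile) and dropping the pkg_builder service. One trade-off of copying bar's manifest is that bar's runtime dependencies also land in foo's image; if that matters, a per-package lockfile or a pruning step would be needed, which is beyond this sketch.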