I'm installing AMD/Xilinx Vivado in a Docker image. Even an almost minimal installation is 60 GB in size, resulting in a 29 GB compressed image. A full installation is around 150 GB...
The image is created by:
- Using Debian Bookworm (`debian:bookworm-slim`) from Docker Hub.
- Adding Python 3.12 (actually using `python:3.12-slim` from Docker Hub).
- Adding necessary packages needed by AMD Vivado via `apt install ...`.
- Installing AMD Vivado as an almost minimal setup via `./xsetup ...`.
When the software is installed, I noticed:
- Upload problems:
  - `dockerd` pushes with around 135 Mb/s in a 1 GbE network setup.
- Download problems:
  - `dockerd` is limited to a maximum of 3 parallel download threads. Compared to `docker push`, `docker pull` achieves 975 Mb/s (the maximum speed on 1 GbE). See Docker parallel operations limit.
  - The downloaded layer is not extracted on the fly. It needs a full download before the layer can be extracted.
I found hints that splitting big layers into multiple smaller layers improves performance. So I used a multi-stage build where Vivado is installed in stage 1, and stage 2 mounts the layers from stage 1 and uses `RUN --mount=... cp -r SRC DEST` to create 15 layers between 1.5 and 8.8 GB.
The results show this:
- For `docker push`:
  - `dockerd` is limited to a maximum of 5 parallel upload threads. See Docker parallel operations limit.
  - The parallel upload results in around 550 Mb/s, i.e. 5x the speed of a single upload thread.
- For `docker pull`:
  - Downloading one big 60 GB layer is just as slow as downloading 15 layers with 3 parallel download threads: the download takes 5 minutes and is limited by the 1 GbE network setup.
  - The 60 GB image split into 15 layers is faster, because finished layers are extracted by another single (!) thread while other layers are still being downloaded. Overall it took 5 minutes of download time plus 2 minutes to extract the remaining layers after all layers were downloaded.
  - The single big-layer 60 GB image is roughly twice as slow, because it downloads the same data in 5 minutes but then runs a single-threaded extraction that takes 8 minutes, resulting in a total of 13 vs. 7 minutes.
So here is my question:
- How to automatically (scripted) split a layer of e.g. 60 GB into smaller layers of, let's say, 2 to 6 GB?
These are my first ideas:
- Start the image and execute a script in the container that recursively traverses the Vivado installation directory. Using `du -sh`, each directory is categorized (see the sketch after this list) as:
  - ≥8 GB: recurse into it and split its content further.
  - ≥1 GB: add the directory or single file to a list of layers.
  - else: collect all remaining small files/directories in a single layer.
- As a result, a list of lists (or dictionary of lists) is created, where the outer level can be iterated to create image layers and the inner level specifies the arguments for `RUN ... cp ...` to copy files from a previous stage into a new layer.
- As `RUN` commands can't be called in a loop in a Dockerfile, a script is needed to write a secondary `Dockerfile` that contains n `RUN` calls.
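A first sketch of that categorization step, assuming plain Python sizing instead of `du -sh` and treating the thresholds and names (`plan_layers`, 8 GB / 1 GB) as placeholders, could look like this:

```python
#!/usr/bin/env python3
# Sketch only: group a Vivado installation tree into layer-sized chunks.
# The thresholds, the name plan_layers, and the use of plain Python sizing
# instead of `du -sh` are placeholder assumptions.
import sys
from pathlib import Path

GB = 1024**3
BIG = 8 * GB    # >= 8 GB: recurse and split the directory's content further
SMALL = 1 * GB  # >= 1 GB: the entry gets its own layer

def tree_size(path: Path) -> int:
    """Total size in bytes of a file or directory tree."""
    if path.is_file():
        return path.stat().st_size
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())

def plan_layers(root: Path) -> list[list[Path]]:
    """Return a list of layers; each layer is a list of paths to copy together."""
    layers: list[list[Path]] = []
    leftovers: list[Path] = []  # small entries collected into one catch-all layer
    for entry in sorted(root.iterdir()):
        size = tree_size(entry)
        if entry.is_dir() and size >= BIG:
            layers.extend(plan_layers(entry))   # recurse and split its content
        elif size >= SMALL:
            layers.append([entry])              # big enough for its own layer
        else:
            leftovers.append(entry)             # goes into the catch-all layer
    if leftovers:
        layers.append(leftovers)
    return layers

if __name__ == "__main__":
    # Usage (inside the container): plan_layers.py /opt/Xilinx/Vivado/<version>
    for i, layer in enumerate(plan_layers(Path(sys.argv[1]))):
        print(f"layer {i}: {[str(p) for p in layer]}")
```

The outer list then determines how many layers the secondary Dockerfile gets; the inner lists become the `cp` arguments.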
Manually created Dockerfile to check performance differences:
```dockerfile
ARG REGISTRY
ARG NAMESPACE
ARG IMAGE
ARG IMAGE_TAG
ARG VIVADO_VERSION
FROM ${REGISTRY}/${NAMESPACE}/${IMAGE}:${IMAGE_TAG} as base
ARG VIVADO_VERSION
# Install further dependencies for Vivado
RUN --mount=type=bind,target=/context \
apt-get update \
&& xargs -a /context/Vivado.packages apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
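# Stage 1: install Vivado once, producing a single large layer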
FROM base as monolithic
ARG VIVADO_VERSION
RUN --mount=type=bind,target=/context \
--mount=type=bind,from=vivado,target=/Install \
cd /Install; \
./xsetup --batch Install -c /context/Vivado.${VIVADO_VERSION}.cfg --agree XilinxEULA,3rdPartyEULA
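# Final stage: re-create the installation as multiple smaller layers copied from the monolithic stage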
FROM base
ARG VIVADO_VERSION
RUN mkdir -p /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx
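# One RUN per chunk: each copy below becomes its own image layer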
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/gnu /opt/Xilinx/Vivado/${VIVADO_VERSION}
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/ids_lite /opt/Xilinx/Vivado/${VIVADO_VERSION}
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/lib /opt/Xilinx/Vivado/${VIVADO_VERSION}
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/tps /opt/Xilinx/Vivado/${VIVADO_VERSION}
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/secureip /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/xsim /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/deca /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/ip /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/simmodels /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/parts/xilinx/zynquplus /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/parts/xilinx/virtexuplus /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/parts/xilinx/kintexuplus /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/parts/xilinx/common /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx
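# Catch-all: copy whatever remains of the installation; -u skips files already copied above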
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -ru /Install/* /opt/Xilinx/Vivado/${VIVADO_VERSION}
# Configure Vivado tools
COPY FlexLM.config.sh Vivado.config.sh /tools/GitLab-CI-Scripts/
```
To automate splitting a large Docker layer into smaller layers, you can create a Python script that traverses the Vivado installation directory, categorizes directories based on their size, and then generates a Dockerfile with multiple `RUN` commands to copy these directories into separate layers. Such a `VivadoDockerLayerSplitter.py` script should help automate splitting the large Docker layer into more manageable sizes, potentially improving both upload and download performance.
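A minimal sketch of the Dockerfile-generating part, assuming a layer plan like the one from the `plan_layers()` sketch above (with paths made relative to the Vivado install root) and treating `write_dockerfile`, `Dockerfile.layers`, and the reuse of the `base`/`monolithic` stage names from the manual Dockerfile as assumptions, could look like this:

```python
#!/usr/bin/env python3
# Sketch only: turn a layer plan into a second-stage Dockerfile with one
# RUN --mount=...cp per chunk. The names write_dockerfile and Dockerfile.layers
# are placeholders; the `base` and `monolithic` stage names are taken from the
# manual Dockerfile above; paths in `plan` are relative to the Vivado root.
from pathlib import Path, PurePosixPath

VIVADO = "/opt/Xilinx/Vivado/${VIVADO_VERSION}"
MOUNT = ("RUN --mount=type=bind,from=monolithic,"
         f"source={VIVADO},target=/Install ")

def write_dockerfile(plan: list[list[PurePosixPath]],
                     out: Path = Path("Dockerfile.layers")) -> None:
    lines = ["FROM base", "ARG VIVADO_VERSION", f"RUN mkdir -p {VIVADO}"]
    for chunk in plan:
        # All entries of one chunk share a single RUN, i.e. a single image layer.
        copies = " && ".join(
            f"mkdir -p {VIVADO}/{p.parent} && cp -r /Install/{p} {VIVADO}/{p.parent}"
            for p in chunk
        )
        lines.append(MOUNT + copies)
    out.write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    # Placeholder plan: two chunks -> two image layers.
    write_dockerfile([
        [PurePosixPath("gnu"), PurePosixPath("lib")],
        [PurePosixPath("data/xsim")],
    ])
```

The generated lines are meant to be appended after the `base` and `monolithic` stages of the original Dockerfile so that `from=monolithic` resolves during the build.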
For each identified directory or file that needs its own layer, the script writes a distinct `RUN` command into the generated Dockerfile. That circumvents the Dockerfile limitation by preparing all the necessary commands before the `docker build` process even starts. Those `RUN` commands are intended for the second stage of the build: they copy different portions of the Vivado installation from the first stage (where it is installed) into the second stage, which creates multiple layers. In other words, the script creates the Dockerfile used in the second stage of the multi-stage build, where the layer splitting happens as part of the Docker build.