Reuse a conda environment within the same gitlab CI pipeline?

1.8k views Asked by At

Motivation

My main goal is this: Within a pipeline, I would like to reuse as much as possible (i.e. not build the conda environment multiple times, if all jobs share the same environment).

In my project, I use conda as depency manager and gitlab ci/cd for continuous integration. For the sake of simplicity, let's say I have a build job and a test job. The most straight forward approach would be to create the conda environment from the environment.yml in any job and then do the actual work. This adds an overhead of several minutes to any job. It also seems like overhead to me, since I would like to build the environment once in the build job and then use it in my test job (especially when creating multiple jobs for different tests).

Research Results

The first thing I need to do is to set the CONDA_ENVS_PATH to somewhere in my project directory.

I've looked at gitlab's caching mechanism, but found that it only helps for the same job in repeated runs of the same pipeline, but not not for different jobs of the same run within a pipeline.

I've also looked at gitlab's artifacts mechanism, but found that due the up- and download of those, they don't increase run time significantly (basically I only save time by not downloading many small packages and not having to compile them again, but loose time by compressing and decompressing them).

I've also tried to make use of the GIT_CLEAN_FLAGS by setting them to none in my test job. That way, the conda environment is not deleted when getting the latest data from git. This does cause a serious speedup in my pipelines, but it does not work all the time. Some jobs fail, not finding the conda environment. A simple rerun does magically work however. Of course, in a CI / CD setting, this nondeterminism is not practical.

As a workaround to the original question, we've come up with an intermediate solution. We introduce a docker image, holding our custom environment, through a minimal Dockerfile and a couple of changes to our .gitlab-ci.yml (see below for an example). By only executing the job that builds our custom docker image when the dockerfile or environment changed, we save valuable time on each run. At the same time, we keep the full flexibility in our environment definition and can adjust it exactly how we usually would: by changing the environment.yml.

Question

All solutions tried so far are not really satisfactory. Thus my question is: How can my test job reuse the same conda environment as my build job in gitlab-ci?


In case someone else would like to use a similar setup: Here is my current approach:

# Dockerfile

FROM continuumio/miniconda3:latest

COPY environment.yml .

RUN conda env create -f environment.yml

ENTRYPOINT [""]
# .gitlab-ci.yml

# Use the latest version of this project's docker file
# This will be the default image for all jobs unless specified otherwise
image: $CI_REGISTRY_IMAGE:latest


# Change cache directories to be inside the project directory since we can
# only cache local items.
variables:
  PRE_COMMIT_HOME: "${CI_PROJECT_DIR}/.cache/pre-commit"


stages:
  - build
  - test


# Make conda environment available to all jobs
# This expects the conda environment to have the same name as the gitlab project path
# Avoid dashes and other non-alphabetical characters
default:
  before_script:
    - source activate "${CI_PROJECT_NAME}"

# Build the docker image including the correct conda environment for subsequent jobs
# This assumes a docker image registry being configured for your gitlab instance
dockerimage:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  rules:
    - changes:
      - Dockerfile
      - environment.yml
  before_script: [ ]
  script:
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:latest


# Run pytest
pytest:
  stage: test
  script:
    - conda install pytest-cov
    - pytest tests --cov=src

Edit: I replaced the example code for using GIT_CLEAN_FLAGS with our most recent approach: using a custom docker image.


Disclaimer: I saw this and this question, but they are both dated, don't have a satisfying answer and I only found them after writing this question, so I hope my additional question increases discoverability of the topic.

0

There are 0 answers