How to enable / set up dependency caches for apt-get on Bitbucket Pipelines


I am using the following code in my bitbucket-pipelines.yml file to remotely deploy code to a staging server.

image: php:7.1.1

pipelines:
  default:
    - step:
        script:
          # install ssh
          - apt-get update && apt-get install -y openssh-client
          # get the latest code
          - ssh [email protected] -F ~/.ssh/config "cd /path/to/code && git pull"
          # update composer
          - ssh [email protected] -F ~/.ssh/config "cd /path/to/code && composer update --no-scripts"
          # optimise files
          - ssh [email protected] -F ~/.ssh/config "cd /path/to/code && php artisan optimize"

This all works, except that each time the pipeline is run, the ssh client is downloaded and installed again (adding ~30 seconds to the build time). Is there a way I can cache this step?

And how can I go about caching the apt-get step?

For example, would something like this work (or what changes would be needed to make it work)?

pipelines:
  default:
    - step:
        caches:
          - aptget
        script:
          - apt-get update && apt-get install -y openssh-client

definitions:
  caches:
    aptget: which ssh

There are 4 answers

Best answer, from BlueM:

This is a typical scenario where you should use your own Docker image instead of one of the ones provided by Atlassian. (Or search for a Docker image which provides exactly this.)

In your simple case, this Dockerfile should be enough:

FROM php:7.1.1

RUN apt-get update && \
    apt-get install -y openssh-client

Then, create a DockerHub account, publish the image and reference it in bitbucket-pipelines.yml.
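This is not spelled out in the answer, but roughly, the publish-and-reference flow could look like the sketch below (yourname/php-ssh and user@staging-server are placeholder names):

docker build -t yourname/php-ssh:7.1.1 .
docker push yourname/php-ssh:7.1.1

and then in bitbucket-pipelines.yml:

image: yourname/php-ssh:7.1.1

pipelines:
  default:
    - step:
        script:
          # openssh-client is already baked into the image, so the apt-get step disappears
          - ssh user@staging-server -F ~/.ssh/config "cd /path/to/code && git pull"

With the package preinstalled in the image, the ~30 seconds of install time is paid once at image build time instead of on every pipeline run.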

Answer from OrangeDog:

Unfortunately, the parts that take the time are unsafe or pointless to cache. Remember that the pipeline caches may be deleted at any time, so you always need to run the commands anyway.

apt-get update doesn't use a cache, so it will download the latest indexes every time.

apt-get install caches downloaded packages in /var/cache/apt, so you could save that. However, this probably won't actually save any time:

Fetched 907 kB in 0s (998 kB/s)
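If you did want to persist /var/cache/apt anyway, a minimal sketch of the custom cache definition could look like this (the apt-cache cache name is just an illustrative label):

definitions:
  caches:
    apt-cache: /var/cache/apt

pipelines:
  default:
    - step:
        caches:
          - apt-cache
        script:
          # packages already present in the cache are not re-downloaded,
          # but they still have to be unpacked and installed on every run
          - apt-get update && apt-get install -y openssh-client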

The actual installed packages cannot be cached, because they a) are spread around multiple shared files and directories, and b) may not be portable to different Docker images.

At a deeper level, satisfactory interaction between caching, apt-get update, and Docker is a complex issue.

Answer from N1ngu:

TL;DR

  1. Avoid apt; use apt-get.
  2. rm /etc/apt/apt.conf.d/docker-clean
  3. Cache /var/lib/apt/lists/
  4. Cache /var/cache/apt/

image: debian

definitions:

  caches:
    apt-lists: /var/lib/apt/lists
    apt-cache: /var/cache/apt

  yaml-anchors:
    - &debian-setup-script >-
        rm /etc/apt/apt.conf.d/docker-clean
        && apt-get update
        && apt-get install --yes <the-packages>

pipelines:

  default:
    - step:
        caches:
          - apt-lists
          - apt-cache
        script:
          - *debian-setup-script
          - do your thing

Long story:

  1. By default, apt install will autoclean downloaded .deb files unless the installation failed. Avoid it; use apt-get install instead. https://askubuntu.com/a/794987

  2. Most Debian-like Docker images you will encounter probably stem from https://github.com/debuerreotype/debuerreotype . They have sensible optimizations that reduce image layer sizes for most users. But our situation in Bitbucket Pipelines is totally different: a populated cache is welcome and will be restored in subsequent executions. Remove anything in /etc/apt/apt.conf.d/* that might be autocleaning APT's cache, namely /etc/apt/apt.conf.d/docker-clean.

  3. apt-get update will litter the /var/lib/apt/lists folder. Keep its contents! Future update invocations will still reach the remote repositories (and they should), but there will be no download if your lists are fresh.

  4. /var/cache/apt is the well-known APT cache folder. Keep it!


Trying to cache the final installed files is useless and close to nonsense. Generally, the binary placed by a package will rely on a bunch of libraries and files from the same and other packages, spread across the OS folder tree. Also, those files being present will not speed up update or install instructions in any way.

This means you are still bound to actually install the packages and run any post-installation scripts, potentially even having to build some sources. This is equivalent to how an npm or pip cache would work, and is totally fine.


If you only want to ssh into a remote machine, you should consider the answer by @Rashi https://stackoverflow.com/a/63276721/11715259 instead.

Answer from Rashi:

I am using a similar configuration, but in my case I want to cache the gettext package; I came here for the same reason (to find out how to cache gettext).

If you don't have that dependency, you can use the Bitbucket-provided SSH pipe (pipe: atlassian/ssh-run); you don't have to create a custom Docker image.

image: atlassian/default-image:2

pipelines:
  branches:
    develop:
      - step:
          deployment: staging
          script:
              - apt update && apt install -y gettext
              - envsubst < scripts/deploy.sh > deploy-out.sh
              - pipe: atlassian/ssh-run:0.2.6
                variables:
                  SSH_USER: $STAGE_USER
                  SERVER: $STAGE_SERVER
                  COMMAND: 'deploy-out.sh'
                  MODE: 'script'