Git sparse checkout in Azure Pipelines takes ~15 minutes


We have a single repository for our source code which, when fully downloaded, is around 2.8 GB. We have 4 self-hosted agents and over 100 build pipelines. With that, it is not feasible to download the entire source code for each build and agent.

The approach I have gone with is to disable the checkout for these pipelines and then run a command-line script to perform a Git sparse checkout. However, this takes around 15 minutes to get ~100 MB worth of source code.

We are using self-hosted Linux agents.

        steps:
          - checkout: none
          - task: CmdLine@2
            displayName: "Project Specific Checkout"
            inputs:
              script: |
                cd $(Build.SourcesDirectory)
                git init

                git config --global user.email ""
                git config --global user.name ""
                git config --global core.sparsecheckout true

                echo STARS/Source/A/ >> .git/info/sparse-checkout
                echo STARS/Source/B/ >> .git/info/sparse-checkout
                echo STARS/Source/C/ >> .git/info/sparse-checkout

                git remote rm origin
                git remote add origin https://service:$(Service.Account.Personal.Access.Token)@dev.azure.com/Organization/Project/_git/STARS
                git reset --hard
                git pull origin $(Build.SourceBranch)

Is there anything I'm doing wrong here that causes it to take so long to pull this data?


1 answer

Answer by Cece Dong - MSFT

1. Since you use a self-hosted agent, you can log on to the agent machine and run the same git commands manually to see whether you get the same performance.

2. Set the variable system.debug to true to check which command takes the most time.

3. Instead of a manual Git sparse checkout, you can use the built-in checkout step and set its options, such as path:
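As a minimal sketch of point 2, the variable can be set in the pipeline YAML itself. This assumes the pipeline has no `variables` block yet; if it does, only the `system.debug` line needs to be added to it.

```yaml
# Turns on verbose agent logging for every run of this pipeline,
# which makes it easier to see which git command the time is spent in.
variables:
  system.debug: true
```

The same variable can also be set for a single run when the run is queued manually, so the extra log volume is not kept on permanently.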

steps:
- checkout: self | none | repository name # self represents the repo where the initial Pipelines YAML file was found
  clean: boolean  # if true, run `execute git clean -ffdx && git reset --hard HEAD` before fetching
  fetchDepth: number  # the depth of commits to ask Git to fetch; defaults to no limit
  lfs: boolean  # whether to download Git-LFS files; defaults to false
  submodules: true | recursive  # set to 'true' for a single level of submodules or 'recursive' to get submodules of submodules; defaults to not checking out submodules
  path: string  # path to check out source code, relative to the agent's build directory (e.g. \_work\1); defaults to a directory called `s`
  persistCredentials: boolean  # if 'true', leave the OAuth token in the Git config after the initial fetch; defaults to false

https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema?view=azure-devops&tabs=schema%2Cparameter-schema#checkout
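A filled-in version of the schema above might look like the following. This is only a sketch: the values are assumptions chosen to reduce download time, not settings taken from the question.

```yaml
steps:
  - checkout: self      # the repo that contains this pipeline YAML
    fetchDepth: 1       # shallow fetch: only the latest commit, no history
    lfs: false          # do not download Git-LFS files
    clean: false        # keep the existing working tree between runs
```

Note that `fetchDepth` limits how much history is transferred and `path` only changes where the sources land; neither one restricts the checkout to particular folders. If the hand-written sparse checkout has to stay, the comparable change there would be to replace `git pull origin $(Build.SourceBranch)` with `git fetch --depth 1 origin $(Build.SourceBranch)` followed by `git checkout FETCH_HEAD`, so that the full history is not downloaded.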

4. Because the pipeline runs on a self-hosted agent, none of the subdirectories are cleaned between two consecutive runs by default. This allows incremental builds and deployments, provided that the tasks are implemented to make use of it. So you can set the Clean option to false.

https://learn.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=azure-devops&tabs=yaml#workspace
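In YAML terms, point 4 could be sketched as below. The job name `Build` and the pool name `Default` are hypothetical placeholders, not names from the question.

```yaml
jobs:
  - job: Build            # hypothetical job name
    pool: Default         # hypothetical self-hosted pool name
    # No `workspace: clean:` setting here, so the agent leaves the
    # job's working directory as it was after the previous run.
    steps:
      - checkout: self
        clean: false      # skip git clean / git reset before fetching
```

With the working directory preserved, later runs on the same agent only need to fetch commits that are new since the previous run, rather than downloading the repository again.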