Why is this GitLab post-receive script terminated or stuck in a while loop?


I wrote a post-receive script that looks for the latest job.log file created by the GitLab Runner CI. The script gets terminated, or gets stuck, before it finds that file: it never moves beyond a while loop. In addition, the GitLab Runner reports a 4:Deadline Exceeded error.

MWE

The MWE performs the complete deployment: it uploads a repository to the GitLab server and runs the CI on that repository. It is not yet well generalised, so it has (at least) the following requirements: System: Ubuntu 20.04, Architecture: AMD64.

git clone [email protected]:Deployment-Oneliners/Self-host-GitLab-Server-and-Runner-CI.git
cd Self-host-GitLab-Server-and-Runner-CI
git checkout post-receive
rm -r test/libs/*
chmod +x install-bats-libs.sh
./install-bats-libs.sh
./install_gitlab.sh -s -r
./test.sh

Then one can inspect the log of the post-receive script inside the GitLab Docker container (replace the container ID below with the one reported by docker ps) with:

sudo docker ps -a
sudo docker exec -t -i ab15330e020f  /bin/bash
cd /var/opt/gitlab/git-data/repositories/@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git/refs/keep-around/9514d16aafc1d741ba6a9ff47718d632fa8d435b
cat post_receive_log.txt

To uninstall the MWE completely, one can run: ./uninstall_gitlab.sh -y -h -r.

Relevant code

To identify where the code stops, I made the post-receive script write a lot of variables to post_receive_log.txt. Here is the loop that searches for the most recent job log:

find_job_of_commit() {
    local search_path=$1
    local searched_commit=$2
    echo "in loop search_path=$search_path" >> "post_receive_log.txt"
    echo "in loop searched_commit=$searched_commit" >> "post_receive_log.txt"
    query_result=$(while ! find "$search_path" -name "job.log" | xargs grep "Checking out $searched_commit"; do sleep 10 ; done)
    echo "query_result=$query_result" >> "post_receive_log.txt"
}

Output:

This outputs the following post_receive_log.txt:

repopath_to_artifacts=/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35
in loop search_path=/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35
in loop searched_commit=eb052e7d

So one can conclude that the post-receive script is either terminated during the sleep 10 command, or stuck in the while loop without being able to find the file. More detailed logging revealed that it actually stopped after a sleep 10 command, and that it does not find the last job (the latest job number was 31 in this run).
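To tell "terminated" apart from "stuck", it can help to bound the wait and log every attempt. The following is a minimal sketch under stated assumptions, not the original hook: the function name find_job_of_commit_bounded, its max_wait, interval and log_file parameters, and the default values are all hypothetical. It also swaps the find | xargs grep pipeline for a recursive grep, which assumes GNU grep (for -r and --include).

```shell
#!/usr/bin/env bash
# Hypothetical bounded variant of find_job_of_commit.
# max_wait, interval and log_file are illustrative parameters with arbitrary defaults.
find_job_of_commit_bounded() {
    local search_path=$1
    local searched_commit=$2
    local max_wait=${3:-50}    # give up after this many seconds
    local interval=${4:-5}     # seconds between attempts
    local log_file=${5:-post_receive_log.txt}
    local start=$SECONDS
    local query_result         # declared separately so grep's exit status is not masked

    while true; do
        # Only files named job.log are read; -s hides errors if the path does not exist yet.
        if query_result=$(grep -rs --include="job.log" "Checking out $searched_commit" "$search_path"); then
            echo "found after $((SECONDS - start))s: $query_result" >> "$log_file"
            printf '%s\n' "$query_result"
            return 0
        fi
        if (( SECONDS - start >= max_wait )); then
            echo "gave up on $searched_commit after $((SECONDS - start))s" >> "$log_file"
            return 1
        fi
        echo "$(date +%T) $searched_commit not found yet" >> "$log_file"
        sleep "$interval"
    done
}
```

If the log ends with a "not found yet" line and no "gave up" line, the script was killed from outside rather than stuck in the loop.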

However, based on the post_receive_log.txt output, I can run the exact waiting command manually inside the GitLab docker, and there it does work:

root@127:/var/opt/gitlab/git-data/repositories/@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git# while ! find "/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35" -name "job.log" | xargs grep "Checking out eb052e7d"; do sleep 10 ; done
/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35/2021_10_16/33/33/job.log:Checking out eb052e7d as master...

Hypothesis I

Perhaps the @ symbol in the file path of repopath_to_artifacts is processed differently on the command line than inside the bash script, leading to an invalid or nonexistent path in the script but a valid path in the CLI.
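This hypothesis can be checked in isolation. Note that the search_path logged above contains no @ at all; only the repository path does. The sketch below runs the same find | xargs grep pipeline non-interactively against a path containing @hashed. The directory layout and the variable names (tmp_root, repo_dir, at_check_result) are made up for the test and only mimic GitLab's naming.

```shell
#!/usr/bin/env bash
# Sanity check: "@" in a directory name needs no escaping when the path is quoted.
# The temporary layout is invented for this test; it only imitates the "@hashed" prefix.
tmp_root=$(mktemp -d)
repo_dir="$tmp_root/@hashed/d4/73/example.git"
mkdir -p "$repo_dir"
echo "Checking out eb052e7d as master..." > "$repo_dir/job.log"

# Same pipeline shape as in the hook, executed from a script instead of an interactive shell.
at_check_result=$(find "$tmp_root/@hashed" -name "job.log" | xargs grep "Checking out eb052e7d")
echo "at_check_result=$at_check_result"

rm -rf "$tmp_root"
```

If the matching line is printed when this runs as a script, the @ in the path is not what distinguishes the hook from the interactive shell.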

Hypothesis II

My second thought is that GitLab terminates the post-receive script after a certain number of seconds. This might be substantiated by the 4:Deadline Exceeded message.

Hypothesis III

The find command uses some kind of snapshot of the directories that is not updated within a single shell script. (This seems unlikely to me, and it does not explain why the post-receive script would stop.) However, a manual test seems to support it. If I run the post-receive script manually (within the Docker container) with:

/var/opt/gitlab/git-data/repositories/@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git
/opt/gitlab/embedded/service/gitlab-shell/hooks/post-receive.d/./post-receive
f818b5eabfed71a70923bbf5186e31fc0806b6bc\n f818b5eabfed71a70923bbf5186e31fc0806b6bc\n repo_to_test_runner

This works well for both failed and successful jobs, even directly after the post-receive run was terminated unsuccessfully, and on the commit it previously couldn't find.

Question

Why is the post-receive script terminated unexpectedly, or stuck in the while loop?

Answer (by a.t.)

Hypothesis II is confirmed: the post-receive script is terminated after roughly 60 seconds, though I do not yet know why. I determined this by writing a log line every 2 seconds, watching the log file while the script was running, and checking when it stopped producing output. It stopped as soon as the 4:Deadline Exceeded message popped up, even though the debugging for loop that produced the output still had 96 iterations to go.
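The measurement described above can be sketched roughly as follows. This is an illustration, not the exact debugging code that was used: the function name heartbeat, its parameters, and the log line format are assumptions.

```shell
#!/usr/bin/env bash
# Heartbeat logger: appends one timestamped line per iteration, then sleeps.
# heartbeat and its arguments are hypothetical names chosen for this sketch.
heartbeat() {
    local iterations=$1
    local interval=$2    # seconds between log lines
    local log_file=$3
    local start=$SECONDS
    local i
    for (( i = 1; i <= iterations; i++ )); do
        echo "$(date +%T) heartbeat $i/$iterations elapsed=${SECONDS}s since shell start, $((SECONDS - start))s in loop" >> "$log_file"
        sleep "$interval"
    done
}

# Example call inside the hook (values are arbitrary):
#   heartbeat 100 2 post_receive_log.txt
# Watch it from a second shell while pushing:
#   tail -f post_receive_log.txt
```

The elapsed value on the last line written before the output stops gives an estimate of how long the hook was allowed to run.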