After writing a post-receive script that looks for the latest job.log file that is created by the GitLab Runner CI, I'm noticing that the post-receive script gets terminated or stuck before it finds the latest job.log file. In particular it does not move beyond a while loop. Additionally, the GitLab Runner gives a 4:Deadline Exceeded error.
MWE
The MWE performs the complete deployment: it uploads a repository to the GitLab server and runs the CI on that repository. It is not yet well generalised, so it has (at least) the following requirements: System: Ubuntu 20.04, Architecture: AMD64.
git clone [email protected]:Deployment-Oneliners/Self-host-GitLab-Server-and-Runner-CI.git
cd Self-host-GitLab-Server-and-Runner-CI
git checkout post-receive
rm -r test/libs/*
chmod +x install-bats-libs.sh
./install-bats-libs.sh
./install_gitlab.sh -s -r
./test.sh
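Then one can inspect the log of the post-receive script inside the GitLab Docker container with the commands below (the container ID comes from the docker ps output and will differ per installation):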
sudo docker ps -a
sudo docker exec -t -i ab15330e020f /bin/bash
cd /var/opt/gitlab/git-data/repositories/@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git/refs/keep-around/9514d16aafc1d741ba6a9ff47718d632fa8d435b
cat post_receive_log.txt
To uninstall the MWE completely, one can run: ./uninstall_gitlab.sh -y -h -r.
Relevant code
To identify where the code stops, I made the post-receive script write a lot of variables to post_receive_log.txt. Here is the loop that searches for the most recent job log:
find_job_of_commit() {
    local search_path=$1
    local searched_commit=$2
    echo "in loop search_path=$search_path" >> "post_receive_log.txt"
    echo "in loop searched_commit=$searched_commit" >> "post_receive_log.txt"
    # Retry every 10 seconds until a job.log mentions the searched commit.
    query_result=$(while ! find "$search_path" -name "job.log" | xargs grep "Checking out $searched_commit"; do sleep 10 ; done)
    echo "query_result=$query_result" >> "post_receive_log.txt"
}
Output:
This outputs the following post_receive_log.txt:
repopath_to_artifacts=/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35
in loop search_path=/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35
in loop searched_commit=eb052e7d
So one can conclude that the post-receive script is either terminated during the sleep 10 command, or stuck in the while loop without being able to find the file. A more elaborate version of the code revealed that it actually stopped after a sleep 10 command. It also does not find the last job (the latest job number was 31 in this run).
However, based on the post_receive_log.txt output, I can run the exact waiting command manually inside the GitLab docker, and there it does work:
root@127:/var/opt/gitlab/git-data/repositories/@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git# while ! find "/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35" -name "job.log" | xargs grep "Checking out eb052e7d"; do sleep 10 ; done
/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35/2021_10_16/33/33/job.log:Checking out eb052e7d as master...
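Since the loop in find_job_of_commit has no upper bound, it cannot show whether it is stuck or killed. A minimal sketch of a bounded variant is given here. The function name find_job_of_commit_bounded, the max_wait and interval parameters, and the LOG_FILE variable are hypothetical and not part of the original script. The default of 50 seconds is only chosen to stay below the roughly 60 seconds after which the hook appears to be terminated.

```bash
# Sketch: same search as find_job_of_commit, but with an upper bound on waiting
# and one log line per attempt. All parameter names and defaults are assumptions.
find_job_of_commit_bounded() {
    local search_path="$1"
    local searched_commit="$2"
    local max_wait="${3:-50}"   # total seconds to wait (hypothetical default)
    local interval="${4:-10}"   # seconds between attempts
    local log_file="${LOG_FILE:-post_receive_log.txt}"
    local waited=0
    local match

    while :; do
        # -exec ... {} + replaces xargs and copes with spaces in paths.
        # -F matches a fixed string, -H always prints the file name.
        match=$(find "$search_path" -name "job.log" \
            -exec grep -HF "Checking out $searched_commit" {} + 2>/dev/null)
        if [ -n "$match" ]; then
            echo "query_result=$match" >> "$log_file"
            printf '%s\n' "$match"
            return 0
        fi
        if [ "$waited" -ge "$max_wait" ]; then
            echo "gave up after ${waited}s" >> "$log_file"
            return 1
        fi
        echo "$(date +%T) not found yet, waited=${waited}s" >> "$log_file"
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

Called as find_job_of_commit_bounded "$search_path" "$searched_commit", it either prints the matching line or returns 1 with a "gave up" entry in the log, which separates "never found" from "killed while waiting".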
Hypothesis I
Perhaps the @ symbol in the filepath of repopath_to_artifacts is processed differently on the command line than inside the bash script, leading to an invalid/nonexistent path in the script but a valid path in the CLI.
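A quick way to test Hypothesis I in isolation is sketched here. As far as I know, @ has no special meaning inside a double-quoted path in bash, so the expectation is that the search succeeds. The function name check_at_path and the temporary directory layout are made up for this check; only the find | xargs grep pipeline mirrors the hook.

```bash
# Sketch: run the same find | xargs grep pipeline non-interactively on a
# throwaway path that contains "@hashed". Names and layout are assumptions.
check_at_path() {
    local base
    base=$(mktemp -d) || return 1
    local dir="$base/@hashed/d4/73"
    mkdir -p "$dir"
    echo "Checking out eb052e7d as master..." > "$dir/job.log"
    # Quoted exactly as in the hook, so "@" is passed through literally.
    find "$dir" -name "job.log" | xargs grep "Checking out eb052e7d"
    local status="$?"
    rm -rf "$base"
    return "$status"
}

check_at_path && echo "path with @ is found"
```

If this prints the matching line when run from a script file, the @ in the path is unlikely to be the cause.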
Hypothesis II
My second thought is that the post-receive script is terminated by GitLab after a certain number of seconds. This might be substantiated by the 4:Deadline Exceeded message that the GitLab Runner reports.

Hypothesis III
The find command uses some kind of snapshot of the directories that is not updated within a single shell script. (This seems unlikely to me and does not explain why the post-receive script would stop.) However, it is substantiated by a manual test in which I run the post-receive script manually (within the Docker container) with:
/var/opt/gitlab/git-data/repositories/@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git
/opt/gitlab/embedded/service/gitlab-shell/hooks/post-receive.d/./post-receive
f818b5eabfed71a70923bbf5186e31fc0806b6bc\n f818b5eabfed71a70923bbf5186e31fc0806b6bc\n repo_to_test_runner
This works well for both failed and successful jobs, even directly after the post-receive script was terminated unsuccessfully, and on the very commit it previously could not find.
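For comparing a manual run with a real push, it may help to log what the hook receives. Git passes each pushed ref to post-receive on standard input as one line of the form "old-rev new-rev ref-name". The sketch below assumes the script in post-receive.d receives the same input; the function name read_pushed_refs and the ref refs/heads/master are illustrative only.

```bash
# Sketch: print one summary line per pushed ref read from stdin.
# The function name is hypothetical; the input format is the standard Git one.
read_pushed_refs() {
    local old_rev new_rev ref_name
    while read -r old_rev new_rev ref_name; do
        # %.8s truncates each revision to its first 8 characters.
        printf 'ref=%s old=%.8s new=%.8s\n' "$ref_name" "$old_rev" "$new_rev"
    done
}

# Manual invocation with the commit from the test above (ref name assumed):
printf '%s %s %s\n' \
    f818b5eabfed71a70923bbf5186e31fc0806b6bc \
    f818b5eabfed71a70923bbf5186e31fc0806b6bc \
    refs/heads/master | read_pushed_refs
```

Appending this output to post_receive_log.txt in both situations would show whether the manual run and the push-triggered run actually search for the same commit.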
Question
Why is the post-receive file terminated unexpectedly or stuck in the while loop?
Hypothesis II is confirmed. The post-receive script is terminated after roughly 60 seconds; I do not yet know why. I determined this by logging output every 2 seconds, watching the log file while the script was running, and checking when it stopped producing output. It stopped as soon as the 4:Deadline Exceeded message popped up, even though the debugging for loop that produced the output still had 96 iterations to go.
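The timing measurement described above can be sketched roughly as follows. This is not the exact debugging loop I used; the function name heartbeat, its parameters, and the default values are assumptions for illustration.

```bash
# Sketch: append one line with the elapsed time to a log file at a fixed
# interval, so the last line shows roughly when the script was terminated.
heartbeat() {
    local iterations="${1:-120}"   # hypothetical default
    local interval="${2:-2}"       # seconds between log lines
    local log_file="${3:-post_receive_log.txt}"
    local start now i
    start=$(date +%s)
    i=1
    while [ "$i" -le "$iterations" ]; do
        now=$(date +%s)
        echo "heartbeat $i/$iterations elapsed=$((now - start))s" >> "$log_file"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Running heartbeat at the start of the hook and following the log with tail -f post_receive_log.txt shows the elapsed time at which the output stops.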