I'm using Azure batch python API. When I'm creating a new job, I see exit code 128 (image attached). How can I know what is the reason for that?
I'm creating a new job using this code :
def wrap_commands_in_shell(commands):
return "/bin/bash -c 'set -e; set -o pipefail; {}; wait'".format(';'.join(commands))
job_tasks = ['cd /mnt/batch/tasks/shared/ && git clone https://github.com/cryptobiu/OSPSI.git',
'cd /mnt/batch/tasks/shared/OSPSI && git checkout cloud',
'cd /mnt/batch/tasks/shared/OSPSI && cmake CMake',
'cd /mnt/batch/tasks/shared/OSPSI && mkdir -p assets'
]
job_creation_information = batch.models.JobAddParameter(job_id, batch.models.PoolInformation(pool_id=pool_id),
job_preparation_task=batch.models.JobPreparationTask(
command_line=wrap_commands_in_shell(
job_tasks),
run_elevated=True,
wait_for_success=True
)
)
To diagnose, you can look at the
stderr.txt
andstdout.txt
for the Job Preparation task that has failed in the Azure Portal, using Azure Batch Explorer, or using an SDK via code. If you look at which node ran the job prep task, navigate to that node, then the job directory. Under the job directory, you should see ajobpreparation
directory. In that directory will have thestderr.txt
andstdout.txt
.With regard to the exit code, there are a few potential problems that could cause this:
git
,cmake
and any other dependencies as part of a start task?A few notes about your
job_tasks
array:/mnt/batch/tasks/shared
. This path to the "shared" directory may not be the same between Linux distributions. You should use the environment variable$AZ_BATCH_NODE_SHARED_DIR
instead. You can view a full list of Azure Batch pre-filled environment variables here.job_tasks
as:['cd $AZ_BATCH_NODE_SHARED_DIR', 'TODO: INSERT YOUR COMMANDS TO SETUP AUTH WITH GITHUB FOR PRIVATE REPO', 'git clone https://github.com/cryptobiu/OSPSI.git', 'cd OSPSI', 'cmake CMake', 'mkdir -p assets']