Here's my Dockerfile:
# v2024.3.4
# =============================
FROM --platform=linux/amd64 mambaorg/micromamba:1.5.6
SHELL ["/usr/local/bin/_dockerfile_shell.sh"]
WORKDIR /tmp/
# Data
USER root
RUN mkdir -p /volumes/
RUN mkdir -p /volumes/input
RUN mkdir -p /volumes/output
RUN mkdir -p /volumes/database
ENV LC_ALL C.UTF-8
ENV LANG C.UTF-8
ENV MPLBACKEND agg
ENV XDG_CONFIG_HOME /home/qiime2
# Retrieve repository
USER $MAMBA_USER
RUN micromamba install -y -n base -c conda-forge wget ca-certificates
ARG MAMBA_DOCKERFILE_ACTIVATE=1
RUN wget --no-check-certificate https://data.qiime2.org/distro/amplicon/qiime2-amplicon-2024.2-py38-linux-conda.yml
# Install dependencies
RUN micromamba install -y -n base \
-c https://packages.qiime2.org/qiime2/2024.2/amplicon/released \
-c bioconda \
-c conda-forge \
-c defaults \
-f /tmp/qiime2-amplicon-2024.2-py38-linux-conda.yml && \
micromamba clean -a -y -f
RUN rm -rf /tmp/qiime2-amplicon-2024.2-py38-linux-conda.yml
# RUN qiime dev refresh-cache
ENTRYPOINT ["/usr/local/bin/_entrypoint.sh"]
Here's my job definition on AWS:
{
"jobDefinitionName": "qiime2-classify-vsearch__16S-rRNA_JB021824-gtdb_ssu_all_r207",
"type": "container",
"containerProperties": {
"image": "jolespin/qiime2-amplicon:2024.2",
"command": [
"mkdir -p",
"/volumes/output/taxonomic_classification/16S-rRNA_JB021824/vsearch/gtdb_ssu_all_r207/",
"&&",
"qiime",
"feature-classifier",
"classify-consensus-vsearch",
"--i-query",
"/volumes/input/16S-rRNA_JB021824/seqs.qza",
"--i-reference-reads",
"/volumes/database/gtdb_ssu_all_r207/seqs.qza",
"--i-reference-taxonomy",
"/volumes/database/gtdb_ssu_all_r207/tax.qza",
"--p-threads",
"16",
"--o-classification",
"/volumes/output/taxonomic_classification/16S-rRNA_JB021824/vsearch/gtdb_ssu_all_r207/classification.qza",
"--o-search-results",
"/volumes/output/taxonomic_classification/16S-rRNA_JB021824/vsearch/gtdb_ssu_all_r207/search-results.qza",
"--verbose"
],
"jobRoleArn": "arn:aws:iam::[redacted_identifier]:role/ecsTaskExecutionRole",
"executionRoleArn": "arn:aws:iam::[redacted_identifier]:role/ecsTaskExecutionRole",
"volumes": [
{
"name": "efs-volume-database",
"efsVolumeConfiguration": {
"fileSystemId": "fs-[redacted_identifier]",
"transitEncryption": "ENABLED",
"rootDirectory": "databases/qiime2/"
}
},
{
"name": "efs-volume-input",
"efsVolumeConfiguration": {
"fileSystemId": "fs-[redacted_identifier]",
"transitEncryption": "ENABLED",
"rootDirectory": "projects/Amplicon/Data/"
}
},
{
"name": "efs-volume-output",
"efsVolumeConfiguration": {
"fileSystemId": "fs-[redacted_identifier]",
"transitEncryption": "ENABLED",
"rootDirectory": "projects/Amplicon/Analysis/"
}
}
],
"mountPoints": [
{
"sourceVolume": "efs-volume-database",
"containerPath": "/volumes/database",
"readOnly": true
},
{
"sourceVolume": "efs-volume-input",
"containerPath": "/volumes/input",
"readOnly": true
},
{
"sourceVolume": "efs-volume-output",
"containerPath": "/volumes/output",
"readOnly": false
}
],
"environment": [],
"ulimits": [],
"resourceRequirements": [
{
"value": "16.0",
"type": "VCPU"
},
{
"value": "65536",
"type": "MEMORY"
}
],
"networkConfiguration": {
"assignPublicIp": "ENABLED"
},
"fargatePlatformConfiguration": {
"platformVersion": "LATEST"
},
"ephemeralStorage": {
"sizeInGiB": 40
}
},
"tags": {
"Name": "qiime2-classify-vsearch__16S-rRNA_JB021824-gtdb_ssu_all_r207"
},
"platformCapabilities": [
"FARGATE"
]
}
When I ran the job, I got the following error:
/usr/local/bin/_entrypoint.sh: line 24: exec: mkdir -p: not found
When I run the container locally, it finds mkdir just fine:
docker run --name test --rm -it jolespin/qiime2-amplicon:2024.2 bash
(base) mambauser@a51e78ef660d:/tmp$ which mkdir
/usr/bin/mkdir
How can I get my Docker container to work as expected with AWS Batch?
There are two bugs here, both caused by one issue. The "command" in the job definition should be an array that could be passed to Popen directly, with no shell involved: the first element must be an executable that can be run, and the remaining elements are its arguments, one per array entry.
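To make the point about exec-style arrays concrete, here is a minimal Python sketch of the same behaviour. It assumes a POSIX system with mkdir on the PATH; the helper name try_run is made up for illustration, and the sketch only mimics how an argument array is executed without a shell, not how AWS Batch itself is implemented.

```python
import subprocess
import tempfile
from pathlib import Path


def try_run(argv):
    """Run argv without a shell and report what happened."""
    try:
        subprocess.run(argv, check=True, capture_output=True)
        return "ok"
    except FileNotFoundError:
        # The first array element is not the name of any executable.
        return "not found"
    except subprocess.CalledProcessError:
        # The executable was found but exited with a non-zero status.
        return "failed"


with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "a" / "b")
    # "mkdir -p" is looked up as ONE program name, spaces included.
    print(try_run(["mkdir -p", target]))
    # Executable and option as separate elements.
    print(try_run(["mkdir", "-p", target]))
```

The first call fails for the same reason as the job: there is no program literally named "mkdir -p".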
That leads to the first bug: "mkdir -p" should be split up into "mkdir", "-p". That will allow the first command to run, since "mkdir" is an executable. The second bug is further on, where you have "&&", which is shell syntax for running multiple commands; with no shell to interpret it, it is just handed to mkdir as another argument. Since you're really trying to run a shell command, you can be explicit about it and make the command "bash", "-c", followed by the entire command line as a single string. This will launch bash to parse the entire string and run each command in turn. Note that you're now passing all of the options as one string, so you'll need to quote any option value that contains a space, such as "Example String".
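As a sketch of what the corrected "command" array could look like, the Python snippet below builds it from the paths in the question and prints it as JSON. The helper bash_command is a made-up name, and the snippet assumes bash is available in the image (the local docker run in the question suggests it is). shlex.join takes care of quoting any argument that contains a space.

```python
import json
import shlex


def bash_command(*commands):
    """Join several argv lists with '&&' and wrap them for `bash -c`."""
    script = " && ".join(shlex.join(argv) for argv in commands)
    return ["bash", "-c", script]


# Paths copied from the job definition in the question.
out_dir = (
    "/volumes/output/taxonomic_classification/"
    "16S-rRNA_JB021824/vsearch/gtdb_ssu_all_r207"
)
db_dir = "/volumes/database/gtdb_ssu_all_r207"

command = bash_command(
    ["mkdir", "-p", out_dir],
    [
        "qiime", "feature-classifier", "classify-consensus-vsearch",
        "--i-query", "/volumes/input/16S-rRNA_JB021824/seqs.qza",
        "--i-reference-reads", f"{db_dir}/seqs.qza",
        "--i-reference-taxonomy", f"{db_dir}/tax.qza",
        "--p-threads", "16",
        "--o-classification", f"{out_dir}/classification.qza",
        "--o-search-results", f"{out_dir}/search-results.qza",
        "--verbose",
    ],
)

# Paste the printed array into containerProperties in the job definition.
print(json.dumps({"command": command}, indent=2))
```

The resulting array has exactly three elements: "bash", "-c", and one long string holding both commands joined by &&.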