"File does not exist" error when running ensembl-vep with docker


I installed ensembl-vep as instructed in the documentation:

cd $HOME/vep_data
curl -O https://ftp.ensembl.org/pub/release-110/variation/vep/homo_sapiens_vep_110_GRCh38.tar.gz
tar xzf homo_sapiens_vep_110_GRCh38.tar.gz

When I tried running the program as follows, it printed the error message below, even though the file does exist:

$ sudo docker run -v $HOME/vep_data:/data ensemblorg/ensembl-vep vep --cache --offline --format vcf --vcf --force_overwrite \
  --input_file input/1000123_23191_0_0.g.vcf --output_file output/my_output.vcf
-------------------- EXCEPTION --------------------
MSG: ERROR: File "input/1000123_23191_0_0.g.vcf" does not exist

STACK Bio::EnsEMBL::VEP::Parser::file /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Parser.pm:237
STACK Bio::EnsEMBL::VEP::Parser::new /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Parser.pm:131
STACK Bio::EnsEMBL::VEP::Runner::get_Parser /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:802
STACK Bio::EnsEMBL::VEP::Runner::get_InputBuffer /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:829
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:136
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:200
STACK toplevel /opt/vep/src/ensembl-vep/vep:46
Date (localtime)    = Wed Aug  2 15:41:03 2023
Ensembl API version = 110
---------------------------------------------------

Answer by Patrick H.:

Inside your Docker container you have a completely separate (overlay) filesystem, and therefore input/1000123_23191_0_0.g.vcf simply does not exist there. You first need to make every directory or drive you want to use available inside the container with a volume (-v), as you did with the downloaded dataset: -v $HOME/vep_data:/data

One option is to add -v $(pwd)/input:/input so that your input directory is also available inside the container.

It's completely fine to have several -v parameters in your docker run command; just make sure they are placed between run and the name of the Docker image.
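Putting these pieces together, a full command with one volume per host directory and absolute container paths might look like the sketch below. It assumes the cache is unpacked in $HOME/vep_data and the VCF sits in an input directory under the current working directory; the function name run_vep and the VEP_DATA/INPUT_DIR overrides are hypothetical conveniences, not part of VEP. The output is written under /data so it ends up back in $HOME/vep_data on the host. Because the container may not run as root, that host directory may need to be writable by other users (the VEP Docker instructions use chmod a+rwx on it).

```shell
#!/usr/bin/env bash
# Sketch only: run_vep, VEP_DATA and INPUT_DIR are hypothetical names.
# Usage: run_vep INPUT_VCF_NAME OUTPUT_VCF_NAME
run_vep() {
  local vcf_name=$1 out_name=$2
  local vep_data=${VEP_DATA:-$HOME/vep_data}   # host dir holding the VEP cache
  local input_dir=${INPUT_DIR:-$PWD/input}     # host dir holding the input VCF
  # Every -v HOST:CONTAINER option goes after "run" and before the image name.
  sudo docker run \
    -v "$vep_data:/data" \
    -v "$input_dir:/input" \
    ensemblorg/ensembl-vep \
    vep --cache --offline --format vcf --vcf --force_overwrite \
        --input_file "/input/$vcf_name" \
        --output_file "/data/$out_name"
}
```

For the file in the question this would be called as `run_vep 1000123_23191_0_0.g.vcf my_output.vcf`, leaving the annotated VCF at $HOME/vep_data/my_output.vcf on the host.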

Edit: In this specific case the program assumes that you put everything into $HOME/vep_data. The Dockerfile shows that the working directory in the container is /data, so the example you are trying to reproduce assumes there is, e.g., a /data/input directory containing the file to be processed. Ultimately, you have to put your files under $HOME/vep_data/input for the relative paths used in the example to work. The alternative is to use absolute paths together with additional volumes, as described above:

-v /path/to/input:/input .... --input_file /input/myfile.vcf
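To see why the relative path in the question fails, it can help to trace how a path given to --input_file maps back to the host. The hypothetical helper below (not part of VEP or Docker) resolves a relative path against the container's working directory (/data, as taken from the Dockerfile) and then translates it through a single -v HOST:CONTAINER mount. If the resulting path is not under the mount point, nothing on the host backs it and the function returns a non-zero status.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VEP): map a path as VEP sees it inside the
# container back to the host file that the -v mount must expose.
# Usage: container_to_host HOST_DIR CONTAINER_DIR WORKDIR PATH
container_to_host() {
  local host_dir=$1 container_dir=$2 workdir=$3 path=$4
  # Relative paths are resolved against the container's working directory.
  [[ $path == /* ]] || path="$workdir/$path"
  # Only paths below the mount point are backed by the host directory.
  if [[ $path == "$container_dir"/* ]]; then
    printf '%s\n' "$host_dir${path#"$container_dir"}"
  else
    return 1
  fi
}
```

For the command in the question, `container_to_host "$HOME/vep_data" /data /data input/1000123_23191_0_0.g.vcf` prints $HOME/vep_data/input/1000123_23191_0_0.g.vcf (with $HOME expanded), which is where the file has to exist on the host. Checking that path with `[ -f ... ]` before calling docker run catches the problem early.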

I personally prefer absolute paths when working with these containers, because I don't want to rely on an expected directory structure that might change with newer versions.