googlegenomics: dsub error with pipelines-api-examples/fastqc example

I was trying to follow the fastqc example given in googlegenomics/pipelines-api-examples. However, when I try to reproduce the example with my own project ID and bucket, I get an error:

| => dsub --project my_project_ID --logging "gs://my_data/test/logging" --disk-size 200 --name "fastqc" --image "us.gcr.io/my_project_ID/fastqc" --output OUTPUT_FILES="gs://my_data/test/out“ --input INPUT_BAM="gs://genomics-public-data/ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/pilot3_exon_targetted_GRCh37_bams/data/NA06986/alignment/NA06986.chrom19.ILLUMINA.bwa.CEU.exon_targetted.20100311.bam" --command 'fastqc ${INPUT_BAM} --outdir=$(dirname ${OUTPUT_FILES})' --wait --zones "us-central1-*"

usage: dirname path
Traceback (most recent call last):
  File "/usr/local/bin/dsub", line 9, in <module>
    load_entry_point('dsub==0.0.0', 'console_scripts', 'dsub')()
  File "build/bdist.macosx-10.11-intel/egg/dsub/commands/dsub.py", line 646, in main
  File "build/bdist.macosx-10.11-intel/egg/dsub/commands/dsub.py", line 636, in dsub_main
  File "build/bdist.macosx-10.11-intel/egg/dsub/commands/dsub.py", line 700, in run_main
  File "build/bdist.macosx-10.11-intel/egg/dsub/lib/param_util.py", line 736, in args_to_job_data
  File "build/bdist.macosx-10.11-intel/egg/dsub/lib/param_util.py", line 414, in make_param
  File "build/bdist.macosx-10.11-intel/egg/dsub/lib/param_util.py", line 460, in parse_uri
  File "build/bdist.macosx-10.11-intel/egg/dsub/lib/param_util.py", line 397, in _validate_paths_or_fail

ValueError: Path wildcard (*) are only supported for files: gs://my_data/test/out“ --input INPUT_BAM=gs://genomics-public-data/ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/pilot3_exon_targetted_GRCh37_bams/data/NA06986/alignment/NA06986.chrom19.ILLUMINA.bwa.CEU.exon_targetted.20100311.bam --command 'fastqc  --outdir=' --wait --zones us-central1-*
dsub --project my_project_ID --logging gs://my_data/test/logging --disk-size 200 --name fastqc --image us.gcr.io/my_project_ID/fastqc --output OUTPUT_FILES=gs://my_data/test/out“

Where am I going wrong?

There is 1 answer

Dan Cornilescu

Converting the comment to an answer.

The problem seems to be with --zones "us-central1-*", the only other argument using a * wildcard in the command you used.
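
It may also be worth ruling out a stray typographic quote. The ValueError echoes the path as `gs://my_data/test/out“ --input ...`, so the shell appears to have folded the following arguments into the --output value, and the character after `out` does not look like a plain ASCII `"`. A minimal sketch of a check, where `has_non_ascii` is a hypothetical helper name (not part of dsub) and the test string is copied from the error message:

```shell
# Hypothetical helper: succeeds if the argument contains any byte outside
# printable ASCII (space through tilde). Note that it also flags tabs.
has_non_ascii() {
  printf '%s\n' "$1" | LC_ALL=C grep -aq '[^ -~]'
}

# The --output value as echoed in the error message.
if has_non_ascii 'OUTPUT_FILES=gs://my_data/test/out“'; then
  echo "non-ASCII character found"
fi
```

If the check fires on your saved command, retyping the quotes in the terminal (rather than pasting from a word processor or web page) is a cheap thing to try.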

Using a specific zone like --zones "us-central1-a" or --zones "us-central1" (see the Zones diagram describing zones available in each region) or just dropping that argument altogether should help.
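
As a sketch of what that could look like: the invocation below reuses the flags and values from the question, with `us-central1-a` as an example zone (an assumption, pick whichever zone suits you) and every quote typed as a plain ASCII `"` or `'`. The arguments are loaded into the positional parameters first so they can be printed and checked before anything is submitted.

```shell
# Same flags as in the question; only --zones differs (example zone).
set -- dsub \
  --project my_project_ID \
  --logging "gs://my_data/test/logging" \
  --disk-size 200 \
  --name "fastqc" \
  --image "us.gcr.io/my_project_ID/fastqc" \
  --output OUTPUT_FILES="gs://my_data/test/out" \
  --input INPUT_BAM="gs://genomics-public-data/ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/pilot3_exon_targetted_GRCh37_bams/data/NA06986/alignment/NA06986.chrom19.ILLUMINA.bwa.CEU.exon_targetted.20100311.bam" \
  --command 'fastqc ${INPUT_BAM} --outdir=$(dirname ${OUTPUT_FILES})' \
  --wait \
  --zones "us-central1-a"

# Print one argument per line to confirm the shell split them as intended.
printf '%s\n' "$@"

# Once the list looks right, uncomment the next line to run the job.
# "$@"
```

Each flag and its value should show up on separate lines. If several flags appear merged into a single line, a quote is unbalanced somewhere before them.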