base recalibrator --known-sites argument does not work

255 views Asked by At

When I run this

gatk BaseRecalibrator -I ${aligned_reads}/file_copy_sorted.bam -R ${referance} --known-sites ${Known_Sites} -O ${data1}/sorted_Read1.table

this error occur

A USER ERROR has occurred: Argument known-sites was missing: Argument 'known-sites' is required

this is what I tried

Using GATK jar /home/hi/Documents/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hi/Documents/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar BaseRecalibrator -I /file_copy_sorted.bam -R --known-sites -O /sorted_Read1.table USAGE: BaseRecalibrator [arguments]

First pass of the Base Quality Score Recalibration (BQSR) -- Generates recalibration table based on various user-specified covariates (such as read group, reported quality score, machine cycle, and nucleotide context). Version:4.4.0.0

Required Arguments:

--input,-I BAM/SAM/CRAM file containing reads This argument must be specified at least once. Required.

--known-sites One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis. This argument must be specified at least once. Required.

--output,-O The output recalibration table file to create Required.

--reference,-R Reference sequence file Required.

Optional Arguments:

--add-output-sam-program-record If true, adds a PG tag to created SAM/BAM/CRAM files. Default value: true. Possible values: {true, false}

--add-output-vcf-command-line If true, adds a command line header line to created VCF files. Default value: true. Possible values: {true, false}

--arguments_file read one or more arguments files and add them to the command line This argument may be specified 0 or more times. Default value: null.

--binary-tag-name the binary tag covariate name if using it Default value: null.

--bqsr-baq-gap-open-penalty BQSR BAQ gap open penalty (Phred Scaled). Default value is 40. 30 is perhaps better for whole genome call sets Default value: 40.0.

--cloud-index-prefetch-buffer,-CIPB Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to cloudPrefetchBuffer if unset. Default value: -1.

--cloud-prefetch-buffer,-CPB Size of the cloud-only prefetch buffer (in MB; 0 to disable). Default value: 40.

--create-output-bam-index,-OBI If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. Default value: true. Possible values: {true, false}

--create-output-bam-md5,-OBM If true, create a MD5 digest for any BAM/SAM/CRAM file created Default value: false. Possible values: {true, false}

--create-output-variant-index,-OVI If true, create a VCF index when writing a coordinate-sorted VCF file. Default value: true. Possible values: {true, false}

--create-output-variant-md5,-OVM If true, create a a MD5 digest any VCF file created. Default value: false. Possible values: {true, false}

--default-base-qualities Assign a default base quality Default value: -1.

--deletions-default-quality default quality for the base deletions covariate Default value: 45.

--disable-bam-index-caching,-DBIC If true, don't cache bam indexes, this will reduce memory requirements but may harm performance if many intervals are specified. Caching is automatically disabled if there are no intervals specified. Default value: false. Possible values: {true, false}

--disable-read-filter,-DF Read filters to be disabled before analysis This argument may be specified 0 or more times. Default value: null. Possible values: {MappedReadFilter, MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter, NotDuplicateReadFilter, NotSecondaryAlignmentReadFilter, PassesVendorQualityCheckReadFilter, WellformedReadFilter}

--disable-sequence-dictionary-validation If specified, do not check the sequence dictionaries from our inputs for compatibility. Use at your own risk! Default value: false. Possible values: {true, false}

--exclude-intervals,-XL One or more genomic intervals to exclude from processing This argument may be specified 0 or more times. Default value: null.

--gatk-config-file A configuration file to use with the GATK. Default value: null.

--gcs-max-retries,-gcs-retries If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection Default value: 20.

--gcs-project-for-requester-pays Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed. Default value: .

--help,-h display the help message Default value: false. Possible values: {true, false}

--indels-context-size,-ics Size of the k-mer context to be used for base insertions and deletions Default value: 3.

--insertions-default-quality default quality for the base insertions covariate Default value: 45.

--interval-exclusion-padding,-ixp Amount of padding (in bp) to add to each interval you are excluding. Default value: 0.

--interval-merging-rule,-imr Interval merging rule for abutting intervals Default value: ALL. Possible values: {ALL, OVERLAPPING_ONLY}

--interval-padding,-ip Amount of padding (in bp) to add to each interval you are including. Default value: 0.

--interval-set-rule,-isr Set merging approach to use for combining interval inputs Default value: UNION. Possible values: {UNION, INTERSECTION}

--intervals,-L One or more genomic intervals over which to operate This argument may be specified 0 or more times. Default value: null.

--lenient,-LE Lenient processing of VCF files Default value: false. Possible values: {true, false}

--low-quality-tail minimum quality for the bases in the tail of the reads to be considered Default value: 2.

--max-variants-per-shard If non-zero, partitions VCF output into shards, each containing up to the given number of records. Default value: 0.

--maximum-cycle-value,-max-cycle The maximum cycle value permitted for the Cycle covariate Default value: 500.

--mismatches-context-size,-mcs Size of the k-mer context to be used for base mismatches Default value: 2.

--mismatches-default-quality default quality for the base mismatches covariate Default value: -1.

--preserve-qscores-less-than Don't recalibrate bases with quality scores less than this threshold (with -bqsr) Default value: 6.

--quantizing-levels number of distinct quality scores in the quantized output Default value: 16.

--QUIET Whether to suppress job-summary info on System.err. Default value: false. Possible values: {true, false}

--read-filter,-RF Read filters to be applied before analysis This argument may be specified 0 or more times. Default value: null. Possible values: {AlignmentAgreesWithHeaderReadFilter, AllowAllReadsReadFilter, AmbiguousBaseReadFilter, CigarContainsNoNOperator, ExcessiveEndClippedReadFilter, FirstOfPairReadFilter, FlowBasedTPAttributeSymetricReadFilter, FlowBasedTPAttributeValidReadFilter, FragmentLengthReadFilter, GoodCigarReadFilter, HasReadGroupReadFilter, HmerQualitySymetricReadFilter, IntervalOverlapReadFilter, JexlExpressionReadTagValueFilter, LibraryReadFilter, MappedReadFilter, MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter, MappingQualityReadFilter, MatchingBasesAndQualsReadFilter, MateDifferentStrandReadFilter, MateDistantReadFilter, MateOnSameContigOrNoMappedMateReadFilter, MateUnmappedAndUnmappedReadFilter, MetricsReadFilter, NonChimericOriginalAlignmentReadFilter, NonZeroFragmentLengthReadFilter, NonZeroReferenceLengthAlignmentReadFilter, NotDuplicateReadFilter, NotOpticalDuplicateReadFilter, NotProperlyPairedReadFilter, NotSecondaryAlignmentReadFilter, NotSupplementaryAlignmentReadFilter, OverclippedReadFilter, PairedReadFilter, PassesVendorQualityCheckReadFilter, PlatformReadFilter, PlatformUnitReadFilter, PrimaryLineReadFilter, ProperlyPairedReadFilter, ReadGroupBlackListReadFilter, ReadGroupHasFlowOrderReadFilter, ReadGroupReadFilter, ReadLengthEqualsCigarLengthReadFilter, ReadLengthReadFilter, ReadNameReadFilter, ReadStrandFilter, ReadTagValueFilter, SampleReadFilter, SecondOfPairReadFilter, SeqIsStoredReadFilter, SoftClippedReadFilter, ValidAlignmentEndReadFilter, ValidAlignmentStartReadFilter, WellformedFlowBasedReadFilter, WellformedReadFilter}

--read-index Indices to use for the read inputs. If specified, an index must be provided for every read input and in the same order as the read inputs. If this argument is not specified, the path to the index for each input will be inferred automatically. This argument may be specified 0 or more times. Default value: null.

--read-validation-stringency,-VS Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: SILENT. Possible values: {STRICT, LENIENT, SILENT}

--seconds-between-progress-updates Output traversal statistics every time this many seconds elapse Default value: 10.0.

--sequence-dictionary Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a .dict file. Default value: null.

--sites-only-vcf-output If true, don't emit genotype fields when writing vcf file output. Default value: false. Possible values: {true, false}

--tmp-dir Temp directory to use. Default value: null.

--use-jdk-deflater,-jdk-deflater Whether to use the JdkDeflater (as opposed to IntelDeflater) Default value: false. Possible values: {true, false}

--use-jdk-inflater,-jdk-inflater Whether to use the JdkInflater (as opposed to IntelInflater) Default value: false. Possible values: {true, false}

--use-original-qualities,-OQ Use the base quality scores from the OQ tag Default value: false. Possible values: {true, false}

--verbosity Control verbosity of logging. Default value: INFO. Possible values: {ERROR, WARNING, INFO, DEBUG}

--version display the version number for this tool Default value: false. Possible values: {true, false}

Advanced Arguments:

--disable-tool-default-read-filters Disable all tool default read filters (WARNING: many tools will not function correctly without their default read filters on) Default value: false. Possible values: {true, false}

--showHidden display hidden arguments Default value: false. Possible values: {true, false}


A USER ERROR has occurred: Argument known-sites was missing: Argument 'known-sites' is required


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

1

There are 1 answers

6
heiha On BEST ANSWER

Please pay attention to this

Using GATK jar /home/hi/Documents/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hi/Documents/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar BaseRecalibrator -I /file_copy_sorted.bam -R --known-sites -O /sorted_Read1.table USAGE: BaseRecalibrator [arguments]

that is how GATK explains your command, as Stephen C said, ${Known_Sites}, which is a shell variable, may not be set before your running this command.

For hunman, your can provide dbSNP like dbsnp_138.hg38.vcf.gz and Mills_and_1000G_gold_standard.indels.hg38.vcf etc to --known-sites argument. If you can not find a known site for your data, please follow the instructions here.