If a job you’ve submitted fails, you’ll get an email notification with some details about the type of error encountered. Below are some common error scenarios and guidance on how to deal with them.
General
Not enough quota
Issue type
Quota related
How to detect this problem
A submission fails with an error message stating that quota was exceeded: "User quota exceeded for pipeline [PIPELINE_NAME]."
Problem explanation
Each pipeline has a quota limit that only allows a certain amount of usage of the system. If a user is running a submission that would cause them to exceed their remaining quota, the service will fail that submission. For more information, see the documentation on Quota.
Resolution
The error message explains how to request a higher quota limit. If the request is approved, you can continue submitting to the service.
Array Imputation
Input array VCFs take too long to upload
Issue type
User inputs
How to detect this problem
Your upload takes too long or fails.
Problem explanation
Some VCFs contain a lot of extra information (annotations) that increases the file size. Imputation only needs the GT annotation. Removing extraneous annotations can dramatically reduce input VCF file size and speed up the upload process.
Resolution
Remove all annotations except the GT FORMAT field, e.g., using the following bcftools command (replace the file names with your own):
bcftools annotate input.vcf.gz -x INFO,^FORMAT/GT -Oz -o input_gt_only.vcf.gz
User input failed QC
Issue type
User inputs
How to detect this problem
Your job fails with the message "User input failed QC" and a brief description of the issue(s) detected with the user input. The following problems may be reported:
Greater than 3 million variants found in the input VCF
Inputs containing more than 3 million variants are almost certainly not array data, and our pipeline does not handle them.
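Before uploading, you can get a rough idea of whether a file is near this limit by counting its data records. The sketch below is only an approximation: it assumes one variant per non-header line, and the service may count variants differently (for example, for multi-allelic sites). The function name and file name are placeholders.

```shell
# Count data (non-header) lines in a gzip- or bgzip-compressed VCF.
# Assumption: one record per line; this may differ from the service's own count.
count_vcf_records() {
    gzip -dc "$1" | grep -vc '^#'
}

# Example usage (input.vcf.gz is a placeholder):
#   n=$(count_vcf_records input.vcf.gz)
#   [ "$n" -gt 3000000 ] && echo "more than 3 million records: $n"
```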
VCF version < 4.0 or not found
The input VCF should have a header line like `##fileformat=VCFv4.2`, and the version must be at least 4.0. If your VCF is missing this header line, you can add it using bcftools reheader. If your VCF is a lower version, you will need to drop INDEL sites (which you can do by running `bcftools view -V indels`) and update the header. Note that the service drops INDEL sites from the user input before phasing and imputing. (For more about the pipeline, see the Pipeline Overview page.)
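To see which version your file declares, you can read its first line. This is a minimal sketch, assuming the file is gzip- or bgzip-compressed and that the `##fileformat` line, if present, is the first line of the file (as the VCF specification requires). The function name is hypothetical.

```shell
# Print the version from a leading "##fileformat=VCFvX.Y" line.
# Prints nothing if the first line is not a fileformat line.
vcf_version() {
    gzip -dc "$1" 2>/dev/null | head -n 1 |
        sed -n 's/^##fileformat=VCFv\([0-9][0-9.]*\).*$/\1/p'
}

# Example usage (input.vcf.gz is a placeholder):
#   vcf_version input.vcf.gz
```

An empty result suggests the header line is missing; a value below 4.0 suggests the file needs the fixes described above.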
No variant data found for any chromosome in the supported contigs
The service currently imputes over the hg38 autosomes, i.e., chr1-chr22. Your input VCF must contain variants from at least one of these contigs. If your data has contigs named like (1, 2, 3) rather than (chr1, chr2, chr3), it may not have been mapped to the hg38 (GRCh38) reference. One tool for lifting over a VCF that was mapped to a different reference is Picard LiftoverVcf. Note that you will need a chain file describing the differences between the references to do the liftover; the chain file you need depends on the current mapping of your VCF. Learn more about chain files here and download common chain files here.
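To see which contig names your records actually use, you can tally the first column of the file. The sketch below assumes a gzip- or bgzip-compressed, tab-delimited VCF; both function names are hypothetical. It only inspects contig names and cannot tell you which reference the file was mapped to.

```shell
# Print each contig name found in the data records with its record count.
vcf_contig_counts() {
    gzip -dc "$1" |
        awk -F '\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' |
        sort
}

# Succeed if at least one data record lies on chr1..chr22.
has_supported_contig() {
    gzip -dc "$1" | awk -F '\t' '
        !/^#/ && $1 ~ /^chr([1-9]|1[0-9]|2[0-2])$/ { found = 1; exit }
        END { exit (found ? 0 : 1) }'
}

# Example usage (input.vcf.gz is a placeholder):
#   vcf_contig_counts input.vcf.gz
#   has_supported_contig input.vcf.gz || echo "no chr1-chr22 records found"
```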
Input VCF is not sorted
The service requires a sorted VCF. You can sort your VCF using bcftools sort:
bcftools sort input_file.vcf.gz -Oz -o input_file_sorted.vcf.gz
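If you want to check a file before uploading, the following sketch tests one common definition of "sorted": all records for a contig are contiguous, and positions within a contig never decrease. This is an assumption about what the service checks; it may also require a specific contig order. The function name is hypothetical.

```shell
# Succeed if records are grouped by contig and positions are non-decreasing
# within each contig. Assumes a gzip- or bgzip-compressed, tab-delimited VCF.
vcf_is_sorted() {
    gzip -dc "$1" | awk -F '\t' '
        /^#/ { next }
        $1 != chrom {
            if ($1 in seen) { bad = 1; exit }   # contig reappears later
            seen[$1] = 1; chrom = $1; pos = 0
        }
        $2 + 0 < pos { bad = 1; exit }          # position goes backwards
        { pos = $2 + 0 }
        END { exit (bad ? 1 : 0) }'
}

# Example usage (input_file.vcf.gz is a placeholder):
#   vcf_is_sorted input_file.vcf.gz || echo "not sorted; run bcftools sort"
```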
Input VCF is not BGZF compressed
The imputation service uses tools that require BGZF compression. Many genomics tools write BGZF (bgzip) output, but if yours does not, or if your data is compressed some other way, first decompress your file and then compress it using bgzip (part of HTSlib, distributed by the samtools project).
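A file ending in .gz may be plain gzip rather than BGZF. The sketch below inspects the first bytes of the file: a BGZF block starts with the gzip magic bytes with the "extra field" flag set, followed by a "BC" subfield. It assumes "BC" is the first extra subfield, which is how bgzip writes its blocks. The function name is hypothetical.

```shell
# Succeed if the file starts with a BGZF block header:
# bytes 0-3 are 1f 8b 08 04 and bytes 12-13 are 'B' 'C' (42 43).
is_bgzf() {
    magic=$(od -An -tx1 -N 14 "$1" | tr -d ' \t\n')
    case "$magic" in
        1f8b0804????????????????4243) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (file names are placeholders):
#   is_bgzf input.vcf.gz || gzip -dc input.vcf.gz | bgzip -c > input.bgzf.vcf.gz
```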
VCF header contains none of the expected contigs
The imputation service requires the input VCF to have been mapped to the hg38 reference, and there must be header lines denoting each contig and the correct contig length. If you encounter this error, either your input VCF header contained no hg38 contigs or their lengths did not match the hg38 reference. You can diagnose the problem by running the gatk ValidateVariants command with the following options:
gatk ValidateVariants -V path/to/input.vcf.gz --sequence-dictionary Homo_sapiens_assembly38.dict --validation-type-to-exclude ALL
You can download the required sequence dictionary (Homo_sapiens_assembly38.dict file) here.
Input VCF contains improperly coded indels
The imputation service does not accept indels coded with the placeholder alleles "I" or "D" in the REF or ALT field. You can remove these records using bcftools view:
bcftools view -e 'REF="I,D" || ALT="I,D"' INPUT.vcf.gz -Oz -o INPUT_FIXED.vcf.gz
Input VCF contains contig headers with missing length attributes
If the input VCF has contig header lines, each one must include a length attribute; otherwise the service returns an error. You can fix this with the GATK UpdateVCFSequenceDictionary tool by running the following command:
gatk UpdateVCFSequenceDictionary \
-V input.vcf.gz \
--sequence-dictionary local_path_to_reference.dict \
--output output_with_lengths.vcf.gz
You can use the reference dictionary available at gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict (a Google-owned public bucket). Download this file to a local path and provide that path to the command above.
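To confirm which contig header lines are affected, before or after the fix, you can list the `##contig` lines that have no length attribute. This is a minimal sketch, assuming a gzip- or bgzip-compressed VCF; the function name is hypothetical.

```shell
# Print every "##contig=" header line that lacks a numeric length attribute.
# Stops reading at the first data record, since headers come first.
contigs_missing_length() {
    gzip -dc "$1" | awk '
        /^##contig=/ && !/[<,]length=[0-9]+/ { print }
        !/^#/ { exit }'
}

# Example usage (file name is a placeholder); no output means every contig
# line carries a length:
#   contigs_missing_length output_with_lengths.vcf.gz
```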
For more information about input VCF requirements for the array imputation service, see Input VCF requirements.
Low Pass WGS Imputation
Coming soon!