IllegalStateException in GenotypeGVCFs after GenomicsDBImport - GATK 4.2.6.1 #7938
Comments
Just reiterating here what @lbergelson noted in office hours: it looks like the offending check was added in #7738, which ultimately affects both the ExcessHet and InbreedingCoeff annotations. @droazen reviewed that PR and might have more insight into the desired behavior for these annotations when PLs are missing because GenomicsDB dropped them upstream. Should we just not emit these annotations?
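To see which sites are affected by missing PLs, a rough diagnostic along the following lines may help. This is a minimal sketch, not part of GATK: `sites_missing_pl` is a hypothetical helper that reads an uncompressed VCF on stdin and prints CHROM, POS, and the number of samples with no PL value. It assumes a tab-delimited VCF with a FORMAT column and treats an absent PL key, a truncated sample field, or a value of "." as missing.

```shell
#!/bin/sh
# Hypothetical helper (not a GATK tool): list sites where at least one
# sample has no PL value. Reads an uncompressed VCF on stdin.
sites_missing_pl() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      n = split($9, fmt, ":")            # FORMAT keys for this record
      pl = 0
      for (i = 1; i <= n; i++) if (fmt[i] == "PL") pl = i
      missing = 0
      for (s = 10; s <= NF; s++) {       # sample columns start at field 10
        m = split($s, val, ":")
        # No PL key, a truncated sample field, or "." all count as missing.
        if (pl == 0 || m < pl || val[pl] == ".") missing++
      }
      if (missing > 0) print $1 "\t" $2 "\t" missing
    }'
}
```

For a bgzipped VCF, something like `gzip -dc some_output.vcf.gz | sites_missing_pl` would feed it; the file name here is a placeholder.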
@AJDCiarla The user should try re-running |
@AJDCiarla It would also be useful to know whether the error occurs when the user runs |
@droazen, as Karina posted in #7933, with our inputs this issue only occurs when using --force-output-intervals. I tried increasing --max-alternate-alleles to 2048 with no change. I just got back from a vacation, but this week I will try to debug this more closely to see what is causing the issue. Have you had any further discussions beyond what @samuelklee suggested above?
(A commit in another repository referenced this issue. Its change list includes the entry "Skip GATK annotations to avoid broadinstitute/gatk#7938"; the remaining entries in that commit message are not about this issue.)
Hello, did you resolve this problem? I am also encountering it.
The error comes from two annotations: InbreedingCoeff and ExcessHet. One workaround is to add "-AX ExcessHet -AX InbreedingCoeff" to the GenotypeGVCFs command. It doesn't exactly solve the problem, but it avoids hitting the problem code.
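As a sketch of that workaround, the GenotypeGVCFs command from the original report could be extended as shown here. The paths are copied from that report; the `-D` JVM properties and GC options are left out for brevity, and the function name is made up for this example. `-AX` is the short form of `--annotations-to-exclude`. The function only prints the command (a dry run), so nothing is executed until the leading `echo` is removed.

```shell
#!/bin/sh
# Dry-run sketch of the workaround: the reporter's GenotypeGVCFs command
# with ExcessHet and InbreedingCoeff excluded. Paths come from the report.
gatk_jar=MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar

# Prints the command instead of running it; remove "echo" to execute.
genotype_without_annotations() {
  echo java -Xms4G -Xmx16G -jar "$gatk_jar" GenotypeGVCFs \
    -R PigeonBatch5/000_DataLinks/000_RefSeq/Cliv2.1_genomic.fasta \
    --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list \
    --force-output-intervals PigeonBatch4/008_RawVcfGz/MergeVcf/pigeonBatch1234_filtered.vcf.gz \
    -V gendb://007_Database_DBImport_VCFref/database_interval_9 \
    -O 008_RawVcfGz_DBImport_VCFref/001_DividedIntervals/interval_9.vcf.gz \
    --tmp-dir TMPDIR \
    --allow-old-rms-mapping-quality-annotation-data \
    --only-output-calls-starting-in-intervals \
    -AX ExcessHet \
    -AX InbreedingCoeff
}

genotype_without_annotations
```

Note that the output VCF will then lack the ExcessHet and InbreedingCoeff annotations, so any downstream filtering that relies on them would need to be adjusted.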
Awesome! It is useful. Thank you very much!
IllegalStateException in GenotypeGVCFs after GenomicsDBImport - GATK 4.2.6.1
Looks like there are similar issues occurring in #7639 and #7933. This is a follow-up report from the GATK Forum.
GATK Forum Post: (https://gatk.broadinstitute.org/hc/en-us/community/posts/6972994559643-java-lang-IllegalStateException-in-GenotypeGVCFs-after-GenomicsDBImport-GATK-4-2-6-1)
Bug Report
Tools/Methods
GenomicsDBImport --> GenotypeGVCFs
Affected version(s)
-GenomicsDBImport: GATK 4.2.4.0
-GenotypeGVCFs: GATK 4.2.6.1
Description
An IllegalStateException is thrown in GenotypeGVCFs when run on a workspace created by GenomicsDBImport. The exception message indicates that "genome has no likelihoods". The user is dividing the reference into 50 intervals.
Stacktrace:
Exact Commands Used:
GenomicsDBImport:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms2G -Xmx20G -XX:+UseParallelGC -XX:ParallelGCThreads=2 -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenomicsDBImport --genomicsdb-workspace-path 007_Database_DBImport_VCFref/database_interval_9 --sample-name-map sample_name_map --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list --reader-threads 5 --batch-size 60 --tmp-dir TMPDIR --max-num-intervals-to-import-in-parallel 3 --merge-input-intervals
GenotypeGVCFs:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms4G -Xmx16G -XX:+UseParallelGC -XX:ParallelGCThreads=2 -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenotypeGVCFs -R PigeonBatch5/000_DataLinks/000_RefSeq/Cliv2.1_genomic.fasta --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list --force-output-intervals PigeonBatch4/008_RawVcfGz/MergeVcf/pigeonBatch1234_filtered.vcf.gz -V gendb://007_Database_DBImport_VCFref/database_interval_9 -O 008_RawVcfGz_DBImport_VCFref/001_DividedIntervals/interval_9.vcf.gz --tmp-dir TMPDIR --allow-old-rms-mapping-quality-annotation-data --only-output-calls-starting-in-intervals --verbosity ERROR
User Description of the Issue:
"I'm using the GenotypeGVCFs function based on GenomicsDBImport database. I've divided the reference into 50 intervals. Some intervals seems ok, but some reports error as following.
I used a VCF file in "--force-output-intervals" for down stream analysis. I've never seen this error without "--force-output-intervals". I've searched for the error message and changed my GATK version to 4.2.6.1 since similar error has been solved as a bug in recent update, but it still not works on my dataset..."
@droazen and @samuelklee , any insight on this?
Thank you,
Anthony