GenomeDBImport output errors resulting in incomplete DB do not result in error return (exit code == 0). #7598

Closed
vruano opened this issue Dec 12, 2021 · 0 comments · Fixed by #7613

vruano commented Dec 12, 2021

Bug Report

Affected tool(s) or class(es)

GenomicsDBImport

Affected version(s)

01:22:35.395 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.0.0
01:22:35.395 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
01:22:35.481 INFO  GenomicsDBImport - Executing as vr6@node-14-20 on Linux v5.4.0-90-generic amd64
01:22:35.481 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
01:22:35.482 INFO  GenomicsDBImport - Start Date/Time: 10 December 2021 01:22:34 UTC

Description

It seems possible for an I/O error affecting the output TileDB file/folder to be ignored by the tool, resulting in a falsely successful completion. One won't notice it until trying to use that DB with GenotypeGVCFs.

STDERR:

Dec 10, 2021 1:22:35 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
01:22:35.395 INFO  GenomicsDBImport - ------------------------------------------------------------
01:22:35.395 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.0.0
01:22:35.395 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
01:22:35.481 INFO  GenomicsDBImport - Executing as vr6@node-14-20 on Linux v5.4.0-90-generic amd64
01:22:35.481 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
01:22:35.482 INFO  GenomicsDBImport - Start Date/Time: 10 December 2021 01:22:34 UTC
01:22:35.482 INFO  GenomicsDBImport - ------------------------------------------------------------
01:22:35.482 INFO  GenomicsDBImport - ------------------------------------------------------------
01:22:35.483 INFO  GenomicsDBImport - HTSJDK Version: 2.24.0
01:22:35.483 INFO  GenomicsDBImport - Picard Version: 2.25.0
01:22:35.483 INFO  GenomicsDBImport - Built for Spark Version: 2.4.5
01:22:35.483 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:22:35.483 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:22:35.483 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:22:35.483 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:22:35.483 INFO  GenomicsDBImport - Deflater: IntelDeflater
01:22:35.483 INFO  GenomicsDBImport - Inflater: IntelInflater
01:22:35.483 INFO  GenomicsDBImport - GCS max retries/reopens: 20
01:22:35.483 INFO  GenomicsDBImport - Requester pays: disabled
01:22:35.484 INFO  GenomicsDBImport - Initializing engine
01:24:58.683 INFO  FeatureManager - Using codec BEDCodec to read file file:///lustre/scratch118/malaria/team112/personal/vr6/pf8-update/work/8e/c9ed494e9cd5d45835890fff4fa34c/intervals.bed
01:24:58.801 INFO  IntervalArgumentCollection - Processing 11500 bp from intervals
01:24:58.803 INFO  GenomicsDBImport - Done initializing engine
01:24:59.055 INFO  GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
01:25:02.076 INFO  GenomicsDBImport - Vid Map JSON file will be written to /lustre/scratch118/malaria/team112/personal/vr6/pf8-update/work/8e/c9ed494e9cd5d45835890fff4fa34c/Pf3D7_08_v3_33.bed.gdb/vidmap.json
01:25:02.077 INFO  GenomicsDBImport - Callset Map JSON file will be written to /lustre/scratch118/malaria/team112/personal/vr6/pf8-update/work/8e/c9ed494e9cd5d45835890fff4fa34c/Pf3D7_08_v3_33.bed.gdb/callset.json
01:25:02.077 INFO  GenomicsDBImport - Complete VCF Header will be written to /lustre/scratch118/malaria/team112/personal/vr6/pf8-update/work/8e/c9ed494e9cd5d45835890fff4fa34c/Pf3D7_08_v3_33.bed.gdb/vcfheader.vcf
01:25:02.077 INFO  GenomicsDBImport - Importing to workspace - /lustre/scratch118/malaria/team112/personal/vr6/pf8-update/work/8e/c9ed494e9cd5d45835890fff4fa34c/Pf3D7_08_v3_33.bed.gdb
01:25:02.078 INFO  ProgressMeter - Starting traversal
01:25:02.078 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
[TileDB::FileSystem] Error: (write_to_file) Cannot write to file; File writing error; path=/lustre/scratch118/malaria/team112/personal/vr6/pf8-update/work/8e/c9ed494e9cd5d45835890fff4fa34c/Pf3D7_08_v3_33.bed.gdb/vidmap.json; errno=5(Input/output error)
[TileDB::FileSystem] Error: (write_to_file) Cannot write to file; File writing error; path=/lustre/scratch118/malaria/team112/personal/vr6/pf8-update/work/8e/c9ed494e9cd5d45835890fff4fa34c/Pf3D7_08_v3_33.bed.gdb/vidmap.json; errno=5(Input/output error)
01:25:43.661 INFO  GenomicsDBImport - Starting batch input file preload
01:26:19.244 INFO  GenomicsDBImport - Finished batch preload
01:26:19.244 INFO  GenomicsDBImport - Importing batch 1 with 2 samples
01:30:20.226 INFO  ProgressMeter -             unmapped              5.3                     1              0.2
01:30:20.226 INFO  GenomicsDBImport - Done importing batch 1/1
01:30:20.227 INFO  ProgressMeter -             unmapped              5.3                     1              0.2
01:30:20.227 INFO  ProgressMeter - Traversal complete. Processed 1 total batches in 5.3 minutes.
01:30:20.227 INFO  GenomicsDBImport - Import of all batches to GenomicsDB completed!
01:30:20.227 INFO  GenomicsDBImport - Shutting down engine
[10 December 2021 01:30:20 UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 7.76 minutes.
Runtime.totalMemory()=16078340096
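
As a stopgap while the exit code is unreliable, a wrapper could scan the captured STDERR for native error lines like the two `[TileDB::FileSystem] Error:` lines above. The following is only a sketch: the function names `find_native_errors` and `exit_code_for` are hypothetical, and the patterns are inferred from the lines in this log, so other native error formats may go undetected.

```python
import re

# Patterns inferred from the TileDB lines in the STDERR above (an assumption:
# other native error formats may exist and would need their own patterns).
TILEDB_ERROR = re.compile(r"^\[TileDB::[^\]]+\] Error:")
PATH_FIELD = re.compile(r"path=([^;]+);")
ERRNO_FIELD = re.compile(r"errno=(\d+)")


def find_native_errors(stderr_text):
    """Return one dict per TileDB error line found in the captured STDERR."""
    errors = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not TILEDB_ERROR.match(line):
            continue
        path = PATH_FIELD.search(line)
        errno_ = ERRNO_FIELD.search(line)
        errors.append({
            "line": line,
            "path": path.group(1) if path else None,
            "errno": int(errno_.group(1)) if errno_ else None,
        })
    return errors


def exit_code_for(stderr_text):
    """Hypothetical helper: non-zero when any native error line was logged."""
    return 1 if find_native_errors(stderr_text) else 0
```

A pipeline step could save the tool's STDERR to a file and fail the task when `exit_code_for` returns 1, regardless of the exit code the tool itself reported.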

Steps to reproduce

Not sure whether this is reproducible with any particular input. It seems one has to simulate the I/O errors, for example by using nearly full storage for the output or by creating some conflicting read-only files.

Expected behavior

No low-level error messages like the ones above, and output that can be used with GenotypeGVCFs without issue.

Actual behavior

Error messages come from the JNI dependency. The tool appears to finish successfully, but the output is missing some content, rendering it unusable for VCF calling.
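
A post-run sanity check on the workspace can catch the failure shown here before the DB is handed to the next step. This is a minimal sketch, not part of GATK: `check_workspace` is a hypothetical helper, and the three file names are taken from the "will be written to" lines in the log above. It only inspects those metadata files, so an empty result does not prove the array data itself is complete.

```python
import json
from pathlib import Path

# Metadata files the log above says will be written into the workspace.
EXPECTED_FILES = ("vidmap.json", "callset.json", "vcfheader.vcf")


def check_workspace(workspace):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    root = Path(workspace)
    for name in EXPECTED_FILES:
        path = root / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        if path.stat().st_size == 0:
            problems.append(f"{name}: empty")
            continue
        if path.suffix == ".json":
            try:
                # A truncated write typically leaves unparseable JSON.
                json.loads(path.read_text())
            except ValueError as exc:
                problems.append(f"{name}: invalid JSON ({exc})")
    return problems
```

Called with the workspace path (the `.gdb` directory in the log), it returns messages such as `vidmap.json: missing`, which a wrapper can turn into a non-zero exit.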

vruano changed the title from "GenomeDBImport output errors resulting in incomplete DB do not result in error return (exit code != 0)." to "GenomeDBImport output errors resulting in incomplete DB do not result in error return (exit code == 0)." on Dec 13, 2021
lbergelson pushed a commit that referenced this issue Dec 21, 2021
* This fixes #7598 by throwing appropriate java Exceptions with WriteToFileFailures.