HaplotypeCallerSpark doesn't write g.vcf.gz files #4274
See #4275 for the temporary workaround.
This will likely require a fix in hadoop-bam, unless we either copy most of the hadoop-bam code into gatk or someone comes up with a cleverer solution than I have.
Similarly, see #4303 for the inability to write g.bcf files; that is a much lower-priority problem.
droazen pushed a commit that referenced this issue on Jan 30, 2018:
* prevent users from requesting g.vcf.gz in Spark
* this is currently broken, see #4274
* add a check to HaplotypeCallerSpark and VariantSparkSink and throw a clear exception in this case
* added test for GVCF writing in VariantSparkSink which previously didn't exist
* added new UserException.UnimplementedFeature class
* closes #4275
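
As a rough illustration of the kind of guard this commit describes, here is a minimal, self-contained sketch; the class, method, and parameter names below are hypothetical stand-ins, not the actual GATK code (which adds the check to HaplotypeCallerSpark/VariantSparkSink and throws its real UserException.UnimplementedFeature):

```java
// Hypothetical sketch of a fail-fast guard against the broken g.vcf.gz path.
// All names here are illustrative stand-ins, not the actual GATK classes.
public final class GvcfGzGuard {

    /** Stand-in for GATK's UserException.UnimplementedFeature. */
    public static class UnimplementedFeature extends RuntimeException {
        public UnimplementedFeature(final String message) {
            super(message);
        }
    }

    /** Throw a clear exception before Spark writes a malformed block-compressed GVCF. */
    public static void checkGvcfOutputIsWritable(final String outputPath, final boolean emitGvcf) {
        final boolean blockCompressed = outputPath.endsWith(".gz") || outputPath.endsWith(".bgz");
        if (emitGvcf && blockCompressed) {
            throw new UnimplementedFeature(
                "Writing a block-compressed GVCF (" + outputPath + ") is not yet supported in Spark, "
                + "see https://github.com/broadinstitute/gatk/issues/4274. "
                + "Write an uncompressed g.vcf instead.");
        }
    }
}
```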
lbergelson added a commit that referenced this issue on Jan 31, 2018, with the same commit message as above.
@tomwhite Should be a fairly easy fix in Hadoop-BAM, we think.
lbergelson pushed a commit that referenced this issue on May 10, 2018:
* Support g.vcf.gz files in Spark tools
* fixes #4274
* upgrade hadoop-bam 7.9.1 -> 7.10.0
* Remove bcf files from Spark tests, since Spark currently can't write bcf files correctly
* this is tracked by #4303
* a file named .bcf is produced, but the file is actually encoded as a vcf
* updated tests to verify that the file extension matches the actual datatype in the file
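
The last bullet's idea, verifying that the file extension matches the actual datatype, can be checked cheaply because gzip-compressed output (including BGZF block compression) always begins with the magic bytes 0x1f 0x8b, while an uncompressed VCF starts with the ASCII header `##fileformat=VCF`. A minimal sketch of such a check, not the actual GATK test code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of an extension-vs-content check (not the actual GATK test code).
public final class OutputTypeCheck {

    /** gzip/BGZF files always start with the magic bytes 0x1f 0x8b. */
    public static boolean looksGzipped(final Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            final int b0 = in.read();
            final int b1 = in.read();
            return b0 == 0x1f && b1 == 0x8b;
        }
    }

    /** Fail if a .gz extension sits on uncompressed data, or vice versa. */
    public static void assertExtensionMatchesContent(final Path file) throws IOException {
        final boolean gzExtension = file.toString().endsWith(".gz");
        if (gzExtension != looksGzipped(file)) {
            throw new AssertionError(file + ": extension does not match file content");
        }
    }
}
```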
If you ask HaplotypeCallerSpark for a g.vcf.gz, it outputs a base-pair-resolution GVCF with no blocking. This is due to confusion in hadoop-bam / VariantSparkSink. It works fine if you write an uncompressed g.vcf.

This is caused by a conditional statement in `KeyIgnoringVCFOutputFormat.getRecordWriter(TaskAttemptContext ctx)`. The two branches call two different overloads of `getRecordWriter`. The first is public and is overridden to provide GVCF writers in our code; the second is private and doesn't know about our GVCF writer. We could override `getRecordWriter(ctx)`, but we would need access to a `VCFRecordWriter` constructor that takes a stream and propagates the ctx, and no such constructor exists.
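
To make the dispatch problem concrete, here is a compilable sketch of the shape of the code being described; it paraphrases the structure, and everything except the class and method names quoted above is a stand-in rather than real Hadoop-BAM source:

```java
import java.io.IOException;
import java.io.OutputStream;

// Simplified sketch of the dispatch problem described above; paraphrased
// structure, not actual Hadoop-BAM source.
class TaskAttemptContext { }                     // stand-in for the Hadoop type

class VCFRecordWriter {
    // Note: no constructor exists that takes a stream AND propagates the ctx,
    // which is exactly what an override of the compressed path would need.
    VCFRecordWriter(OutputStream out) { }
}

class KeyIgnoringVCFOutputFormat {

    // Entry point. Its two branches call two different getRecordWriter overloads.
    public VCFRecordWriter getRecordWriter(TaskAttemptContext ctx) throws IOException {
        OutputStream out = openOutputStream(ctx);  // stand-in for the real stream setup
        if (!isCompressed(ctx)) {
            // Public overload: GATK overrides this one to return a GVCF-aware
            // writer, which is why uncompressed g.vcf output works.
            return getRecordWriter(ctx, out);
        }
        // Private overload: invisible to subclasses, so the GVCF override is
        // bypassed and a .g.vcf.gz comes out as an unblocked, per-base GVCF.
        return getRecordWriter(out);
    }

    public VCFRecordWriter getRecordWriter(TaskAttemptContext ctx, OutputStream out)
            throws IOException {
        return new VCFRecordWriter(out);
    }

    private VCFRecordWriter getRecordWriter(OutputStream out) {
        return new VCFRecordWriter(out);
    }

    private boolean isCompressed(TaskAttemptContext ctx) { return false; }  // stand-in
    private OutputStream openOutputStream(TaskAttemptContext ctx) {         // stand-in
        return OutputStream.nullOutputStream();
    }
}
```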