NIO output support for ApplyBQSR #4424

Merged: 5 commits, Apr 16, 2018

Conversation

jean-philippe-martin
Contributor

Add path support, and test.

@jean-philippe-martin
Contributor Author

I confirmed this can output to GCS like so:

$ ./gatk ApplyBQSR \
  -I src/test/resources/org/broadinstitute/hellbender/tools/BQSR/HiSeq.1mb.1RG.2k_lines.alternate.bam \
  --bqsr-recal-file src/test/resources/org/broadinstitute/hellbender/tools/BQSR/HiSeq.20mb.1RG.table.gz \
  -O gs://$BUCKET/test-output/applybqsr.bam 

@codecov-io

codecov-io commented Feb 16, 2018

Codecov Report

Merging #4424 into master will increase coverage by 1.114%.
The diff coverage is 71.053%.

@@              Coverage Diff               @@
##             master     #4424       +/-   ##
==============================================
+ Coverage     79.04%   80.154%   +1.114%     
- Complexity    16447     19883     +3436     
==============================================
  Files          1047      1092       +45     
  Lines         59189     72137    +12948     
  Branches       9672     12253     +2581     
==============================================
+ Hits          46783     57821    +11038     
- Misses         8645      9988     +1343     
- Partials       3761      4328      +567

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...roadinstitute/hellbender/utils/read/ReadUtils.java | 85.039% <100%> (+5.089%) | 214 <3> (+30) ⬆️ |
| ...itute/hellbender/tools/walkers/bqsr/ApplyBQSR.java | 91.667% <100%> (ø) | 6 <1> (ø) ⬇️ |
| ...ava/org/broadinstitute/hellbender/utils/Utils.java | 80.435% <33.333%> (-1.092%) | 140 <2> (+1) |
| ...itute/hellbender/utils/test/SamAssertionUtils.java | 72.821% <77.273%> (-2.46%) | 46 <11> (+6) |
| ...titute/hellbender/utils/logging/OneShotLogger.java | 78.571% <0%> (-21.429%) | 3% <0%> (ø) |
| ...ellbender/engine/filters/VariantFilterLibrary.java | 33.333% <0%> (-16.667%) | 1% <0%> (ø) |
| ...stitute/hellbender/engine/ReferenceDataSource.java | 70% <0%> (-10%) | 7% <0%> (+3%) |
| ...stitute/hellbender/engine/ReferenceFileSource.java | 65.217% <0%> (-7.51%) | 8% <0%> (+4%) |
| ...titute/hellbender/utils/test/ArgumentsBuilder.java | 93.151% <0%> (-6.849%) | 31% <0%> (+12%) |
| ...ools/spark/sv/evidence/IntervalCoverageFinder.java | 82.222% <0%> (-6.239%) | 19% <0%> (+11%) |

... and 205 more

@jean-philippe-martin
Contributor Author

Part of bug #2422

@jean-philippe-martin jean-philippe-martin changed the title from "ApplyBQSR can output BAMs to GCS" to "NIO output support for ApplyBQSR" Feb 17, 2018
@droazen droazen self-assigned this Feb 23, 2018
    }
    if (Files.isDirectory(path)) {
        throw new IOException("File '" + fname + "' exists but is a directory");
    }

Member

Should this check Files.isRegularFile() as well? It seems wrong to try to take the md5 of a pipe or something like that.

Contributor Author

Agreed, done.
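
For readers following the exchange above, here is a minimal sketch of the kind of guard being discussed: reject directories first, then anything that is not a regular file (pipes, sockets, devices) before its bytes are read for an MD5. The class name `PathChecks`, the method name `requireRegularFile`, and the "does not exist" and "not a regular file" messages are hypothetical and are not taken from the PR; only the directory message mirrors the hunk shown above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; not part of GATK.
final class PathChecks {
    private PathChecks() {}

    /**
     * Throws an IOException unless {@code path} points at an existing regular file.
     * The checks run from most to least specific so the message stays informative.
     */
    public static void requireRegularFile(final Path path) throws IOException {
        if (!Files.exists(path)) {
            throw new IOException("File '" + path + "' does not exist");
        }
        if (Files.isDirectory(path)) {
            throw new IOException("File '" + path + "' exists but is a directory");
        }
        if (!Files.isRegularFile(path)) {
            // Reached for pipes, sockets, device nodes and other special files.
            throw new IOException("File '" + path + "' exists but is not a regular file");
        }
    }
}
```

Calling `PathChecks.requireRegularFile(path)` before reading the file keeps a named pipe or device node from being hashed by accident.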

return SamStreams.isCRAMFile(bis);
public static boolean hasCRAMFileContents(final Path putativeCRAMPath) {
try (final InputStream fileStream = Files.newInputStream(putativeCRAMPath)) {
try (final BufferedInputStream bis = new BufferedInputStream(fileStream)) {

Member

I think it's better to have 2 resources declared in the same try() block rather than having 2 nested try. I would switch it back to how it was.

Contributor Author

It doesn't compile that way, complains about an IOException. Suggestions welcome, though.

Contributor

We tried it, and it does actually work as a single try-with-resources
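
As an illustration of the single try-with-resources form mentioned above, here is a sketch that declares both streams in one `try (...)` header. It is stdlib-only: instead of htsjdk's `SamStreams.isCRAMFile`, it uses a hypothetical stand-in, `startsWithCramMagic`, that only checks for the four ASCII bytes "CRAM" at the start of the stream. The class name `CramSniffer` and the choice to wrap the `IOException` in an `UncheckedIOException` are assumptions, not what the PR does. Either a `catch` clause or a `throws` declaration is needed, because opening and closing the resources can throw `IOException`.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch; the real method lives in GATK and delegates to htsjdk.
final class CramSniffer {
    // CRAM files begin with the four ASCII bytes "CRAM".
    private static final byte[] CRAM_MAGIC = {'C', 'R', 'A', 'M'};

    private CramSniffer() {}

    public static boolean hasCRAMFileContents(final Path putativeCRAMPath) {
        // Both resources share one try header; they are closed in reverse order.
        try (final InputStream fileStream = Files.newInputStream(putativeCRAMPath);
             final BufferedInputStream bis = new BufferedInputStream(fileStream)) {
            return startsWithCramMagic(bis);
        } catch (IOException e) {
            // Assumption: wrap rather than propagate a checked exception.
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for SamStreams.isCRAMFile: compares the first four bytes only.
    static boolean startsWithCramMagic(final InputStream in) throws IOException {
        final byte[] header = new byte[CRAM_MAGIC.length];
        int read = 0;
        while (read < header.length) {
            final int n = in.read(header, read, header.length - read);
            if (n < 0) {
                return false; // stream ended before four bytes were available
            }
            read += n;
        }
        return Arrays.equals(header, CRAM_MAGIC);
    }
}
```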

@@ -531,6 +534,17 @@ public static String calcMD5(final byte[] bytes) {
public static String calculateFileMD5( final File file ) throws IOException{
return Utils.calcMD5(FileUtils.readFileToByteArray(file));
}
public static String calculatePathMD5(final Path path) throws IOException{

Contributor

Add javadoc for the new overload of calculatePathMD5()

Contributor

Also, can you make sure that these new method overloads are covered by unit tests?

Contributor Author

I added Javadoc and made sure that the first calls the second. Since the first is tested, that also covers the second automatically.

@@ -531,6 +534,17 @@ public static String calcMD5(final byte[] bytes) {
public static String calculateFileMD5( final File file ) throws IOException{
return Utils.calcMD5(FileUtils.readFileToByteArray(file));

Member

Seems like we should make all the old methods delegate to the new ones, if we can.

Contributor Author

Agreed, done.
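
The delegation pattern agreed on above can be sketched as follows: the `File` overload converts to a `Path` and calls the `Path` overload, so only one implementation has to be maintained and tested. This is a stdlib-only approximation. The class name `Md5Sketch` is hypothetical, `Files.readAllBytes` replaces the commons-io call shown in the hunk, and the lowercase 32-digit hex output format is an assumption rather than a statement about what GATK's `calcMD5` returns.

```java
import java.io.File;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of File-to-Path delegation; not the GATK Utils class.
final class Md5Sketch {
    private Md5Sketch() {}

    /** Returns the MD5 of the given bytes as 32 lowercase hex digits (assumed format). */
    public static String calcMD5(final byte[] bytes) {
        try {
            final byte[] digest = MessageDigest.getInstance("MD5").digest(bytes);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /**
     * Path overload holding the single implementation.
     * Reads the whole file into memory, so it is unsuitable for large files.
     */
    public static String calculatePathMD5(final Path path) throws IOException {
        return calcMD5(Files.readAllBytes(path));
    }

    /** File overload: converts to a Path and delegates, adding no logic of its own. */
    public static String calculateFileMD5(final File file) throws IOException {
        return calculatePathMD5(file.toPath());
    }
}
```

With this shape, a unit test that exercises `calculateFileMD5` also runs `calculatePathMD5`.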

@@ -161,14 +171,46 @@ public static String samsEqualStringent(final File actualSam, final File expecte
return compareReads(actualSam, expectedSam, validation, reference);
}

public static String samsEqualStringent(final Path actualSam, final Path expectedSam, final ValidationStringency validation, final Path reference) throws IOException {

Contributor

Can you make the file overload of this method call into the Path version?

Contributor

Also add javadoc

Contributor Author

Certainly! Done.

@@ -215,6 +257,32 @@ private static String equalHeadersIgnoreCOandPG(final File actualSam, final File
return msg;
}
}
private static String equalHeadersIgnoreCOandPG(final Path actualSam, final Path expectedSam, final ValidationStringency validation, final Path reference) throws IOException {

Contributor

Can you make the file overload call into this one?

Contributor Author

Oh actually I meant to delete it; it's private and has no callers.

assertCRAMContents(putativeCRAMFile.toPath());
}
}
public static void assertCRAMContentsIfCRAM(final Path putativeCRAMPath) {

Contributor

javadoc for this method

Contributor Author

Done; I also changed the File version to more cleanly call the Path version.

if (params.reference != null) {
refFile = new File(params.reference);
args.add("-R"); args.add(refFile.getAbsolutePath());
try (FileSystem jimfs = Jimfs.newFileSystem(Configuration.unix())) {

Contributor @droazen, Mar 12, 2018

Can you keep the existing test code as-is, and make the Jimfs-based tests a separate test method? It would also be good to have one test case that writes to a live GCS bucket (marked as groups = {bucket})

Contributor Author

Reinstated the old (File-based) test. A GCS test remains to be added.

Contributor Author

GCS test added. For now it runs all the cases, but since the other tests already establish that ApplyBQSR does the right thing and we just want to make sure it can write to GCS, it'd make sense to only run a subset of the cases on the cloud. If you let me know which ones you want, I can cull the rest.

}

@Test(dataProvider = "testCRAMContentsFail", expectedExceptions=AssertionError.class)
public void testAssertCRAMContentsFail(File putativeCRAMFile) {
SamAssertionUtils.assertCRAMContents(putativeCRAMFile);
SamAssertionUtils.assertCRAMContents(putativeCRAMFile.toPath());

Contributor

Test the file-based overloads as well

Contributor Author

There's no file-based overload of assertCRAMContents anymore.

@droazen
Contributor

droazen commented Mar 12, 2018

@jean-philippe-martin Review complete, back to you

@jean-philippe-martin
Contributor Author

@droazen all review comments applied, back to you!


/**
* Validate/assert that the contents are CRAM if the extension is .cram
*/ public static void assertCRAMContentsIfCRAM(final Path putativeCRAMPath) {

Contributor

Indentation here is off

Contributor Author

Good catch, thank you! Fixed.

* @return file's MD5 in String form
* @throws IOException if the file could not be read
*/
public static String calculatePathMD5(final Path path) throws IOException{

Contributor

Can you add a warning to the javadoc for this method that it slurps the entire file into memory and should not be used for large files?

Contributor Author

Done!

@@ -529,7 +532,29 @@ public static String calcMD5(final byte[] bytes) {
* @throws IOException if the file could not be read
*/
public static String calculateFileMD5( final File file ) throws IOException{

Contributor

Can you add a warning to the javadoc for this method that it slurps the entire file into memory and should not be used for large files?

Contributor Author

Done!
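
To make the memory concern behind this warning concrete, here is a sketch of a streaming alternative that feeds the digest in fixed-size chunks instead of loading the file whole. It is not part of this PR, which only documents the limitation. The class name `StreamingMd5`, the method name `md5Of`, the 8 KiB buffer size, and the hex output format are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical alternative; the PR itself keeps the read-everything approach.
final class StreamingMd5 {
    private StreamingMd5() {}

    /** Computes the MD5 of a file while holding only one buffer in memory at a time. */
    public static String md5Of(final Path path) throws IOException {
        final MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        final byte[] buffer = new byte[8192]; // assumed chunk size
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // digest only the bytes actually read
            }
        }
        // 32 lowercase hex digits, zero-padded on the left.
        return String.format("%032x", new BigInteger(1, md.digest()));
    }
}
```

Because it works on a `Path`, the same approach applies to any NIO file system provider, at the cost of a slightly longer method.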

}

@Test(dataProvider = "ApplyBQSRTest")
public void testApplyBQSRPath(ABQSRTest params) throws IOException {

Contributor

This test doesn't need to be run on the full ApplyBQSRTest DataProvider -- can you create a separate DataProvider with ONE of the test cases from the ApplyBQSRTest provider, and use that here?

Contributor Author

Absolutely! Done!


runCommandLine(args);
@Test(dataProvider = "ApplyBQSRTest", groups={"cloud", "bucket"})

Contributor

This test should be just in the bucket group, not the cloud group, since execution is local

Contributor Author

Good catch, thank you! Fixed!

    SamAssertionUtils.assertSamsEqual(outPath, new File(params.expectedFile).toPath(), refPath);
} finally {
    if (Files.exists(outPath)) {
        Files.delete(outPath);

Contributor @droazen, Apr 9, 2018

Doesn't BucketUtils.getTempFilePath() already mark this for deletion on JVM exit? (I guess this doesn't hurt though)

Contributor Author

Oh yes, you're right! I removed the delete and the try/finally since they are redundant. I also added a comment to help in case others also forget.
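
For context on why the explicit delete was redundant, the sketch below shows one way a helper can register a temporary path for deletion at JVM exit. It is only an illustration of the idea: how `BucketUtils.getTempFilePath()` is actually implemented is not shown in this thread, and the class name `TempPaths`, the method name `newTempPathDeletedOnExit`, and the use of a shutdown hook are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not the GATK BucketUtils implementation.
final class TempPaths {
    private TempPaths() {}

    /**
     * Creates a temporary file and registers a shutdown hook that removes it.
     * A hook with Files.deleteIfExists works for any Path, whereas
     * File.deleteOnExit is limited to the default file system.
     */
    public static Path newTempPathDeletedOnExit(final String prefix, final String suffix)
            throws IOException {
        final Path path = Files.createTempFile(prefix, suffix);
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                Files.deleteIfExists(path);
            } catch (IOException e) {
                // Best effort: nothing useful can be done this late in shutdown.
            }
        }));
        return path;
    }
}
```

A test that obtains its output path from such a helper does not need its own `try`/`finally` cleanup.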

Contributor @droazen left a comment

Final review complete -- a few minor remaining TODOs, then we can merge this.

@jean-philippe-martin
Contributor Author

Thank you very much @droazen! This should take care of all the comments, so once the checks are green I'll press "squash and merge." Then I'll move on to the next tool to update!

@jean-philippe-martin jean-philippe-martin merged commit 984e783 into master Apr 16, 2018
@jean-philippe-martin jean-philippe-martin deleted the jp_applybqsr_nio branch April 16, 2018 22:38
cwhelan pushed a commit to cwhelan/gatk-linked-reads that referenced this pull request May 25, 2018
* ApplyBQSR can output BAMs to GCS

Add path support, and test.
4 participants