-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIO output support for ApplyBQSR #4424
Conversation
Add path support, and test.
I confirmed this can output to GCS like so:
|
Codecov Report
@@ Coverage Diff @@
## master #4424 +/- ##
==============================================
+ Coverage 79.04% 80.154% +1.114%
- Complexity 16447 19883 +3436
==============================================
Files 1047 1092 +45
Lines 59189 72137 +12948
Branches 9672 12253 +2581
==============================================
+ Hits 46783 57821 +11038
- Misses 8645 9988 +1343
- Partials 3761 4328 +567
|
Part of bug #2422 |
} | ||
if (Files.isDirectory(path)) { | ||
throw new IOException("File '" + fname + "' exists but is a directory"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check Files.isRegularFile() as well? It seems wrong to try to take the md5 of a pipe or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, done.
return SamStreams.isCRAMFile(bis); | ||
public static boolean hasCRAMFileContents(final Path putativeCRAMPath) { | ||
try (final InputStream fileStream = Files.newInputStream(putativeCRAMPath)) { | ||
try (final BufferedInputStream bis = new BufferedInputStream(fileStream)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to have 2 resources declared in the same try()
block rather than having 2 nested try
. I would switch it back to how it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't compile that way, complains about an IOException. Suggestions welcome, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tried it, and it does actually work as a single try-with-resources
@@ -531,6 +534,17 @@ public static String calcMD5(final byte[] bytes) { | |||
public static String calculateFileMD5( final File file ) throws IOException{ | |||
return Utils.calcMD5(FileUtils.readFileToByteArray(file)); | |||
} | |||
public static String calculatePathMD5(final Path path) throws IOException{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add javadoc for the new overload of calculatePathMD5()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can you make sure that these new method overloads are covered by unit tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added Javadoc and made sure that the first calls the second. Since the first is tested, that also covers the second automatically.
@@ -531,6 +534,17 @@ public static String calcMD5(final byte[] bytes) { | |||
public static String calculateFileMD5( final File file ) throws IOException{ | |||
return Utils.calcMD5(FileUtils.readFileToByteArray(file)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like if we can we should make all the old methods delegate to the new ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, done.
@@ -161,14 +171,46 @@ public static String samsEqualStringent(final File actualSam, final File expecte | |||
return compareReads(actualSam, expectedSam, validation, reference); | |||
} | |||
|
|||
public static String samsEqualStringent(final Path actualSam, final Path expectedSam, final ValidationStringency validation, final Path reference) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the file overload of this method call into the Path version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add javadoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly! Done.
@@ -215,6 +257,32 @@ private static String equalHeadersIgnoreCOandPG(final File actualSam, final File | |||
return msg; | |||
} | |||
} | |||
private static String equalHeadersIgnoreCOandPG(final Path actualSam, final Path expectedSam, final ValidationStringency validation, final Path reference) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the file overload call into this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh actually I meant to delete it; it's private and has no callers.
assertCRAMContents(putativeCRAMFile.toPath()); | ||
} | ||
} | ||
public static void assertCRAMContentsIfCRAM(final Path putativeCRAMPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadoc for this method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; I also changed the File version to more cleanly call the Path version.
if (params.reference != null) { | ||
refFile = new File(params.reference); | ||
args.add("-R"); args.add(refFile.getAbsolutePath()); | ||
try (FileSystem jimfs = Jimfs.newFileSystem(Configuration.unix())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep the existing test code as-is, and make the Jimfs-based tests a separate test method? It would also be good to have one test case that writes to a live GCS bucket (marked as groups = {bucket}
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-instated the old (File-based) test. Remains to add a GCS test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCS test added. For now it runs all the cases, but since the other tests already establish that ApplyBQSR does the right thing and we just want to make sure it can write to GCS, it'd make sense to only run a subset of the cases on the cloud. If you let me know which ones you want, I can cull the rest.
} | ||
|
||
@Test(dataProvider = "testCRAMContentsFail", expectedExceptions=AssertionError.class) | ||
public void testAssertCRAMContentsFail(File putativeCRAMFile) { | ||
SamAssertionUtils.assertCRAMContents(putativeCRAMFile); | ||
SamAssertionUtils.assertCRAMContents(putativeCRAMFile.toPath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test the file-based overloads as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no file-based overload of assertCRAMContents
anymore.
@jean-philippe-martin Review complete, back to you |
@droazen all review comments applied, back to you! |
|
||
/** | ||
* Validate/assert that the contents are CRAM if the extension is .cram | ||
*/ public static void assertCRAMContentsIfCRAM(final Path putativeCRAMPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation here is off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thank you! Fixed.
* @return file's MD5 in String form | ||
* @throws IOException if the file could not be read | ||
*/ | ||
public static String calculatePathMD5(final Path path) throws IOException{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a warning to the javadoc for this method that it slurps the entire file into memory, and should not be used for large files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@@ -529,7 +532,29 @@ public static String calcMD5(final byte[] bytes) { | |||
* @throws IOException if the file could not be read | |||
*/ | |||
public static String calculateFileMD5( final File file ) throws IOException{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a warning to the javadoc for this method that it slurps the entire file into memory, and should not be used for large files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
} | ||
|
||
@Test(dataProvider = "ApplyBQSRTest") | ||
public void testApplyBQSRPath(ABQSRTest params) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't need to be run on the full ApplyBQSRTest
DataProvider
-- can you create a separate DataProvider
with ONE of the test cases from the ApplyBQSRTest
provider, and use that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely! Done!
|
||
runCommandLine(args); | ||
@Test(dataProvider = "ApplyBQSRTest", groups={"cloud", "bucket"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should be just in the bucket
group, not the cloud
group, since execution is local
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thank you! Fixed!
SamAssertionUtils.assertSamsEqual(outPath, new File(params.expectedFile).toPath(), refPath); | ||
} finally { | ||
if (Files.exists(outPath)) { | ||
Files.delete(outPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't BucketUtils.getTempFilePath()
already mark this for deletion on JVM exit? (I guess this doesn't hurt though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, you're right! I removed the delete and the try/finally since they are redundant. I also added a comment to help in case others also forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final review complete -- a few minor remaining TODOs, then we can merge this.
Thank you very much @droazen! This should take care of all the comments, so once the checks are green I'll press "squash and merge." Then I'll move on to the next tool to update! |
* ApplyBQSR can output BAMs to GCS Add path support, and test.
Add path support, and test.