-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow LocatableXsvFuncotationFactory to read gzipped files #8363
Conversation
Hello, @jonn-smith and @lbergelson: I'd like to make a custom data source using tabular data as the input. These data should be matched on position, not considering allele, using LocatableXSV as the type. There is an example of a TSV using Oreganno in the docs (https://gatk.broadinstitute.org/hc/en-us/articles/360035889931-Funcotator-Information-and-Tutorial).; however, I ran into two issues:
Below is an example input. You can bgzip this and index using:
With this PR, I think funcotator will now support gzipped LocatableXSV sources. Would it be possible to add this? |
@bbimber This looks like a good change, but I don't think it'll solve the problem. The This is not a good design (my fault), but it's how the tool operates as of right now. |
@jonn-smith, I did see XsvLocatableTableCodec and the .config file path, but this does not appear to work. To be clear this is something like:
In IndexFeatureFile (
IndexFactory.createDynamicIndex(featurePath.toPath(), ...) where featurePath is the config file. This calls IndexFactory to open a lineReader on the config file (not the backing data source): https://github.com/samtools/htsjdk/blob/6d3fc7bc1f613ecfce1c22d368f3ae17cb86823d/src/main/java/htsjdk/tribble/index/IndexFactory.java#L598. This then fails during XsvLocatableTableCodec.readActualHeader(), since this is trying to read the config file, not the TXT file. |
@jonn-smith also, I've tried this locally and I'm pretty sure the change I propose here allows Funcotator to use txt.gz LocatableXSV sources, where that file is just indexed separately with tabix |
@jonn-smith: any thoughts on the PR? I could try to put in a test, but this does allow Funcotator to read gzipped TSV files, which is rather useful when you're dealing with large inputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems fine to me @bbimber, provided you add an integration test demonstrating the new gzip support in Funcotator.
ok |
@droazen: test added (technically a unit test), but I dont think I'm able to kick off the test suite. |
I added one unrelated bugfix. FuncotatorUtils.createReferenceSnippet tries to expand the reference window. When doing this, it should never allow a start less than 1. The last commit addresses that. Note: I did not see an easy way for createReferenceSnippet() to identify the length of the contig (such as access to the SequenceDictionary), but it would in theory be useful to also check contig size and not exceed it. @droazen or @jonn-smith: it would be helpful if you could approve the test run
|
@droazen or @jonn-smith: checking back: any chance someone would be able to approve the workflow so tests can run? |
@bbimber - Sorry for the delay approved and running. |
@jonn-smith: thanks. looks like the tests passed. what do you think about the PR and level of testing? |
@droazen and/or @jonn-smith: is there anything else you need on this PR? It appears the test is passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bbimber Back to you with a quick request
Object[][] arr1 = provideDataForTestCreateFuncotations(false); | ||
Object[][] arr2 = provideDataForTestCreateFuncotations(true); | ||
return Stream.concat(Arrays.stream(arr1), Arrays.stream(arr2)). | ||
toArray(size -> (Object[][]) Array.newInstance(arr1.getClass().getComponentType(), arr1.length + arr2.length)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use the preexisting utility method Utils.concat(arr1, arr2, Object[][]::new)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, done
@droazen: the change is added. it looks like this needs approval to kick off tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Tests pass, merging this one
@droazen Thanks! |
No description provided.