When running GATK with specific interval(s), the default behavior is to include any variant spanning those interval(s). When running scatter/gather jobs, this behavior is generally not what one wants, since this would result in variants spanning the job intervals getting included twice.
A handful of GATK tools support an argument like --ignore-variants-starting-outside-interval, which is probably designed to solve this problem. GenotypeGVCFs supports this. However, the implementation is generally tool-level, and I don't believe all tools support it. For example, SelectVariants does not appear to. If one wants to run a scatter/gather task that doesn't start with a GATK tool that supports --ignore-variants-starting-outside-interval, you're out of luck.
My questions are:
Am I completely missing some existing capability?
There is already some low-level support in the engine for control over intervals. Would you be receptive to a PR that pushes support for "--ignore-variants-starting-outside-intervals" lower into GATK? Perhaps into VariantWalkerBase? One possibility would be to create a StartsWithinIntervalsVariantFilter and override makeVariantFilter() to inject it. I don't think this would be particularly invasive, and it could be pretty useful across many tools. As part of this, MultiVariantWalkerGroupedOnStart's argument would be merged into the new one.
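To make the idea concrete, here is a minimal, self-contained sketch of the filtering logic. It does not use any GATK classes: `Locus` and `StartsWithinIntervalsFilter` are hypothetical stand-ins written only for illustration, and coordinates are assumed to be 1-based and inclusive. It is not the implementation from any existing PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a variant location or a job interval
// (1-based, inclusive coordinates). Not a GATK class.
final class Locus {
    final String contig;
    final int start;
    final int end;

    Locus(String contig, int start, int end) {
        this.contig = contig;
        this.start = start;
        this.end = end;
    }
}

// Hypothetical filter: keeps a variant only if its START position
// falls inside one of the requested intervals.
final class StartsWithinIntervalsFilter {
    private final List<Locus> intervals;

    StartsWithinIntervalsFilter(List<Locus> intervals) {
        this.intervals = new ArrayList<>(intervals);
    }

    // True when the variant starts inside at least one interval.
    boolean test(Locus variant) {
        for (Locus interval : intervals) {
            if (interval.contig.equals(variant.contig)
                    && variant.start >= interval.start
                    && variant.start <= interval.end) {
                return true;
            }
        }
        return false;
    }

    // The default behavior described above: any overlap counts.
    static boolean overlaps(Locus variant, Locus interval) {
        return interval.contig.equals(variant.contig)
                && variant.start <= interval.end
                && variant.end >= interval.start;
    }
}
```

With two adjacent shards, chr1:1-100 and chr1:101-200, a deletion spanning chr1:95-105 overlaps both shards but starts in only the first, so a start-based filter would emit it exactly once across the scattered jobs.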
bbimber changed the title from "Low-level general solution for --ignore-variants-starting-outside-interval?" to "Low-level/general solution for --ignore-variants-starting-outside-interval?" on Oct 19, 2022
@bbimber Yes, this is an excellent suggestion and would be very useful! We actually already do have an open PR that adds such a general argument for VariantWalkers, but it's been languishing for a while: #6388
I'll see how difficult it would be to resurrect this old PR.