-
Notifications
You must be signed in to change notification settings - Fork 446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VCF/BCF support for POS=0 coordinate #1573
base: develop
Are you sure you want to change the base?
Conversation
This follows changes in samtools/htslib#1573 See also samtools/htslib#1571
This accidentally reverts the last htscodecs submodule update. Please could you update your local copy with
Also, if you rebase the branch on to the current |
|
Review of `bcftools merge` code and possibly others is necessary because it is relying on uninitialized POS==-1 in synced_bcf_reader This should resolve samtools#1571
This gives me headaches, always manage to mess it up. Did what you suggested and now I've got
Should I merge as suggested and push the result? EDIT: |
Hmm, this is a bit of a rabbit-hole. I don't think it's a good idea to change Now for htslib it turns out that this is moot, because we don't index unmapped reads; we reject SAM positions < 0; and we've actually been pushing this interval into the My test file:
Picard index dump (obtained by compiling htslib with
Note that the whole-chromosome search works because it asks for an interval wide enough to include bin 4680's official range, which means we also pick up the (zero-based) position -1 stuff in there. It wouldn't work if you asked for a smaller interval including that position though. For maximum compatibility we probably want to make This should probably be done as a special case, so we can avoid issues with shifting negative numbers. We may also want to discuss if we should be indexing this position in a currently spec-compliant way, or updating the specification so it follows the practice we've been doing for a while (which would break htsjdk, but as I think it can't do the look-up, probably not too badly compared with the current state). |
The restriction is already enforced by hts_idx_push() before this function is called, so no need to do it again.
When searching for `max_off`, hts_itr_query() looks for a bin to the right of the end of the region. For whole chromosomes, this would be HTS_POS_MAX, which is far beyond the maximum bin position supported. The `bin` calculation overflowed leading to much implementation-defined behaviour as it explored various negative bin numbers. Fix by limiting `end` to the maximum value supported by the index.
Since 71db687 htslib has put ranges starting -1 into the most appropriate bin starting from 0. However, the SAM specification actually says [-1, 0) should go in bin 4680 (for bai / tbx indexes) and htsjdk does this. To be able to handle indexes made both ways, add bin 4680 (or the equivalent number for csi) to the search list in `reg2bins()` as a special case, and make hts_itr_query() check for it when finding `min_off`, when ranges start at -1. Also take the opportunity to make reg2bins() notice when its memory allocations fail, and ensure the error gets propagated if necessary.
Required to support VCF files with variants at position 0 Add region parse flag HTS_PARSE_REF_START_MINUS_1 to make whole-chromosome ranges start at (zero-based position) -1 instead of 0. Rework hts_parse_region() so it can correctly interpret region specifications that start at zero. If the new HTS_PARSE_REF_START_MINUS_1 flag is set, it will also return region [-1, HTS_POS_MAX) for whole-chromosome requests. Make hts_itr_querys() work out if it's making an iterator for a vcf or bcf file and pass the HTS_PARSE_REF_START_MINUS_1 flag to hts_parse_region() if it is. This detection works by inspecting the `readrec` parameter to see what sort of file is being read. This should work as long as hts_itr_querys() is being called via the bcf_itr_querys() or tbx_itr_querys() wrappers, however it may be better to make a new interface that can accept an explicit flag parameter.
These should be interpreted as start of chr. to the given position.
I've pushed up my progress so far on this:
The region parsing part adds a new flag |
Resolves #1571
See also samtools/bcftools#1875