Missing 1 Mbp of alignment on Chromosome 16 between T2T-CHM13 v1.0 and GRCh38 #816
Comments
My current hacked workaround in case it is useful to anyone else is to mask all aligned sequences in the reference and query and then map again and combine the PAFs.
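A rough sketch of the masking step in that workaround, assuming the first-pass alignments are in standard PAF (0-based, end-exclusive coordinates; query start/end in columns 3-4, target start/end in columns 8-9). The function names `paf_intervals` and `mask` are made up for illustration and are not part of minimap2; re-mapping the masked sequences and concatenating the two PAFs is left out.

```python
from collections import defaultdict

def paf_intervals(paf_lines):
    """Collect aligned (start, end) intervals per query name and per target name."""
    query, target = defaultdict(list), defaultdict(list)
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip lines without the 12 mandatory PAF columns
        query[f[0]].append((int(f[2]), int(f[3])))
        target[f[5]].append((int(f[7]), int(f[8])))
    return query, target

def mask(seq, intervals):
    """Return seq with every aligned interval replaced by N (hard masking)."""
    out = list(seq)
    for start, end in intervals:
        end = min(end, len(out))  # guard against coordinates past the sequence end
        if start < end:
            out[start:end] = "N" * (end - start)
    return "".join(out)

# Toy example: one alignment covering query 2..5 and target 4..8.
paf = ["q1\t10\t2\t5\t+\tt1\t12\t4\t8\t3\t4\t60"]
q_iv, t_iv = paf_intervals(paf)
masked_query = mask("ACGTACGTAC", q_iv["q1"])
masked_target = mask("ACGTACGTACGT", t_iv["t1"])
```

The masked reference and query would then be written back to FASTA and aligned a second time, so that only sequence missed in the first pass can seed new alignments.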
I can confirm this is a bug, or at least not intended behavior. v2.19+ is missing this inversion because v2.19+ can chain through this region (older versions can't). Although minimap2 has a procedure to rescue inversions, it is somehow not effective over this ~1Mb inversion. I am looking at this. It may take time to fix. @fenderglass your #806 is probably caused by the same issue. Sorry for the late response.
@fenderglass Do you still have the examples shown in #806? It seems simpler than Mitchell's chr16 case and might be easier for debugging. Thanks!
I believe I have alleviated the issue with the current HEAD – a complete fix would require a chaining algorithm that can go through inversions, which is hard. The
@fenderglass I also tried the new HEAD on your data and manually checked one locus on contig 142. I think that case is also fixed. I still need to do more testing before cutting a release.
I see this often as well. It seems that flanking sequences around large inversions tend to have homology, which causes the overlap. It is difficult to resolve such overlaps with the current minimap2 algorithm.
Could you send me the data? Thanks.
Here it is: ref.txt My command:
Thanks. Long inv and short inv are handled differently. 10k is right on the threshold between long and short, which leads to a corner case that was not considered. The 10k example is fixed on github HEAD. The 5k one is not fixed. Actually all versions of minimap2 have this issue. It is triggered more frequently as a side-effect of finding long indels. I will think about a solution later. PS: I knew the 5k issue would happen. It is great that I have an example that can trigger the issue.
Thanks for fixing this. I tested on some more simulated inversions and it worked fine. The 5k issue didn't happen in any other case, so I can't share more examples. Another related question is that minimap2 does not report small (100bp) inverted alignments. Is it possible to adjust some parameters to increase the sensitivity of minimap2 in identifying more of these smaller inversions? Sample inversion:
Minimap2 is not intended to call 100bp inversions. You can force minimap2 to align small inversions with minimap2 -cxasm20 -z200,50 -s50 but I am not sure this would be an improvement.
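For anyone scripting this, here is a small sketch that spells out the suggested options as a separate-argument command line. Per the minimap2 manual, -c outputs CIGAR in PAF, -x selects the preset, -z sets the Z-drop score and the inversion Z-drop score, and -s sets the minimal peak DP alignment score. The helper name `small_inversion_cmd` and the file names `ref.fa` and `query.fa` are placeholders, not anything from this thread.

```python
import shlex

def small_inversion_cmd(ref, query, preset="asm20", zdrop="200,50", min_score=50):
    """Build the argument list equivalent to: minimap2 -cxasm20 -z200,50 -s50 ref query."""
    return [
        "minimap2",
        "-c",                    # output CIGAR in PAF
        "-x", preset,            # assembly-to-reference preset
        "-z", str(zdrop),        # Z-drop score, inversion Z-drop score
        "-s", str(min_score),    # minimal peak DP alignment score
        ref,
        query,
    ]

# Hypothetical input files; pass the list to subprocess.run(...) to actually execute it.
cmd = small_inversion_cmd("ref.fa", "query.fa")
cmd_string = shlex.join(cmd)
```

As noted above, lowering these thresholds forces more small inverted alignments, which may or may not be an improvement on real data.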
Thanks for sharing the parameters. I understand the limitations in finding small genomic rearrangements.
@mrvollger Although v2.23 fixed the chr16 inversion, I noticed minimap2 missed another inversion on |
Thanks @lh3! Out of curiosity was the issue in the beta def region ~6-14Mbp?
Sorry, I meant to say chr3, towards the end of chr3. I haven't checked that chr8 inversion yet.
Got it, thanks!
Hi Heng,
I am looking at a 1 Mbp region on chr16 that doesn't seem to align when comparing GRCh38 and T2T, but there is a lot of homology in the region. When I do a whole genome alignment I get about 1 Mbp of missing alignment around ~
chr16:21,000,000-22,000,000
and more specifically I get these alignments (including the flanking regions):
With this command (version 2.22):
And here is an image of the alignment (T2T on the bottom)
However, if I take just the region that is mostly unaligned and align it on its own, I get a nearly complete alignment in the reverse complement, which I think suggests a missed inversion.
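One quick way to check this kind of result is to total the aligned target bases per strand in the PAF. This is only a sketch, assuming standard PAF columns (strand in column 5, target start/end in columns 8-9); the function name `aligned_bases_by_strand` and the toy records are invented for illustration and are not the actual alignments from this region.

```python
def aligned_bases_by_strand(paf_lines):
    """Sum aligned target bases separately for '+' and '-' strand PAF records."""
    totals = {"+": 0, "-": 0}
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12 or f[4] not in totals:
            continue  # skip malformed records
        totals[f[4]] += int(f[8]) - int(f[7])
    return totals

# Made-up records: a short forward hit and a long reverse-complement hit.
toy_paf = [
    "q\t1000000\t0\t1000\t+\tt\t1000000\t0\t1000\t990\t1000\t60",
    "q\t1000000\t50000\t950000\t-\tt\t1000000\t60000\t960000\t890000\t900000\t60",
]
strand_totals = aligned_bases_by_strand(toy_paf)
```

If nearly all aligned bases land on the '-' strand for a region extracted in the forward orientation, that is consistent with an inversion relative to the reference.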
Command:
I have also dropped the two paf files and references at this link:
https://eichlerlab.gs.washington.edu/help/mvollger/share/mm2-issue-chr16/
I am not sure of the cause of this, but it seems like a large event to miss, and hopefully this is not just some mistake on my part.
(And credit to Ariel Gershman for finding this issue, if it is a real issue and not some mistake on my part)
Thanks in advance!
Mitchell