You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The recent branch #8489 has demonstrated that there are some problematic edge cases in the pileup allele merging code that could cause pathological numbers of haplotypes to be handed to the genotyper. In updating the bug in that branch it was observed that it is very common that there are score ties at the 5th haplotype level for the pileupcaller as illustrated by the noise in the updated tests. This algorithm is not a good heuristic and we should replace it with something better, some ideas from that branch that might fix a few of its shortcomings:
Increase/decouple the kmer size used with the reads from the assembly graph kmer size to prevent the filtering step from being redundant with assembly
Normalize the scores to the haplotype lengths to deal with haplotype size bias.
Change the scores to instead reflect the absolute count of unsupported kmers from the graph to also deal with hapotype size bias.
Iteratively expand the kmer size used for filtering to pare down the number of haplotypes in a more principled fashion.
Utilize the read kmer occurrence counts to construct the scores in order to reduce the risk of spurious reads being sufficient support for a given haplotype.
We have observed that there can be significant changes to the actual genotyping engine output from the pileup engine from even relatively minor changes to the pileupcalling merging code. We should strive to find a more principled solution for merging haplotypes than the one we have currently.
The text was updated successfully, but these errors were encountered:
The recent branch #8489 has demonstrated that there are some problematic edge cases in the pileup allele merging code that could cause pathological numbers of haplotypes to be handed to the genotyper. In updating the bug in that branch it was observed that it is very common that there are score ties at the 5th haplotype level for the pileupcaller as illustrated by the noise in the updated tests. This algorithm is not a good heuristic and we should replace it with something better, some ideas from that branch that might fix a few of its shortcomings:
We have observed that there can be significant changes to the actual genotyping engine output from the pileup engine from even relatively minor changes to the pileupcalling merging code. We should strive to find a more principled solution for merging haplotypes than the one we have currently.
The text was updated successfully, but these errors were encountered: