ReVisit the PileupCaller haplotype filtering heuristics #8494

Open
jamesemery opened this issue Aug 24, 2023 · 0 comments

The recent branch #8489 has demonstrated that there are some problematic edge cases in the pileup allele merging code that could cause pathological numbers of haplotypes to be handed to the genotyper. While fixing the bug in that branch, we observed that score ties at the 5th haplotype level are very common for the pileup caller, as illustrated by the noise in the updated tests. This algorithm is not a good heuristic and we should replace it with something better. Some ideas from that branch that might fix a few of its shortcomings:

  1. Increase the kmer size used with the reads, or decouple it from the assembly graph kmer size, to prevent the filtering step from being redundant with assembly.
  2. Normalize the scores to the haplotype lengths to deal with haplotype size bias.
  3. Change the scores to instead reflect the absolute count of unsupported kmers from the graph, which would also deal with haplotype size bias.
  4. Iteratively expand the kmer size used for filtering to pare down the number of haplotypes in a more principled fashion.
  5. Use the read kmer occurrence counts to construct the scores, to reduce the risk of spurious reads being sufficient support for a given haplotype.
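
To make ideas 2 through 5 concrete, here is a rough Python sketch of what the alternative scores could look like. It is not the GATK implementation: every function name and parameter here (`read_kmer_counts`, `unsupported_kmer_count`, `supported_fraction`, `filter_by_expanding_k`, `min_count`) is hypothetical, haplotypes and reads are plain strings, and kmer support is taken directly from the reads rather than from the assembly graph.

```python
from collections import Counter


def kmers(seq, k):
    """Return every k-length substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def read_kmer_counts(reads, k):
    """Count how often each kmer occurs across all reads (idea 5)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def unsupported_kmer_count(haplotype, counts, k, min_count=1):
    """Absolute number of haplotype kmers seen fewer than min_count times (idea 3).

    Raising min_count above 1 means a single spurious read is no longer
    enough to support a kmer.
    """
    return sum(1 for km in kmers(haplotype, k) if counts[km] < min_count)


def supported_fraction(haplotype, counts, k, min_count=1):
    """Fraction of haplotype kmers that are supported, normalized by length (idea 2)."""
    total = len(haplotype) - k + 1
    if total <= 0:
        return 0.0
    return 1.0 - unsupported_kmer_count(haplotype, counts, k, min_count) / total


def filter_by_expanding_k(haplotypes, reads, k_start, k_max, max_haplotypes):
    """Grow k until at most max_haplotypes remain (idea 4).

    At each k, keep only haplotypes with no unsupported kmers. If a step
    would discard every haplotype, stop and return the previous set.
    """
    kept = list(haplotypes)
    for k in range(k_start, k_max + 1):
        if len(kept) <= max_haplotypes:
            break
        counts = read_kmer_counts(reads, k)
        survivors = [h for h in kept if unsupported_kmer_count(h, counts, k) == 0]
        if not survivors:
            break
        kept = survivors
    return kept


reads = ["ACGTAC", "CGTACG", "ACGTAC"]
counts = read_kmer_counts(reads, 4)
print(unsupported_kmer_count("ACGTACG", counts, 4))               # all kmers seen at least once
print(unsupported_kmer_count("ACGTACG", counts, 4, min_count=2))  # TACG is seen only once
print(supported_fraction("ACGTTCG", counts, 4))
print(filter_by_expanding_k(["ACGTACG", "ACGTTCG"], reads, 3, 5, 1))
```

The count-based and fraction-based scores rank haplotypes differently when lengths differ, so which of ideas 2 and 3 is preferable would have to be decided against real test data.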

We have observed that even relatively minor changes to the pileup-calling merging code can cause significant changes to the actual genotyping engine output from the pileup engine. We should strive to find a more principled solution for merging haplotypes than the one we have currently.
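
One plausible reason for this sensitivity is the score ties at the cutoff mentioned above. The following Python sketch is an illustration only, not the actual merging code: `top_n_strict` and `top_n_with_ties` are hypothetical names, and haplotypes are represented as simple labels paired with integer scores. It shows how a hard top-N cut over tied scores depends on input order, and one way to make the selection order-independent.

```python
def top_n_strict(scored, n):
    """Sort by score (descending) and cut at n.

    Python's sort is stable, so haplotypes with equal scores keep their
    input order, and the cut decides between them arbitrarily.
    """
    ranked = sorted(scored, key=lambda hs: hs[1], reverse=True)
    return [hap for hap, _ in ranked[:n]]


def top_n_with_ties(scored, n):
    """Keep every haplotype scoring at least as well as the n-th best.

    Ties at the cutoff are all retained, and the haplotype label breaks
    ties in the ordering, so the result no longer depends on input order.
    """
    if len(scored) <= n:
        return sorted(hap for hap, _ in scored)
    ranked = sorted(scored, key=lambda hs: (-hs[1], hs[0]))
    cutoff = ranked[n - 1][1]
    return [hap for hap, score in ranked if score >= cutoff]


# Same haplotypes and scores, different input order.
order_a = [("A", 5), ("B", 3), ("C", 3)]
order_b = [("A", 5), ("C", 3), ("B", 3)]

print(top_n_strict(order_a, 2), top_n_strict(order_b, 2))
print(top_n_with_ties(order_a, 2), top_n_with_ties(order_b, 2))
```

Keeping all tied haplotypes trades determinism for a potentially larger set, which matters given the concern about pathological haplotype counts, so a finer-grained score that produces fewer ties may be the better fix.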
