Feature request: Set no/low coverage regions of short reads to N #24

Open
gwl2 opened this issue Aug 21, 2024 · 1 comment

Comments


gwl2 commented Aug 21, 2024

Feature request
An option so that, when polishing an assembly with short reads, every base of the assembly that is covered by fewer than X reads (X being a user-supplied parameter) is set to N, indicating that this part was not polished with short reads.

Why?
When users polish long-read assemblies with short reads, they tend to believe that the new consensus is always better than the original assembly across the whole genome. However, some parts of the assembly may not have been covered by the short reads at all. In these regions, it's not known whether there are errors that short reads could have resolved. So if you need single-nucleotide accuracy in such a region, you should be aware of this and be extra careful.

Owner

rrwick commented Aug 22, 2024

That's a good point, so thanks for that suggestion.

I don't think I'll build this feature into Polypolish itself, but it could be implemented in a separate script using Polypolish's debug output table. That table shows the depth (using Polypolish's fractional depth for multi-mapped reads) for each position in the assembly. Also, the status column will contain low_depth if that depth is below Polypolish's --min_depth threshold.

So a script could be made to:

  • Read in the assembly
  • Read the debug table and note which positions are below a user-defined depth threshold
  • Output the assembly with the low-depth positions masked
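
As a rough illustration of those three steps, here is a minimal Python sketch. It is not part of Polypolish. The column names (`name`, `pos`, `depth`), the tab-separated layout with a header row, and the 0-based positions are all assumptions made for this example, so check them against the actual debug table before use. It also assumes the table's positions index into the same sequence that is being masked.

```python
import csv
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of sequence name -> sequence string."""
    parts, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(p) for n, p in parts.items()}


def low_depth_positions(handle, min_depth,
                        name_col="name", pos_col="pos", depth_col="depth"):
    """Yield (name, pos) for each table row whose depth is below min_depth.

    The column names are hypothetical defaults, not a documented format.
    Positions are assumed to be 0-based.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row[depth_col]) < min_depth:
            yield row[name_col], int(row[pos_col])


def mask(seqs, positions, mask_char="N"):
    """Return a copy of seqs with the given (name, pos) positions masked."""
    masked = {n: list(s) for n, s in seqs.items()}
    for name, pos in positions:
        masked[name][pos] = mask_char
    return {n: "".join(chars) for n, chars in masked.items()}


def write_fasta(seqs, handle, width=80):
    """Write sequences as FASTA, wrapping lines at the given width."""
    for name, seq in seqs.items():
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

With real files, the pieces would be chained as `mask(read_fasta(fasta_file), low_depth_positions(table_file, threshold))` and the result passed to `write_fasta`. Using the table's depth column directly, rather than the status column, lets the masking threshold differ from the one used for polishing.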

You're of course welcome to code this up yourself! If you'd rather I do it, I can add it to my to-do-at-some-point list, but I can't make any promises about when it will be ready 😄
