Odd fits at learn errors step #964
Comments
Do you know if this data has binned quality scores? And what machine was used to generate the data (e.g. MiSeq, NovaSeq...)?
Thanks for your quick reply :) It's a NextSeq paired-end 2x150 run. And yes, now that you mention it, the heatmap in the plot I posted above does suggest that the quality scores are binned. Below is an example of the first sequence in one of the files: @NB501850:81:HKM2LBGX5:1:11101:19604:1276 2:N:0:GTGGCC
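For anyone wanting to check for binning directly rather than from the heatmap, here is a minimal sketch in plain Python (not part of dada2). It tallies the distinct Phred scores in the quality lines of a FASTQ file. The helper names, the Phred+33 offset, and the cutoff of 8 distinct values are assumptions for illustration, and the sequence and quality strings in the example record are made up; only the header comes from this thread.

```python
from collections import Counter

def quality_score_counts(fastq_lines, offset=33):
    """Tally Phred scores over the quality lines of 4-line FASTQ records.

    Assumes Phred+33 encoding and strictly 4 lines per record.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the 4th line of each record is the quality string
            counts.update(ord(ch) - offset for ch in line.rstrip("\n"))
    return counts

def looks_binned(counts, max_distinct=8):
    """Heuristic only: few distinct scores suggests binning (cutoff is an assumption)."""
    return 0 < len(counts) <= max_distinct

# Hypothetical record: header from the thread, sequence and qualities invented.
record = [
    "@NB501850:81:HKM2LBGX5:1:11101:19604:1276 2:N:0:GTGGCC\n",
    "ACGTACGTACGT\n",
    "+\n",
    "AAAAAEEE/E66\n",
]
counts = quality_score_counts(record)
distinct_scores = sorted(counts)
binned = looks_binned(counts)
```

On a real file you would pass the open file handle (for example from `gzip.open(path, "rt")`) instead of the list, and look at many reads before drawing a conclusion.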
See suggestions here and in the following posts on how to deal with error-rate fitting when binned quality scores are present: #938 (comment)
That's great, thanks very much Ben
error-fit-example.pdf
Hi,
Like others who have posted here, I have sequences in random orientations because of our experimental design in the lab, and I think this is affecting error learning.
I tried to fix the orientation problem with "Decipher::OrientNucleotides()". However, when using "Decipher::readDNAStringset()" with fastq files, the quality scores are ignored, and re-writing the files to fastq after "Decipher::OrientNucleotides()" produces nonsense quality scores. I saw that in post #434 you mentioned running through the dada2 pipeline and then checking for reverse complements by hand, so thanks for that.
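To illustrate why the quality scores have to travel with the bases when a read is reoriented, here is a minimal sketch in plain Python. It is not the DECIPHER or dada2 API, and `reverse_complement_record` is a hypothetical helper. It reverse-complements the sequence and reverses the quality string so that each score stays attached to its base; deciding which reads actually need flipping is a separate step not shown here.

```python
# Complement table for unambiguous bases plus N (other IUPAC codes pass through unchanged).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement_record(header, seq, qual):
    """Return a FASTQ record flipped to the opposite strand.

    The sequence is reverse-complemented and the quality string is reversed
    (not complemented), so position i in the output still pairs a base with
    the score it was originally called with.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be the same length")
    return header, seq.translate(_COMPLEMENT)[::-1], qual[::-1]

# Made-up read for illustration.
flipped = reverse_complement_record("@read1", "AACGT", "IIH#F")
```

Here the low-quality `#` score, originally on the fourth base (`G`), ends up on the second base of the flipped read (its complement `C`), which is what a downstream error model would expect.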
I am actually a bit concerned with how my error plots look, though. The estimated error rates show a pretty severe dog-leg up at higher quality scores, in some instances dipping below the line of expected error rates, and the estimated fit in this region is generally quite poor.
Have you seen this before, and do you have any suggestions for parameters I could tweak to address this?
Thanks for your time and your excellent package!
Edit: Added example figure of estimated error rates