Odd fits at learn errors step #964

gdunshea · 2020-03-02T18:19:33Z

Like others here, I have reads in random orientations because of our experimental design in the lab, and I think this is affecting error learning.

I tried to fix the orientation problem with "DECIPHER::OrientNucleotides()". However, when reading fastq files with "readDNAStringSet()", the quality scores are ignored, so re-writing the files to fastq after "DECIPHER::OrientNucleotides()" ends up with nonsense quality scores. I saw that in post #434 you mentioned running through the dada2 pipeline and then checking for reverse complements by hand, so thanks for that.
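For reference, the quality scores only stay meaningful after re-orienting a read if the quality string is reversed together with the sequence. Below is a minimal sketch of that idea in plain Python. It does not use DECIPHER or dada2; the function name `reverse_complement_record` is hypothetical, and deciding which reads need flipping is left out.

```python
# Translation table mapping each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement_record(header, seq, qual):
    """Reverse-complement one FASTQ record (hypothetical helper).

    The sequence is complemented and reversed; the quality string is
    only reversed, so each score stays attached to its original base.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    return header, seq.translate(COMPLEMENT)[::-1], qual[::-1]

header, seq, qual = reverse_complement_record("@read1", "ACGTN", "EEA/<")
print(seq)   # NACGT
print(qual)  # </AEE
```

Applying the function twice returns the original record, which is a quick sanity check that bases and scores stay paired.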

I am actually a bit concerned with how my error plots look though. The estimated error rates show a pretty severe dog-leg upward at higher quality scores, in some instances dipping below the line of expected error rates, and the fit in this region is generally quite poor.

Have you seen this before, and do you have any suggestions for parameters I could tweak to address it?

Thanks for your time and your excellent package!

Edit: Added example figure of estimated error rates

benjjneb · 2020-03-03T16:38:18Z

Do you know if this data has binned quality scores? And what machine was used to generate the data (e.g. MiSeq, NovaSeq...)?

gdunshea · 2020-03-03T18:11:12Z

Thanks for your quick reply :)

It's a NextSeq paired-end 2x150 run. But yes, now that you mention it, looking at the heatmap on the plot I posted above, it does appear that the quality scores are binned. Below is an example of the first sequence in one of the files:

@NB501850:81:HKM2LBGX5:1:11101:19604:1276 2:N:0:GTGGCC
CGTCGCTCCTACCGATTGAGTGATCCGGTGAATAATTCGGACTGCAGCAATGTTTGGATCCCGAACGTTGCAGCGGAAAGTTTAGTGAACCTTATCACTTAGAGGAAGGAGAAGTCGTAACA
+
EEEEEEEEEEEEEEEAE/EEEE/EEEEEEEAAEEEEEEEEEAEEEAEE<EEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAEEEEAEEAEE
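One quick way to check for binning is to decode the quality string and list the distinct scores it contains. The sketch below assumes the standard Phred+33 encoding (an assumption, although it is the usual encoding for recent Illumina output), and `phred_counts` is a hypothetical helper name. A single read is only suggestive; a real check would scan many reads.

```python
from collections import Counter

def phred_counts(quality, offset=33):
    """Count how often each Phred score occurs in a quality string.

    Assumes ASCII-encoded scores with the given offset (33 by default).
    """
    return Counter(ord(ch) - offset for ch in quality)

# Quality line of the read shown above.
quality = (
    "EEEEEEEEEEEEEEEAE/EEEE/EEEEEEEAAEEEEEEEEEAEEEAEE<EEEEEEEE<"
    "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAEEEEAEEAEE"
)

counts = phred_counts(quality)
print(sorted(counts))  # [14, 27, 32, 36]
```

Only four distinct values appear across the whole read, which is what binned quality scores look like; unbinned data would normally show a much wider spread of values.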

benjjneb · 2020-03-09T19:36:24Z

See suggestions here and in the following posts on how to deal with error-rate fitting when binned quality scores are present: #938 (comment)

gdunshea · 2020-03-10T13:22:36Z

That's great, thanks very much Ben

benjjneb closed this as completed Apr 6, 2020

JacobRPrice mentioned this issue Mar 24, 2021

Binned quality scores and their effect on (non-decreasing) trans rates #1307

Open
