Unusual quality profiles and error plots #1402

emilyvansyoc · 2021-08-30T19:26:09Z

Hello,
I am running into issues that I've not seen before with 16S data and I'm hoping you have some guidance (not related to a bug or coding problem so feel free to close the issue).

This is commercial 16S data that was pre-filtered in an "in-house script" and I think something's going wrong with the Phred quality encoding that is throwing off the error learning algorithm.

Attached are a representative example of the quality profile and the forward and reverse error plots.

Do you have any idea what could cause banding like this in the quality profile plot and if it's related to the weird error learning plots? My biggest concern is that dada2 assigned 25,000+ ASVs to this dataset (animal fecal samples) which seems way off.

Thanks for your help!

reverse-error-plot.pdf

forward-error-plot.pdf

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices
[6] utils datasets methods base

other attached packages:
[1] Biostrings_2.60.1 GenomeInfoDb_1.28.0
[3] XVector_0.32.0 IRanges_2.26.0
[5] S4Vectors_0.30.0 BiocGenerics_0.38.0
[7] forcats_0.5.1 stringr_1.4.0
[9] dplyr_1.0.7 purrr_0.3.4
[11] readr_1.4.0 tidyr_1.1.3
[13] tibble_3.1.2 ggplot2_3.3.4
[15] tidyverse_1.3.1 phyloseq_1.36.0
[17] dada2_1.20.0 Rcpp_1.0.6
[19] BiocManager_1.30.16

loaded via a namespace (and not attached):
[1] nlme_3.1-152 fs_1.5.0
[3] bitops_1.0-7 matrixStats_0.59.0
[5] lubridate_1.7.10 RColorBrewer_1.1-2
[7] httr_1.4.2 tools_4.1.1
[9] backports_1.2.1 utf8_1.2.1
[11] R6_2.5.0 vegan_2.5-7
[13] DBI_1.1.1 mgcv_1.8-36
[15] colorspace_2.0-1 permute_0.9-5
[17] rhdf5filters_1.4.0 ade4_1.7-17
[19] withr_2.4.2 tidyselect_1.1.1
[21] compiler_4.1.1 cli_2.5.0
[23] rvest_1.0.0 Biobase_2.52.0
[25] xml2_1.3.2 DelayedArray_0.18.0
[27] labeling_0.4.2 scales_1.1.1
[29] digest_0.6.27 Rsamtools_2.8.0
[31] jpeg_0.1-8.1 pkgconfig_2.0.3
[33] MatrixGenerics_1.4.0 dbplyr_2.1.1
[35] readxl_1.3.1 rlang_0.4.11
[37] rstudioapi_0.13 farver_2.1.0
[39] generics_0.1.0 hwriter_1.3.2
[41] jsonlite_1.7.2 BiocParallel_1.26.0
[43] RCurl_1.98-1.3 magrittr_2.0.1
[45] GenomeInfoDbData_1.2.6 biomformat_1.20.0
[47] Matrix_1.3-4 munsell_0.5.0
[49] Rhdf5lib_1.14.1 fansi_0.5.0
[51] ape_5.5 lifecycle_1.0.0
[53] stringi_1.6.2 MASS_7.3-54
[55] SummarizedExperiment_1.22.0 zlibbioc_1.38.0
[57] rhdf5_2.36.0 plyr_1.8.6
[59] grid_4.1.1 crayon_1.4.1
[61] lattice_0.20-44 haven_2.4.1
[63] splines_4.1.1 multtest_2.48.0
[65] hms_1.1.0 pillar_1.6.1
[67] igraph_1.2.6 GenomicRanges_1.44.0
[69] reshape2_1.4.4 codetools_0.2-18
[71] reprex_2.0.0 glue_1.4.2
[73] ShortRead_1.50.0 latticeExtra_0.6-29
[75] data.table_1.14.0 RcppParallel_5.1.4
[77] modelr_0.1.8 png_0.1-7
[79] vctrs_0.3.8 foreach_1.5.1
[81] cellranger_1.1.0 gtable_0.3.0
[83] assertthat_0.2.1 broom_0.7.8
[85] survival_3.2-11 iterators_1.0.13
[87] GenomicAlignments_1.28.0 cluster_2.1.2
[89] ellipsis_0.3.2

benjjneb · 2021-08-30T19:42:38Z

This is binned quality score data, which is common on the high-throughput Illumina machines (e.g. NovaSeq). See this thread for discussion: #1307
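One quick way to confirm that a file has binned quality scores is to count how many distinct Phred values it contains. The following is a minimal sketch, not part of dada2: it assumes plain 4-line FASTQ records and Phred+33 encoding, and the function names and the `max_distinct` threshold are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable


def phred_score_counts(fastq_lines: Iterable[str], offset: int = 33) -> Counter:
    """Count how often each Phred score occurs across all reads.

    Assumes 4-line FASTQ records and Phred+33 encoding (an assumption;
    adjust `offset` if the data uses another encoding).
    """
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        # In a 4-line FASTQ record, the quality string is the 4th line.
        if i % 4 == 3:
            counts.update(ord(ch) - offset for ch in line.rstrip("\n"))
    return counts


def looks_binned(counts: Counter, max_distinct: int = 8) -> bool:
    """Heuristic with an assumed threshold: very few distinct scores suggests binning."""
    return 0 < len(counts) <= max_distinct


# Toy records made up for illustration (not taken from the dataset in this thread).
example = [
    "@read1", "ACGTACGT", "+", "FFFF:,F#",
    "@read2", "ACGTACGA", "+", "FF:FFF,F",
]
counts = phred_score_counts(example)
print(sorted(counts))        # distinct Phred values seen
print(looks_binned(counts))  # True when only a handful of values occur
```

For a real file, the lines could come from something like `gzip.open(path, "rt")`. Unbinned data typically shows dozens of distinct values, whereas binned data shows only a few, which is what produces the banding in the quality profile plot.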

> My biggest concern is that dada2 assigned 25,000+ ASVs to this dataset (animal fecal samples) which seems way off.

If that is from just 128k reads, then yes, that looks like a problem. Any potential issues with binned quality scores (probably) aren't causing that, though. Could there be un-removed primer/adapter bases on these reads? Or perhaps a non-typical library preparation strategy, like heterogeneity spacers, that introduces variation in the start position of the reads?
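Both possibilities can be checked roughly by looking at where the amplification primer sits in each read. The sketch below is an illustration under stated assumptions, not a dada2 function: `EXAMPLE_PRIMER` is a made-up placeholder (substitute the primer actually used), the helper names are hypothetical, and matching is exact apart from IUPAC ambiguity codes, with no mismatches allowed.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a primer with ambiguity codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_offset(read: str, pattern: "re.Pattern[str]", max_offset: int = 10) -> Optional[int]:
    """Return the first offset (0..max_offset) at which the primer matches, else None."""
    for offset in range(max_offset + 1):
        if pattern.match(read, offset):
            return offset
    return None


def offset_histogram(reads: Iterable[str], primer: str, max_offset: int = 10) -> Counter:
    """Tally primer start offsets across reads; the key None means 'primer not found'."""
    pattern = primer_regex(primer)
    return Counter(primer_offset(r.upper(), pattern, max_offset) for r in reads)


# Placeholder primer and toy reads, made up for illustration only.
EXAMPLE_PRIMER = "GTGYCAGC"
reads = [
    "GTGCCAGCAAATTT",   # primer at offset 0
    "AGTGTCAGCAAATTT",  # primer at offset 1
    "CAGTGCCAGCAAATT",  # primer at offset 2
    "TTTTTTTTTTTTTTT",  # primer absent
]
hist = offset_histogram(reads, EXAMPLE_PRIMER)
print(hist)
```

If most reads fall at offset 0, the primers were probably left on the reads. If the counts are spread over several offsets, that is consistent with heterogeneity spacers. If nearly all reads map to `None`, the primers were likely removed already (or the wrong primer was supplied).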

emilyvansyoc · 2021-08-31T13:28:28Z

Thank you so much for your quick reply. Yes, it seems that we're experiencing similar issues with the binned quality scores. I will also explore your other suggestions (adapters, heterogeneity spacers) and reply here in this thread.

benjjneb closed this as completed May 28, 2024
