Hello,
I am running into issues that I've not seen before with 16S data and I'm hoping you have some guidance (not related to a bug or coding problem so feel free to close the issue).
This is commercial 16S data that was pre-filtered in an "in-house script" and I think something's going wrong with the Phred quality encoding that is throwing off the error learning algorithm.
Attached are a representative example of the quality profile and the forward and reverse error plots.
Do you have any idea what could cause banding like this in the quality profile plot and if it's related to the weird error learning plots? My biggest concern is that dada2 assigned 25,000+ ASVs to this dataset (animal fecal samples) which seems way off.
This is binned quality score data, which is common on the high-throughput Illumina machines (e.g. NovaSeq). See this thread for discussion: #1307
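One quick way to confirm that the banding comes from binning is to count the distinct quality values in a FASTQ file: binned data contains only a handful of them. Below is a minimal sketch, assuming standard four-line FASTQ records and Phred+33 encoding. The function names and the `max_reads` cap are made up for this illustration and are not part of dada2.

```python
import gzip
from collections import Counter
from itertools import islice


def quality_score_counts(lines, offset=33):
    """Count Phred scores across the quality lines of FASTQ text.

    Assumes four lines per record and Phred+33 encoding (offset=33).
    """
    counts = Counter()
    # The quality string is the 4th line of every 4-line record.
    for qual in islice(lines, 3, None, 4):
        counts.update(ord(ch) - offset for ch in qual.strip())
    return counts


def quality_score_counts_from_file(path, max_reads=100_000):
    """Same tally for the first max_reads records of a (optionally gzipped) file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return quality_score_counts(islice(handle, 4 * max_reads))
```

If `sorted(quality_score_counts_from_file(path))` returns only a few values (for example four), the scores are binned; unbinned data typically shows dozens of distinct values.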
> My biggest concern is that dada2 assigned 25,000+ ASVs to this dataset (animal fecal samples) which seems way off.
If that is from just 128k reads, then yes, that looks like a problem. Any potential issues with binned quality scores (probably) aren't causing that, though. Could there be un-removed primer/adapter bases on these reads? Or perhaps a non-typical library preparation strategy, like heterogeneity spacers, that introduces variation in the start position of the reads?
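Both possibilities can be checked by looking for the primer near the start of the raw reads. Here is a rough sketch, assuming the primer sequence is known and that exact matching (with IUPAC degenerate bases expanded, no mismatches allowed) is good enough for a first look. The helper names and the `max_offset` default are hypothetical, chosen only for this example.

```python
import re
from collections import Counter

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a primer with degenerate bases into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_offsets(reads, primer, max_offset=8):
    """Tally the first position (0..max_offset) where the primer matches.

    Reads with no match in that window are counted under the key None.
    """
    pattern = primer_regex(primer)
    offsets = Counter()
    for read in reads:
        found = None
        for start in range(max_offset + 1):
            # match() anchors the pattern at position `start` of the read.
            if pattern.match(read, start):
                found = start
                break
        offsets[found] += 1
    return offsets
```

Reading the tally: most reads at offset 0 suggests primers are still on the reads; matches spread over several offsets suggests heterogeneity spacers; mostly `None` suggests the primers were already removed (or that a different primer was used). The primer passed in must be the one from your own library prep; the thread does not say which primers were used here.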
Thank you so much for your quick reply. Yes, it seems that we're experiencing similar issues with the binned quality scores. I will also explore your other suggestions (adapters, heterogeneity spacers) and reply here in this thread.
Thanks for your help!
reverse-error-plot.pdf
forward-error-plot.pdf
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices
[6] utils datasets methods base
other attached packages:
[1] Biostrings_2.60.1 GenomeInfoDb_1.28.0
[3] XVector_0.32.0 IRanges_2.26.0
[5] S4Vectors_0.30.0 BiocGenerics_0.38.0
[7] forcats_0.5.1 stringr_1.4.0
[9] dplyr_1.0.7 purrr_0.3.4
[11] readr_1.4.0 tidyr_1.1.3
[13] tibble_3.1.2 ggplot2_3.3.4
[15] tidyverse_1.3.1 phyloseq_1.36.0
[17] dada2_1.20.0 Rcpp_1.0.6
[19] BiocManager_1.30.16
loaded via a namespace (and not attached):
[1] nlme_3.1-152 fs_1.5.0
[3] bitops_1.0-7 matrixStats_0.59.0
[5] lubridate_1.7.10 RColorBrewer_1.1-2
[7] httr_1.4.2 tools_4.1.1
[9] backports_1.2.1 utf8_1.2.1
[11] R6_2.5.0 vegan_2.5-7
[13] DBI_1.1.1 mgcv_1.8-36
[15] colorspace_2.0-1 permute_0.9-5
[17] rhdf5filters_1.4.0 ade4_1.7-17
[19] withr_2.4.2 tidyselect_1.1.1
[21] compiler_4.1.1 cli_2.5.0
[23] rvest_1.0.0 Biobase_2.52.0
[25] xml2_1.3.2 DelayedArray_0.18.0
[27] labeling_0.4.2 scales_1.1.1
[29] digest_0.6.27 Rsamtools_2.8.0
[31] jpeg_0.1-8.1 pkgconfig_2.0.3
[33] MatrixGenerics_1.4.0 dbplyr_2.1.1
[35] readxl_1.3.1 rlang_0.4.11
[37] rstudioapi_0.13 farver_2.1.0
[39] generics_0.1.0 hwriter_1.3.2
[41] jsonlite_1.7.2 BiocParallel_1.26.0
[43] RCurl_1.98-1.3 magrittr_2.0.1
[45] GenomeInfoDbData_1.2.6 biomformat_1.20.0
[47] Matrix_1.3-4 munsell_0.5.0
[49] Rhdf5lib_1.14.1 fansi_0.5.0
[51] ape_5.5 lifecycle_1.0.0
[53] stringi_1.6.2 MASS_7.3-54
[55] SummarizedExperiment_1.22.0 zlibbioc_1.38.0
[57] rhdf5_2.36.0 plyr_1.8.6
[59] grid_4.1.1 crayon_1.4.1
[61] lattice_0.20-44 haven_2.4.1
[63] splines_4.1.1 multtest_2.48.0
[65] hms_1.1.0 pillar_1.6.1
[67] igraph_1.2.6 GenomicRanges_1.44.0
[69] reshape2_1.4.4 codetools_0.2-18
[71] reprex_2.0.0 glue_1.4.2
[73] ShortRead_1.50.0 latticeExtra_0.6-29
[75] data.table_1.14.0 RcppParallel_5.1.4
[77] modelr_0.1.8 png_0.1-7
[79] vctrs_0.3.8 foreach_1.5.1
[81] cellranger_1.1.0 gtable_0.3.0
[83] assertthat_0.2.1 broom_0.7.8
[85] survival_3.2-11 iterators_1.0.13
[87] GenomicAlignments_1.28.0 cluster_2.1.2
[89] ellipsis_0.3.2