Analysis of MiSeq and iSeq fastq files using DADA2 #1083

ong8181 · 2020-07-27T03:54:14Z

Hi DADA2 developers,

I have been using MiSeq so far, but my group recently bought an iSeq and we are trying to analyze iSeq sequence data with DADA2. The iSeq generates basically the same outputs as the MiSeq, but I found that the quality scores (Q-scores) are very different. MiSeq fastq files contain Q-scores of 0-39, but iSeq fastq files contain only three Q-scores (11, 25, 37).
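One way to check which Q-scores a fastq file actually contains is to tally the decoded quality characters. The following is a minimal sketch, not part of the original scripts. It assumes the standard Phred+33 encoding and strict four-line fastq records (no wrapped sequence lines); the function name `qscore_counts` is hypothetical.

```python
from collections import Counter
import io

def qscore_counts(handle, offset=33):
    """Count Phred quality scores in a FASTQ stream.

    Assumes four lines per record and Phred+33 encoding (both assumptions).
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 3:  # every fourth line of a record is the quality string
            counts.update(ord(ch) - offset for ch in line.strip())
    return counts

# Tiny in-memory record for illustration; a real file could be read with
# open(path) or gzip.open(path, "rt") instead.
example = io.StringIO("@read1\nACGT\n+\n,:FF\n")
counts = qscore_counts(example)
print(sorted(counts))  # distinct Q-scores seen in the example record
```

For a binned file, the sorted keys should be a short list, whereas an unbinned MiSeq file would be expected to show many distinct values.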

DADA2 can run with the iSeq fastq files, but I am wondering whether analyzing iSeq data using DADA2 is appropriate or not. To briefly examine the effects of the different Q-scores, I have performed several analyses using my own sequence data (scripts and results are a bit long, so I posted them in my Github repository: https://github.com/ong8181/random-scripts/tree/master/04_MiSeq_vs_iSeq_DADA2)

The general procedure of my test is as follows:

1. Partial 16S rRNA sequences were amplified using 515F-806R, and the amplicons were sequenced with a MiSeq V2 250 x 2 bp kit.
2. I started from the MiSeq fastq files (Q-scores 0-39).
3. MiSeq Q-scores were manually converted to iSeq Q-scores using a shell script.
4. The two types of fastq files were analyzed identically using DADA2.
5. Representative sequences were saved as "ASV.fa", and taxa information was assigned.
6. The ASV table, sample information, and taxa information were imported as phyloseq objects.
7. Three types of visualizations were made: barplots of the MiSeq-style and iSeq-style fastq files, sequence reads of MiSeq-style vs. iSeq-style fastq files, and relative abundance of MiSeq-style vs. iSeq-style fastq files.
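The Q-score conversion step can be sketched as follows. This is an illustrative Python version, not the shell script used in the repository. The bin boundaries (Q below 20 maps to 11, Q of 20-29 maps to 25, Q of 30 and above maps to 37) are an assumption made for this example, since the thread does not state the mapping; Phred+33 encoding and four-line records are also assumed, and all function names are hypothetical.

```python
def bin_qscore(q):
    """Map a Q-score to one of three bins (11, 25, 37).

    The thresholds 20 and 30 are assumed for illustration only.
    """
    if q < 20:
        return 11
    if q < 30:
        return 25
    return 37

def bin_quality_string(qual, offset=33):
    """Re-encode a Phred+33 quality string using the binned scores."""
    return "".join(chr(bin_qscore(ord(ch) - offset) + offset) for ch in qual)

def bin_fastq_lines(lines):
    """Yield FASTQ lines with every quality line (4th of each record) binned."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield bin_quality_string(line.rstrip("\n")) + "\n"
        else:
            yield line

# One toy record with Q-scores 2, 20, 30 and 39.
record = ["@read1\n", "ACGT\n", "+\n", "#5?H\n"]
binned = list(bin_fastq_lines(record))
print(binned[3].strip())
```

Only the quality line changes; headers and sequences pass through untouched, so the two sets of files differ in Q-scores alone.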

I guess that, based on the results of my analysis and the algorithm of DADA2, analyzing iSeq data should be fine, but I would be glad if you could give me your thoughts on this issue.

Best regards,
Ushio

benjjneb · 2020-07-28T20:49:32Z

Wow, awesome set of analyses and Github repository, thanks for that work!

Based on what you see there, I think it confirms what I expect, which is that DADA2 will largely work OK with iSeq-type quality scores. That said, there is some concern that denoising error rates might be moderately higher in iSeq data; in particular, there might be a higher number of false-positive rare ASVs. This is for two reasons: first, the binned quality scores carry less information, which makes accurate denoising more difficult; second, DADA2's error model fitting procedure was built for "normal" MiSeq quality score distributions and can be non-ideal for binned quality scores. This has been discussed before, and there is quite a bit of useful information in another thread on this issue: #791

I do think two additional simple diagnostics could be useful. What does the output of plotErrors look like for the iSeq-type data? And what is the histogram of ASV abundances in both datasets (i.e. are there more rare ASVs in the iSeq-type data)?
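The second diagnostic can be approximated outside R once the per-ASV total read counts have been exported (for example, the column sums of the DADA2 sequence table). Below is a minimal sketch under that assumption. The function names, the cutoff of 10 reads for "rare", and the toy counts are all hypothetical and are not results from this thread.

```python
from collections import Counter

def abundance_histogram(asv_totals):
    """Bin ASVs by order of magnitude of their integer read totals.

    Bin k holds ASVs with 10**k <= reads < 10**(k + 1).
    """
    hist = Counter(len(str(n)) - 1 for n in asv_totals if n > 0)
    return dict(sorted(hist.items()))

def rare_fraction(asv_totals, max_reads=10):
    """Fraction of observed ASVs with at most max_reads reads (assumed cutoff)."""
    nonzero = [n for n in asv_totals if n > 0]
    if not nonzero:
        return 0.0
    return sum(n <= max_reads for n in nonzero) / len(nonzero)

# Made-up totals purely to show the calls.
toy_totals = [5200, 830, 41, 7, 2]
hist = abundance_histogram(toy_totals)
frac = rare_fraction(toy_totals)
print(hist, frac)
```

Running the same two functions on the MiSeq-style and iSeq-style totals would give a direct comparison of how many low-abundance ASVs each run produced.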

ong8181 · 2020-07-29T04:24:14Z

Thank you so much for your reply.

The outputs of plotErrors look as follows (these are also available at "03_SeqAnalysisDADA2_xxxOut" in the repository):

MiSeq error plot

iSeq error plot

As in #791, estimated error rates decrease sharply at around Q=30-35.

Also, I have checked histograms of ASV read counts as well as ASV relative abundance.

There is no big difference between the MiSeq and (simulated) iSeq data. Slightly more rare ASVs are found in the MiSeq data in terms of relative abundance (bottom panel), but this is probably because of the greater read counts of relatively abundant taxa in the MiSeq data (top panel). Analyzing iSeq data with DADA2 looks fine, at least when we are interested in obtaining a general overview of microbial communities.

benjjneb · 2020-07-29T18:23:02Z

> Analyzing iSeq data with DADA2 looks fine at least when we are interested in obtaining general overview of microbial communities.

Yeah, given the analyses you've shown here, I feel pretty good about that conclusion as well.

benjjneb closed this as completed Sep 18, 2020

benjjneb mentioned this issue Sep 18, 2020

Non-fitted error rates #1135

Closed

JacobRPrice mentioned this issue Mar 24, 2021

Binned quality scores and their effect on (non-decreasing) trans rates #1307

Open
