PacBio Revio Kinnex data #1892
It does appear there are issues with
The error appears to be due to the binning. From our Nextflow output:
We're working through the workflow with R+dada2 and a small data set to manually diagnose it and see if we can create a workaround.
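For anyone following along, the steps we are re-running manually follow the published dada2 PacBio workflow. A minimal sketch (file paths are placeholders; primer sequences are the full-length 16S pair used in the dada2 PacBio tutorial — substitute your own):

```r
library(dada2)

# Placeholder paths -- substitute real demultiplexed Kinnex FASTQs
fn   <- "sample1.fastq.gz"
nop  <- "sample1.noprimers.fastq.gz"
filt <- "sample1.filt.fastq.gz"

# Remove primers and orient reads (27F/1492R full-length 16S primers)
removePrimers(fn, nop,
              primer.fwd = "AGRGTTYGATYMTGGCTCAG",
              primer.rev = dada2:::rc("RGYTACCTTGTTACGACTT"),
              orient = TRUE)

# Long-read filtering settings from the dada2 PacBio tutorial
filterAndTrim(nop, filt, minQ = 3, minLen = 1000, maxLen = 1600,
              maxN = 0, rm.phix = FALSE, maxEE = 2)

# Learn errors with the PacBio-specific error function, then denoise
err <- learnErrors(filt, errorEstimationFunction = PacBioErrfun,
                   BAND_SIZE = 32, multithread = TRUE)
dd  <- dada(derepFastq(filt), err = err, BAND_SIZE = 32, multithread = TRUE)
```

The diagnosis question is whether `PacBioErrfun` (written for full-range Sequel quality scores) still fits when the input Qs are binned.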
It looks like the issue above was a workflow problem (the task timed out). However, we are seeing increased loss of sequence data compared to past runs, and, interestingly, there are similar issues with the binning step as seen for the NovaSeq runs. I suspect this might be from differences in estimated error, as the problem occurs after

Attached are the results from three random samples from a larger 174-sample run: aggregate-qualities.12.pdf

This is the error profile (note the dip):

EDIT: I am a bit concerned about the 'flatness' of that error-frequency line compared to Illumina and older PacBio data, like @benjjneb's past runs, e.g. https://benjjneb.github.io/LRASManuscript/LRASms_fecal.html.
We just tried a run using the alternative function for error estimation provided by @jonalim and tested by @hhollandmoritz here: #1307 (comment) (option 4). This seems to help, though the data are pretty sparse; we're working on newer runs with more
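For reference, `learnErrors()` accepts any function with the same signature as the built-in `loessErrfun`/`PacBioErrfun`, so the alternative from #1307 can be dropped in directly. A sketch (`loessErrfun_mod4` is a placeholder name for the option-4 function — see the linked comment for its actual definition; the file path is a placeholder too):

```r
library(dada2)

# Drop-in replacement: pass the alternative error-estimation function
# from issue #1307 ("loessErrfun_mod4" is a stand-in name here) instead
# of the default used by the PacBio workflow.
err <- learnErrors("sample1.filt.fastq.gz",
                   errorEstimationFunction = loessErrfun_mod4,
                   BAND_SIZE = 32, multithread = TRUE)

# Inspect the fit against the nominal Q line to judge the improvement
plotErrors(err, nominalQ = TRUE)
```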
This is an updated plot with

I should mention that the PacBio workflow steps for Kinnex still appear to use the original

I should also mention these data (and our in-house data) are from the Revio; I'll update the title.
Error profiles for the mock community samples from here:

The original PacBio model,
Visually, the fit with the updated version looks much better. Without a deep understanding, my surface read is that the updated code may work more effectively across Q scores with very low representation in the data, so the improved visual fit could be explained by a better match between the algorithm and this sort of binned-Q data.

We are actually looking at PacBio Kinnex/Revio for a current project, so this is very timely. I haven't seen real data from the platform before, but googling around I just found this: https://downloads.pacbcloud.com/public/dataset/Kinnex-16S/
Thx, but the original implementation was from the Illumina NovaSeq binning tests from @jonalim (which I think came from someone in his group...). That this works across binned data from different technologies is interesting on its own.
Yep, that's part of the data I had tried here: the 32-sample Zymo mock communities. We saw a similar pattern when we ran it. I noticed they also have Kinnex/Sequel data.
Small update: based on feedback from our PacBio BI rep, it sounds like SMRT Link can be configured to generate full quality scores for the Revio, though these apparently top out at Q40 regardless (unlike the Sequel, which goes up to Q93). It requires a custom run sheet for now; the default is to bin all data.
Thanks for the updates. We have a call with PacBio about Kinnex tomorrow, and I'm going to ask them about their quality-score approach going forward too.
@benjjneb to add to this, we redid a Kinnex run but set it to generate the full quality range. Despite PacBio's indication that these would top out at Q40, they go all the way to Q93 (using the Kinnex correction above). We're doing a second run-through switching back to the standard PacBio model, which may work better in this case.
I appreciate this thread a whole lot @cjfields! I had also noticed the reduced quality range on our first Revio Kinnex run. Unrelated to this, our flowcell was underloaded and the median read quality was lower than it should have been, so I was confused about what was going on. I'm actually unconvinced that the error model a lot of people have been attributing to me (with loess
@jonalim agreed, and I think this is where having a decent mock data set would be useful for assessing this. There are ones available from PacBio, which we have tested and which I linked to above.

Also, re: the median quality issue, is it possible the data were not filtered to 99.9% accuracy? We've seen something like this when we were mistakenly given default PacBio output (99% accuracy), which did not work.
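If only default-accuracy reads are on hand, one way to re-impose the 99.9% cutoff is to filter the HiFi BAM on the per-read predicted-accuracy (`rq`) tag before converting to FASTQ. A hedged sketch (file names are placeholders; requires samtools >= 1.12 for `-e` filter expressions):

```shell
# Keep only reads whose predicted accuracy ("rq" BAM tag) is >= 0.999,
# i.e. the 99.9% (Q30) cutoff used in the dada2 PacBio workflow.
# Input/output names are placeholders for your own files.
samtools view -b -e '[rq] >= 0.999' movie.hifi_reads.bam -o movie.q30plus.bam
```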
We're just getting in our first standard 16S data from the PacBio Revio using the Kinnex kit, and will soon be getting 16S-ITS-23S data. I know the FASTQ data are now binned (similar to NovaSeq, I believe). Is anyone aware of issues with these? I'm most concerned about issues similar to the NovaSeq binning problems documented elsewhere:
I'm also wondering whether the kit itself (which concatenates the reads prior to sequencing, and then splits the resulting reads after sequencing based on PacBio adapters) may also affect the error profile.
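On the binning question, a quick way to check how many distinct quality values a FASTQ actually contains is to tabulate the unique characters on the quality lines (a toy two-read FASTQ stands in for real data here):

```shell
# Toy two-read FASTQ standing in for a real Revio file.
printf '@r1\nACGT\n+\nII5I\n@r2\nTTTT\n+\n55II\n' > toy.fastq

# Every 4th line of a FASTQ is the quality string; split it into
# single characters and deduplicate.  Binned data shows only a
# handful of distinct values; full-range data shows a broad spread.
awk 'NR % 4 == 0' toy.fastq | fold -w1 | sort -u
# -> two distinct quality characters in this toy file: "5" and "I"
```

In Phred+33 encoding the full PacBio range runs up to Q93 (`~`), so a real full-quality Revio file should show far more than the few characters typical of binned output.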