Bogart failure in Canu 2.1 - error with AS_BAT_MarkRepeatReads #1806
Comments
Can you share the overlaps (unitigging/.ovlStore) and read metadata (.seqStore, excluding the 'blobs' files)? The sequence data is stored in the 'blobs' files, and I don't need that to debug. Upload directions are in the FAQ.
Sure thing. I just sent over the contents of
Great! Thanks! Shouldn't be too hard to fix now, but I won't get to it for another 16 hours. Unfortunately, the whole ovlStore is needed.
I think I have found the problem on my end. These HiFi reads were generated from a new prep type and require trimming before analysis. The adapter occurs on 99% of the reads used for this failed assembly. This would explain why the error only occurred with this particular sample, and not the other two. This is likely a bizarre edge-case for Canu, and I wouldn't expect the assembly from these untrimmed reads to be usable. You may want to close this issue unless you think the error is still worth exploring.
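For anyone wanting to run a similar check on their own reads, here is a minimal Python sketch of estimating how many reads carry an adapter. It is not the method used in this thread: the adapter sequence, the function names, and the toy reads are all hypothetical, it assumes plain 4-line FASTQ records, and it only does exact matching (real trimming tools tolerate mismatches).

```python
from typing import Iterable


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def adapter_fraction(fastq_lines: Iterable[str], adapter: str) -> float:
    """Fraction of reads containing the adapter on either strand.

    Assumes 4-line FASTQ records, so the sequence is the second
    line of every record. Exact substring matching only.
    """
    adapter = adapter.upper()
    adapter_rc = reverse_complement(adapter)
    total = 0
    hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # skip header, '+' and quality lines
            continue
        seq = line.strip().upper()
        total += 1
        if adapter in seq or adapter_rc in seq:
            hits += 1
    return hits / total if total else 0.0


# Hypothetical adapter and toy reads, for illustration only.
ADAPTER = "AAGGCCTA"
toy_fastq = [
    "@read1", "GGGAAGGCCTAGGG", "+", "IIIIIIIIIIIIII",  # adapter, forward
    "@read2", "CCCTAGGCCTTCCC", "+", "IIIIIIIIIIIIII",  # adapter, reverse complement
    "@read3", "GGGGGGGGGG", "+", "IIIIIIIIII",          # no adapter
    "@read4", "ACACACAC", "+", "IIIIIIII",              # no adapter
]
fraction = adapter_fraction(toy_fastq, ADAPTER)
```

To use it on a real file, pass an open file handle instead of the toy list, for example `adapter_fraction(open("reads.fastq"), adapter)`, where the file name and adapter are your own.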
Wonderful! You can use "-untrimmed -pacbio-hifi" to enable trimming of HiFi reads. It's still a good bug. I've found what looks like the problem, but don't know the correct way to fix it yet.
Looking at the high-quality overlaps confirms your explanation. This command will show the overlap picture for the first 100 reads.
Here are the overlaps for read 30:
There are no overlaps off the 5' end of read 30, and most overlaps involve the 3' end of the other read. The first line describes the read we're showing overlaps for; the other lines are the overlapping reads.
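The same observation can be made programmatically. The Python sketch below counts how many overlaps extend past each end of a read. The `Overlap` record and its hang convention are assumptions made for this illustration (a negative `a_hang` means the other read extends past the 5' end of the read of interest, a positive `b_hang` means it extends past the 3' end); it does not parse Canu's actual output, and the values are made up.

```python
from typing import Dict, List, NamedTuple


class Overlap(NamedTuple):
    """Hypothetical overlap record relative to the read of interest."""
    other_id: int
    a_hang: int  # < 0: other read extends past our 5' end (assumed convention)
    b_hang: int  # > 0: other read extends past our 3' end (assumed convention)


def count_end_overlaps(overlaps: List[Overlap]) -> Dict[str, int]:
    """Count overlaps that extend beyond the 5' and 3' ends of the read."""
    counts = {"5prime": 0, "3prime": 0}
    for o in overlaps:
        if o.a_hang < 0:
            counts["5prime"] += 1
        if o.b_hang > 0:
            counts["3prime"] += 1
    return counts


# Made-up overlaps for a read with nothing extending off its 5' end.
example = [
    Overlap(other_id=2, a_hang=500, b_hang=300),
    Overlap(other_id=3, a_hang=1200, b_hang=800),
    Overlap(other_id=4, a_hang=100, b_hang=-50),  # other read contained in ours
]
counts = count_end_overlaps(example)
```

A read whose `5prime` count is zero while its `3prime` count is not, as in this toy example, shows the one-sided pattern described above.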
Many thanks for the data. Fixed!
That makes total sense. Thanks for digging into this. I re-ran the assembly after the trimming and de-duplication steps (this library actually involves PCR amplification), and the assembly looks really good. Thanks for all your work on Canu!
Hello,
I am performing a metagenome assembly with PacBio HiFi data and have run into a repeatable error. I am using canu 2.1 with:
canu -d STD -p Zymo6331-STD -pacbio-hifi Zymo6331-STD.fastq genomeSize=100m maxInputCoverage=1000 batMemory=200
I am running on SGE with the same configuration I have used to run other assemblies successfully (gridEngineResourceOption="-pe smp THREADS -l mem_free=MEMORY" and gridOptions="-V -S /bin/bash -q bigmem").
I am working with three samples, each run using the same set of arguments as above (but with different -d, -p, and fastq names). Two finished without any issues. For one sample, the run eventually fails. I scanned the error message and it suggested trying to run it again. I did, and it produced the same error again. It suggests the issue occurs when running Bogart:
Any suggestions as to what might be going wrong here? I have attached the unitigger.err file here too (unitigger.err.txt), in case that helps.
Thanks!
Dan