Multicore Bismark could corrupt unmapped/ambiguous reads FASTQ files for unknown reason #495
Comments
Bismark process logs & Snakemake wrapper process logs:
P.S.: After a re-run I get:
So the error was only during
I am very sorry for the slow reply. I will try to look at your questions tomorrow morning; I hope that's still OK?
Hmm, this one is probably tricky. My initial thought would also be that this might be caused by buffering issues. I can try to run some tests with a similar command, but chances are that I won't be able to replicate the issue. If this really were a buffering issue, I suppose one would find these 'corruption events' at the junction between one temp.fq file and the next? But this is quite weird, as it really is simply merging uncompressed text files... I'll do some tests and will report back.
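If the damage were introduced while merging, it would sit at a junction between temp files: a part that ends mid-record gets the next part's header glued onto its last line. Below is a minimal Python sketch of a merge that refuses to concatenate a part that does not end on a complete four-line record. The function name and the list-of-parts layout are hypothetical; this is not Bismark's own merge code (Bismark is written in Perl).

```python
from pathlib import Path


def merge_fastq_parts(parts: list[Path], out_path: Path) -> int:
    """Concatenate uncompressed FASTQ parts, checking each ends on a record boundary.

    Hypothetical helper for illustration. Returns the number of records written.
    Raises ValueError if a part looks truncated, i.e. its last line has no
    trailing newline or its line count is not a multiple of 4.
    """
    total_records = 0
    with open(out_path, "wb") as out:
        for part in parts:
            # Whole part read into memory for brevity; a real merge would stream.
            data = part.read_bytes()
            if data and not data.endswith(b"\n"):
                raise ValueError(f"{part.name}: last line has no trailing newline")
            n_lines = data.count(b"\n")
            if n_lines % 4 != 0:
                raise ValueError(f"{part.name}: {n_lines} lines, not a multiple of 4")
            out.write(data)
            total_records += n_lines // 4
    return total_records
```

Checking each part before appending it turns a silent junction error into a loud failure that names the offending temp file.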
I've processed 200 WGBS samples without any issues. Maybe it is a very rare event.
The run in which I hit all these problems took place when the cluster was likely short of HDD space. I don't know the exact sequence of events, but the amount of free space was hovering near zero. The fact that the number of FASTQ lines is almost normal (
On the other hand, it is quite weird that the same incorrect line number appeared both in
I think I might have found the culprit(s). It appears that there never were any explicit close statements for the ambiguous or unmapped FastQ files. This is not noticeable in non-multicore runs, because the filehandles get closed automatically when Bismark exits (so I would imagine that the files of the child processes would be complete), but I suppose the parent process might occasionally (though not every time) still have a few lines held in its buffer rather than written out... So yes, this sounds a little corner-case-y (but it shouldn't happen nevertheless). I have now added explicit close statements for the unmapped and ambiguous filehandles (9d7a806); I hope this solves the issue?! Thanks for finding this. Would you mind cloning the current dev branch and testing whether it is now resolved? Best wishes! Felix
Thanks for the explanation and the fix; it sounds reasonable. Is
I think the dev branch is pretty much equivalent to the master branch, but we have added support for minimap2 for Nanopore and PacBio alignments. There will most likely be a new release very soon (it was just presented at AGBT last week). I wanted to keep development on the dev branch only and merge into master for releases (which I should probably have done from the start :P).
This should be a solved issue.
This issue differs from #494 because I don't have any errors in the log file.
Everything seems OK (a reasonable number of reads in the BAM files), but the FASTQ.GZ files have an odd number of lines (FJ02_hg38_unmapped_reads_1.fq.gz: 590731999 lines, FJ02_hg38_unmapped_reads_2.fq.gz: 590731999 lines). Obviously, the number of lines in a FASTQ.GZ file cannot be odd (each FASTQ record spans four lines, so the count must be a multiple of 4), which means the FASTQ.GZ files were corrupted.
P.S.: Maybe there is an issue with buffering while saving the temp files.
Bismark (0.23.0)
I launched the command:
bismark -X 600 --gzip --multicore 4 --ambiguous --unmapped --bowtie2 $(dirname resources/indexes/hg38/Bisulfite_Genome) -1 data/reads/wgbs_y23_pooled/Clean/FJ02/FJ02_1.fq.gz -2 data/reads/wgbs_y23_pooled/Clean/FJ02/FJ02_2.fq.gz
It looks like one of the files was truncated, although there are no errors in the log file. In the example below, you can see that a new read, @V35.., starts at the end of the quality (QC) string of the previous read, and that this quality string is shorter than expected. The tail of the file looks OK, so most likely one of the temp files was truncated during the merge.
P.S.: I'm going to re-align this sample and will know in several days whether the problem persists.
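To locate a junction like the one described (a new @ header glued onto a shortened quality string), one can scan the file record by record and report the first record whose structure breaks. Below is a minimal Python sketch under stated assumptions: standard four-line FASTQ records, and only basic checks ('@' header, '+' separator, equal sequence and quality lengths). The function name is hypothetical. Note that '@' is itself a valid quality character, so the length check, not the presence of '@', is what flags the fused line.

```python
import gzip
from pathlib import Path
from typing import Optional


def find_first_bad_record(path: Path) -> Optional[tuple[int, str]]:
    """Hypothetical checker: return (1-based line number, reason) for the first
    malformed four-line FASTQ record, or None if all records look well-formed."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        line_no = 0
        while True:
            header = handle.readline()
            if not header:
                return None  # clean end of file on a record boundary
            seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
            start = line_no + 1  # line number of this record's header
            line_no += 4
            if not header.startswith("@"):
                return start, f"expected '@' header, got {header[:20]!r}"
            if not plus.startswith("+"):
                # Also catches a file that ends partway through a record.
                return start + 2, f"expected '+' separator, got {plus[:20]!r}"
            seq, qual = seq.rstrip("\n"), qual.rstrip("\n")
            if len(seq) != len(qual):
                return start + 3, f"sequence length {len(seq)} != quality length {len(qual)}"
```

Stopping at the first bad record is deliberate: after a fused line, every later record is shifted, so subsequent errors are mostly echoes of the first one.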