This is not really a bug, as the documentation clearly states how deduplicate_bismark expects UMIs to be handled, but it is an easy mistake to make.
As documented in deduplicate_bismark, Bismark expects UMIs of the form:
@A00001:001:HN2F7DRX1:1:1101:1452:1000 1:N:0:AATGACGC:CAAGAG
But if Illumina's bcl-convert is used with OverrideCycles to handle UMIs, the read ID looks like this:
@A00001:001:HN2F7DRX1:1:1101:1452:1000:CAAGAG 1:N:0:AATGACGC
In both cases the UMI is CAAGAG.
With the second layout, the sample index (AATGACGC) is used as the UMI, and no warning or error is emitted.
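A minimal Python sketch of why this happens. It assumes that deduplicate_bismark (with `--barcode`) reads the UMI as the last `:`-separated field of the read ID, and that the space in the FASTQ header has become an underscore in the SAM read ID. The helper names `sam_read_id` and `last_colon_field` are hypothetical and only model that behaviour.

```python
def sam_read_id(fastq_header: str) -> str:
    """Drop the leading '@' and replace the space with '_' (assumed SAM read ID form)."""
    return fastq_header.lstrip("@").replace(" ", "_")


def last_colon_field(read_id: str) -> str:
    """Return the last ':'-separated field, where a Bismark-style UMI is expected."""
    return read_id.rsplit(":", 1)[-1]


# Layout documented for deduplicate_bismark: UMI appended after the sample index.
bismark_layout = "@A00001:001:HN2F7DRX1:1:1101:1452:1000 1:N:0:AATGACGC:CAAGAG"
# Layout written by bcl-convert with OverrideCycles: UMI appended to the read name.
bcl_convert_layout = "@A00001:001:HN2F7DRX1:1:1101:1452:1000:CAAGAG 1:N:0:AATGACGC"

print(last_colon_field(sam_read_id(bismark_layout)))      # CAAGAG (the UMI)
print(last_colon_field(sam_read_id(bcl_convert_layout)))  # AATGACGC (the sample index)
```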
I propose running a pre-flight check to detect this scenario, and potentially to support the UMI location chosen by Illumina.
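One possible shape for such a pre-flight check, sketched in Python. It assumes the standard Illumina header layout (seven `:`-separated fields in the read name and `read:filtered:control:index` in the comment). It treats an eighth name field as a bcl-convert UMI and a fifth comment field as a Bismark-style UMI. The function names `classify_header`, `preflight` and `move_umi_to_comment` are hypothetical; nothing in this thread indicates that Bismark provides them.

```python
from collections import Counter
from typing import Iterable

# Assumed Illumina layout: 7 name fields, 4 comment fields (read:filtered:control:index).
ILLUMINA_NAME_FIELDS = 7
ILLUMINA_COMMENT_FIELDS = 4


def classify_header(header: str) -> str:
    """Report where (if anywhere) a UMI appears in one FASTQ header line."""
    name, _, comment = header.lstrip("@").partition(" ")
    n_name = len(name.split(":"))
    n_comment = len(comment.split(":")) if comment else 0
    if n_name == ILLUMINA_NAME_FIELDS + 1:
        return "umi_in_name"      # bcl-convert OverrideCycles layout
    if n_name == ILLUMINA_NAME_FIELDS and n_comment == ILLUMINA_COMMENT_FIELDS + 1:
        return "umi_in_comment"   # layout documented for deduplicate_bismark
    return "no_umi_detected"


def preflight(headers: Iterable[str]) -> Counter:
    """Tally header layouts so a mismatch can be reported before deduplication."""
    return Counter(classify_header(h) for h in headers)


def move_umi_to_comment(header: str) -> str:
    """Rewrite a bcl-convert header into the layout deduplicate_bismark documents."""
    if classify_header(header) != "umi_in_name":
        return header  # leave any other layout untouched
    name, _, comment = header.lstrip("@").partition(" ")
    base, _, umi = name.rpartition(":")
    return f"@{base} {comment}:{umi}"


headers = ["@A00001:001:HN2F7DRX1:1:1101:1452:1000:CAAGAG 1:N:0:AATGACGC"]
counts = preflight(headers)
if counts["umi_in_name"]:
    print(f"warning: {counts['umi_in_name']} read(s) carry the UMI in the read name; "
          "deduplicate_bismark would use the sample index instead")
print(move_umi_to_comment(headers[0]))
# @A00001:001:HN2F7DRX1:1:1101:1452:1000 1:N:0:AATGACGC:CAAGAG
```

Checking only the first few thousand headers would probably be enough for a warning, since a whole FASTQ file normally shares one layout; the rewrite function shows one way the Illumina location could be supported by converting it to the documented one.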
EDIT: I might have been completely off. I'll close it for now.
I double-checked, and this is an issue. Tools like Illumina's bcl-convert and umi-tools place the UMI like this (by default, umi-tools uses _ instead of : as the separator):
@A00001:001:HN2F7DRX1:1:1101:1452:1000:CAAGAG 1:N:0:AATGACGC
Normally bowtie2 (and, it seems, other aligners) drops everything after the space, so the corresponding SAM record ID would be:
A00001:001:HN2F7DRX1:1:1101:1452:1000:CAAGAG
Here, however, spaces are replaced with underscores, so the SAM record ID is:
A00001:001:HN2F7DRX1:1:1101:1452:1000:CAAGAG_1:N:0:AATGACGC
This causes the sample index to be treated as the UMI. Neither deduplicate_bismark nor umi-tools dedup gives a warning or error, and the estimated number of duplicates is massively inflated.
This is especially problematic, as this means the workflow
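The two read-ID treatments can be compared in a few lines of Python. This is a sketch under stated assumptions: deduplicate_bismark is modelled as taking the text after the last `:`, and umi-tools dedup as taking the text after the last `_` (its default separator, as mentioned above). Neither tool's actual code is reproduced here, and the helper names are hypothetical.

```python
# Header layouts described above: bcl-convert joins the UMI with ':',
# umi-tools (by default) with '_'. Each is paired with the separator its parser uses.
CASES = {
    "bcl-convert / ':' parser": ("@A00001:001:HN2F7DRX1:1:1101:1452:1000:CAAGAG 1:N:0:AATGACGC", ":"),
    "umi-tools / '_' parser": ("@A00001:001:HN2F7DRX1:1:1101:1452:1000_CAAGAG 1:N:0:AATGACGC", "_"),
}


def truncated_id(header: str) -> str:
    """Aligner default: keep only the text before the first whitespace."""
    return header.lstrip("@").split()[0]


def underscored_id(header: str) -> str:
    """Whitespace replaced by '_', so the comment (and sample index) stays in the read ID."""
    return "_".join(header.lstrip("@").split())


def umi_after(read_id: str, sep: str) -> str:
    """Model of a read-ID UMI parser: take everything after the last separator."""
    return read_id.rsplit(sep, 1)[-1]


for label, (header, sep) in CASES.items():
    print(f"{label}: truncated -> {umi_after(truncated_id(header), sep)}, "
          f"underscored -> {umi_after(underscored_id(header), sep)}")
# bcl-convert / ':' parser: truncated -> CAAGAG, underscored -> AATGACGC
# umi-tools / '_' parser: truncated -> CAAGAG, underscored -> 1:N:0:AATGACGC
```

In the underscored case the parsed "UMI" is the same for every read of a sample, so every read at a given position looks like a duplicate, which matches the inflated duplicate rate described above.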