MBias plots #473

LandiMi2 · 2022-01-20T07:12:25Z

Hey @FelixKrueger, I have been having trouble understanding why the R2 read from an Illumina library shows biases in its methylation calls, and how to correct them. I understand you can ignore a few bases at the 5' or 3' end, but in cases
[image]
. The read quality here was okay (had the quality been poor, that could have been a possible reason). I don't really know how to correct it. Is it even relevant to correct these graphs so that they at least look like
[image]
- (Read 1), or should I just ignore it and carry on with the downstream analysis? What would be the impact on the downstream analysis?
I have looked for literature explaining the cause of these biases but have found none. Please comment.

FelixKrueger · 2022-01-20T09:56:05Z

If you see dramatic biases in the M-bias plot, they are typically indicative of either technical issues or a consequence of the type of library preparation and/or procedure. As long as these methylation values do not reflect true methylation values they introduce spurious methylation calls, and thus noise. Arguably, if you are looking for very strong effects you might get away with a bit more noise in the system, but ideally you would want to start your downstream analysis with data that is as clean as possible (that is at least my opinion).

Sometimes you may end up with fairly easy-to-remedy technical artefacts, such as end repair fill-in biases (https://sequencing.qcfail.com/articles/library-end-repair-reaction-introduces-methylation-biases-in-paired-end-pe-bisulfite-seq-applications/), which can simply be corrected by using --ignore 3 or similar.

Some other techniques or kits, e.g. PBAT, single-cell applications, Zymo Pico-Methyl, Accel Swift to name just a few, introduce their own biases (see e.g. here: https://sequencing.qcfail.com/articles/mispriming-in-pbat-libraries-causes-methylation-bias-and-poor-mapping-efficiencies/ or here for recommendations for trimming: https://github.com/FelixKrueger/Bismark/tree/master/Docs#ix-notes-about-different-library-types-and-commercial-kits).

In your specific case, Read 1 looks like one you would hope to get (assuming this is a plant species?). Read 2 certainly has a somewhat spiky methylation pattern over the first 8-10 bp (?), which quite clearly is much lower than for the rest of the read. Whether you want to hard-clip the reads (e.g. with Trim Galore --clip_R2 10, maybe this would also improve the alignment rate?) or simply ignore these residues within the methylation extractor is kind of your choice. If you look at the number of actual methylation calls performed you will see that over the first ~10 bp you have a fairly high number of calls compared to the more 3' end of Read 2 (an expected consequence of overlap detection and removal), so your Read 2 calls will contribute a comparatively high number of biased (and potentially spurious) calls.
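To put a rough number on "the first 8-10 bp look off", one option is to compare each position against the median of the whole read. The following is only a sketch, not something Bismark does for you: it assumes you have already pulled the per-position % methylation values for one context and read out of the M-bias report, and the function name `suggest_5prime_ignore`, the 5-point tolerance and the 20 bp scan window are made up for illustration.

```python
from statistics import median


def suggest_5prime_ignore(pct_by_position, tolerance=5.0, max_scan=20):
    """Return the last 5' position whose % methylation deviates from the median.

    pct_by_position: % methylation per read position, index 0 = position 1.
    tolerance: allowed absolute deviation in percentage points (arbitrary default).
    max_scan: only the first `max_scan` positions are inspected for 5' bias.

    Returning the *last* deviating position (rather than the first run) means a
    spiky pattern, where some early positions happen to look fine, is still
    covered in full.
    """
    if not pct_by_position:
        return 0
    baseline = median(pct_by_position)
    last_biased = 0
    for position, pct in enumerate(pct_by_position[:max_scan], start=1):
        if abs(pct - baseline) > tolerance:
            last_biased = position
    return last_biased


# Made-up values: a spiky, depressed first 10 bp followed by a flat read.
example_read2 = [20, 35, 18, 40, 25, 30, 22, 38, 28, 33] + [45] * 40
bases_to_ignore = suggest_5prime_ignore(example_read2)
```

A value obtained this way could then be used as a starting point for --ignore_r2 in the methylation extractor or --clip_R2 in Trim Galore, but it is worth checking it against the plot by eye rather than trusting the threshold blindly.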

I would be somewhat more alarmed by the fact that your Read 1 methylation levels are around 30/15/3% in CpG/CHG/CHH context, and 45/25/10% for Read 2. Arguably that difference is much bigger than the biases observed at the 5' end of Read 2. The easiest explanation for this would be that the reads do not belong to the same sample - which would be great. If they are from the same sample, you would be in the awkward position of having to decide how to proceed - do you want to just use R1, or just R2, or simply use both and see what you get? You could also go back to the sequencing facility to see if something appeared weird, check which kind of sequencer your data was run on (overcalling of Gs for Read 2?) etc. But that is kind of yet another question...
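If you want those read-level percentages as numbers rather than reading them off the plots, they can be recomputed from the counts in the M-bias report. The sketch below rests on an assumption about the layout of that text file: sections headed like "CpG context (R1)", then a line of "=" characters, a column header, and tab-separated rows of position, count methylated, count unmethylated, % methylation and coverage. Please check this against your own file; the function name `read_level_methylation` is hypothetical.

```python
def read_level_methylation(mbias_text):
    """Sum methylated/unmethylated calls per (context, read) section.

    Assumed layout (verify against your own report):
        CpG context (R1)
        ================
        position<TAB>count methylated<TAB>count unmethylated<TAB>...
        1<TAB>30<TAB>70<TAB>30.00<TAB>100
    Returns a dict mapping (context, read) -> overall % methylation.
    """
    totals = {}
    current = None
    for raw_line in mbias_text.splitlines():
        line = raw_line.strip()
        if " context" in line:
            context = line.split()[0]
            # Reports without a "(R1)"/"(R2)" suffix are treated as Read 1.
            if "(" in line and ")" in line:
                read = line[line.find("(") + 1 : line.find(")")]
            else:
                read = "R1"
            current = (context, read)
            totals[current] = [0, 0]
            continue
        fields = line.split("\t")
        # Data rows start with a numeric position; headers and rulers do not.
        if current and len(fields) >= 3 and fields[0].isdigit():
            totals[current][0] += int(fields[1])
            totals[current][1] += int(fields[2])
    return {
        key: 100.0 * meth / (meth + unmeth)
        for key, (meth, unmeth) in totals.items()
        if meth + unmeth > 0
    }
```

Calling it on the contents of the report and comparing, say, the ("CpG", "R1") and ("CpG", "R2") entries gives the kind of Read 1 versus Read 2 gap described above as a single figure per context.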

LandiMi2 · 2022-01-24T04:43:55Z

Thanks, @FelixKrueger, for your response. I guess in my case I will proceed with the analysis using only R1. I understand I will lose some coverage. The sequencing was done a long time ago, so tracking down where the problem was in the library is a bit tricky. Yes, these are sequences from a plant species.

shaohuaihan · 2024-07-22T05:59:27Z

Should the total calls line for Read 2 also be smooth?

FelixKrueger · 2024-07-22T19:11:42Z

Duplicate post (see #673).

FelixKrueger closed this as completed May 4, 2022
