Support for aligner NGMLR #450

sherryxuePKU · 2021-08-16T05:56:13Z

Hello,
I wonder whether another long-read aligner, NGMLR, will be supported in a future version of Bismark, because some results from my colleague showed that NGMLR may have a lower type I error rate in long-read mapping.

FelixKrueger · 2021-08-16T06:32:32Z

Hi @sherryxuePKU

We are currently in the process of adding minimap2 as a long-read aligner, which seems to be working in principle (feel free to clone the MM2 branch and give it a go). As such (and since I am completely unfamiliar with NGMLR), I am afraid there are no immediate plans to add yet another long-read aligner.

sherryxuePKU · 2021-08-16T07:14:12Z

@FelixKrueger Thanks for your reply! I've been using the MM2 branch on my data for a while. The alignment went well, but it seemed to produce incorrect GCH methylation levels with bismark_methylation_extractor and coverage2cytosine. I used Illumina bulk NOMe-seq data for benchmarking: Methylpy and scripts written by my colleagues gave closely matching results, while this branch reported a level nearly 15% lower. I hope this information helps with the development of bismark_MM2~
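"Nearly 15% lower" can mean either percentage points or a relative difference, so when comparing pipelines it may help to report both. A minimal sketch, assuming per-site counts laid out like a Bismark genome-wide cytosine report (chromosome, position, strand, count methylated, count unmethylated, C-context, trinucleotide context); `pooled_level`, `compare` and the toy rows are hypothetical and only illustrate the arithmetic.

```python
from typing import Iterable, Tuple


def pooled_level(lines: Iterable[str]) -> float:
    """Pooled methylation level = sum(methylated) / sum(methylated + unmethylated).

    Assumes tab-separated rows in cytosine-report order:
    chrom, pos, strand, count_methylated, count_unmethylated, context, trinucleotide.
    """
    meth = unmeth = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        meth += int(fields[3])
        unmeth += int(fields[4])
    total = meth + unmeth
    if total == 0:
        raise ValueError("no covered cytosines")
    return meth / total


def compare(level_ref: float, level_test: float) -> Tuple[float, float]:
    """Return (difference in percentage points, relative difference in %) of test vs ref."""
    absolute = (level_test - level_ref) * 100
    relative = (level_test - level_ref) / level_ref * 100
    return absolute, relative


# Hypothetical toy rows: the same two sites as counted by two different pipelines.
reference = ["chr1\t100\t+\t8\t2\tCHH\tCAT", "chr1\t200\t-\t6\t4\tCHH\tCTA"]
test = ["chr1\t100\t+\t7\t3\tCHH\tCAT", "chr1\t200\t-\t5\t5\tCHH\tCTA"]
```

With these toy rows the reference level is 0.70 and the test level 0.60, so `compare(0.70, 0.60)` gives roughly -10 percentage points but about -14.3% in relative terms.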

FelixKrueger · 2021-08-16T12:33:08Z

Quick question: you say you used Illumina NOMe-seq data and saw lower levels of methylation, i.e. accessibility, in GCH context, is that right? Illumina data doesn't really produce long reads, but I suppose it should nevertheless give results comparable to data that was processed with the standard pipeline (e.g. Trim Galore (clipping off the first 6-9 bp), followed by non-directional single-end alignments). Did you trim the data at all when using minimap2 as the aligner? If not, could you do that (trim_galore --clip_r1 6 test_file.fastq.gz) and try again?
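The clipping step asked for here (`--clip_r1 6`) removes a fixed number of bases from the 5' end of every Read 1. A minimal Python sketch of just that hard 5' clip, assuming uncompressed four-line FASTQ records; `read_fastq` and `clip_5prime` are hypothetical helpers for illustration, not Trim Galore's implementation (which also performs adapter and quality trimming).

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def clip_5prime(records: Iterable[Record], n: int) -> Iterator[Record]:
    """Drop the first n bases, and their quality scores, from every read."""
    for header, seq, plus, qual in records:
        yield header, seq[n:], plus, qual[n:]


# Usage on an in-memory record: a 10 bp read clipped by 6 bp keeps its last 4 bases.
example = io.StringIO("@r1\nACGTACGTAC\n+\nIIIIIHHHHH\n")
clipped = list(clip_5prime(read_fastq(example), 6))
```

Clipping the sequence and quality strings with the same slice keeps every record valid FASTQ.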

The latest dev versions include some developments, mainly regarding alignment speed (a >100-fold speed increase, e.g. for PacBio reads; I'm not sure how this would hold up for short-read data), so you may want to make sure you are on the latest version. Cheers, Felix

sherryxuePKU · 2021-08-17T05:52:50Z

@FelixKrueger First, I'm sorry that I didn't make that clear. To answer your quick question: 1) I used Illumina 150 bp bulk NOMe-seq data for benchmarking because I assumed that bismark_methylation_extractor was read-length insensitive, i.e. that a BAM file produced by any aligner would be processed by bismark_methylation_extractor in the same way. 2) I checked the library structure before analyzing the data and trimmed the reads to obtain clean inserts.
And it's great that you've improved the alignment speed. I'm looking forward to trying it now!

FelixKrueger · 2021-08-17T06:14:18Z

The methylation extractor is indeed read-length insensitive, but the BAM file needs to have been produced by Bismark in the first place (and NOMe-seq is kind of special when it comes to trimming and mapping requirements; more here: https://github.com/FelixKrueger/Bismark/blob/master/Docs/README.md#optional-nome-seq-or-scnmt-seq).
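Part of what makes NOMe-seq special is the context split: accessibility from the GpC methyltransferase is read from cytosines in GCH context, endogenous methylation from HCG, and GCG positions are conventionally excluded because the two signals cannot be told apart there. A minimal sketch of that classification for one reference position, assuming the genome is a plain string with 0-based coordinates; `nome_context` is a hypothetical helper, not part of Bismark (within Bismark this split is handled by `coverage2cytosine --nome-seq`).

```python
from typing import Optional

# Complement table used to read neighbours on the reverse strand.
_COMP = str.maketrans("ACGTN", "TGCAN")


def nome_context(genome: str, pos: int) -> Optional[str]:
    """Classify the cytosine at genome[pos] as 'GCH', 'HCG', or None.

    A 'C' is a forward-strand cytosine; a 'G' is treated as a reverse-strand
    cytosine, so its 5' and 3' neighbours come from the reverse complement.
    """
    if pos <= 0 or pos >= len(genome) - 1:
        return None  # one flanking base on each side is required
    base = genome[pos].upper()
    left, right = genome[pos - 1].upper(), genome[pos + 1].upper()
    if base == "C":
        prev, nxt = left, right
    elif base == "G":
        # On the reverse strand the 5' neighbour is the complement of the right base.
        prev, nxt = right.translate(_COMP), left.translate(_COMP)
    else:
        return None  # not a cytosine on either strand
    if prev == "G" and nxt == "G":
        return None  # GCG: accessibility and endogenous methylation are confounded
    if prev == "G" and nxt in "ACT":
        return "GCH"
    if prev in "ACT" and nxt == "G":
        return "HCG"
    return None  # e.g. CHH-type positions that are neither GCH nor HCG
```

For example, the C in `AGCAT` (position 2) is GCH, the C in `ACGTA` (position 1) is HCG, and the C in `AGCGA` (position 2) is skipped as GCG.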

You can read up on the optimisation tests for minimap2 here, if you have any suggestions I'm always happy to hear them!

FelixKrueger closed this as completed Aug 17, 2021
