
support of aligner ngmlr #450

Closed
sherryxuePKU opened this issue Aug 16, 2021 · 5 comments

@sherryxuePKU

Hello,
I wonder whether another long-read aligner, NGMLR, will be supported in a future version of Bismark, because some results from my colleague showed that NGMLR may have a lower type I error in long-read mapping.

@FelixKrueger
Owner

Hi @sherryxuePKU

We are currently in the process of adding minimap2 as a long read aligner which seems to be working in principle (feel free to clone the MM2 branch and give it a go). As such (and since I am completely unfamiliar with NGMLR), I am afraid there are no immediate plans to add yet another long read aligner at the current time.

@sherryxuePKU
Author

sherryxuePKU commented Aug 16, 2021

@FelixKrueger Thanks for your reply! I've used the MM2 branch on my data for a while, and the alignment went well. However, it seemed to produce an incorrect GCH methylation level with bismark_methylation_extractor and coverage2cytosine. I used Illumina bulk NOMe-seq data for benchmarking. Methylpy and scripts written by my colleagues gave closely matching results, while this branch gave a result nearly 15% lower. I hope this information helps the development of bismark_MM2~
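For readers comparing GCH methylation levels across tools, here is a minimal Python sketch of the quantity under discussion. It is not Bismark's or Methylpy's code. It assumes the usual NOMe-seq definition of GCH (a cytosine preceded by G and followed by A, C or T) and a pooled level computed as methylated calls over all calls; the function names `is_gch` and `pooled_level` are hypothetical.

```python
def is_gch(ref: str, pos: int) -> bool:
    """Return True if ref[pos] is a forward-strand C in GCH context.

    GCH means: G immediately upstream, and H (A, C or T) immediately
    downstream of the cytosine. Positions are 0-based.
    """
    if pos < 1 or pos + 1 >= len(ref):
        return False  # no complete trinucleotide at the sequence edges
    tri = ref[pos - 1 : pos + 2].upper()
    return tri[0] == "G" and tri[1] == "C" and tri[2] in "ACT"


def pooled_level(calls) -> float:
    """Pooled methylation level in percent.

    calls: iterable of (methylated_count, unmethylated_count), one per site.
    """
    calls = list(calls)  # allow generators, since we iterate twice
    meth = sum(m for m, _ in calls)
    unmeth = sum(u for _, u in calls)
    total = meth + unmeth
    if total == 0:
        raise ValueError("no coverage at any site")
    return 100.0 * meth / total
```

When two pipelines disagree, running the same per-site counts through one such function can help separate differences in the calls from differences in how the summary is computed.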

@FelixKrueger
Owner

Quick question: you say you used Illumina NOMe-seq data and saw lower levels of methylation, i.e. accessibility, in GCH context, is that right? Illumina data doesn't really produce long reads, but I suppose it should nevertheless produce data that is comparable to data that was processed with the standard pipeline, e.g. Trim Galore (clipping off the first 6-9bp) followed by non-directional single-end alignments. Did you trim the data at all when using minimap2 as the aligner? If not, can you do that (trim_galore --clip_r1 6 test_file.fastq.gz) and try again?

The latest dev versions have seen some developments mainly regarding alignment speed (>100-fold speed increase for example for PacBio reads, not sure how this would hold up for SR data), so maybe you want to make sure you are on the latest version. Cheers, Felix
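To illustrate what the 5' clipping suggested above does to a single read, here is a minimal Python sketch. It is not Trim Galore's implementation, and the helper name `clip_5prime` is hypothetical; it only shows that the first `n` bases and their quality values are dropped together.

```python
def clip_5prime(seq: str, qual: str, n: int = 6):
    """Drop the first n bases of a read together with their qualities.

    seq and qual are the sequence and quality lines of one FASTQ record.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if n < 0:
        raise ValueError("n must be non-negative")
    return seq[n:], qual[n:]
```

For example, `clip_5prime("ACGTACGTAC", "IIIIIIIIII", 6)` keeps only the last four bases and quality values.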

@sherryxuePKU
Author

@FelixKrueger First, I'm sorry that I didn't state things clearly. To answer your question: 1) I used Illumina 150bp bulk NOMe-seq data for the benchmark, because I assumed that bismark_methylation_extractor was read length insensitive, or that any BAM file produced by an aligner would be processed by bismark_methylation_extractor in the same way. 2) I checked the library structure before analyzing the data, and trimmed the data to get clean inserts.
It's also nice to hear about the improvements in alignment speed. I'm looking forward to trying them!

@FelixKrueger
Owner

The methylation extractor is indeed read length insensitive, but the data needs to have been processed with Bismark in the first place. NOMe-seq is also kind of special when it comes to trimming and mapping requirements (more here: https://github.com/FelixKrueger/Bismark/blob/master/Docs/README.md#optional-nome-seq-or-scnmt-seq).

You can read up on the optimisation tests for minimap2 here; if you have any suggestions, I'm always happy to hear them!
