[help] Can Bismark handle diploid reference genome please? #697

lizhe-gis · 2024-09-06T10:07:27Z

Dear Felix,

Thank you so much for developing this great tool! I rely on Bismark heavily for my research :)

Recently we developed a haplotype-resolved diploid human genome, where one copy is paternal and the other is maternal. I imagine that if I map WGBS data to this reference genome, most reads will have secondary alignments because the paternal and maternal genomes are so similar. I understand that Bowtie 2 and HISAT2 can both assign such reads randomly, but from reading previous posts and the alignment flags, I understand that the --ambig_bam output currently does not contain any methylation information.

Would it be possible to ask Bismark to randomly assign a read to one location, and include its methylation information, when the two alignments are equally good and both have the best possible score, please?

Thank you very much!

Best Regards,
Zhe

FelixKrueger · 2024-09-10T13:09:45Z

I am afraid there is currently no functionality to randomly assign reads to repetitive regions (which is in effect what you have if both alleles contain the exact same sequence).

All I can think of currently is using a sequential approach where you first align the data to your haplotype-resolved diploid genome, while specifying --unmapped. Reads aligning specifically to one of the two alleles should align, while reads aligning to regions that are exactly shared will be rejected as ambiguous and end up in new 'unmapped' FastQ files.
In a second round, the unmapped reads could be aligned to only one of the copies, or maybe even the standard reference genome, assuming that there is an even split between the two alleles. This approach might suffer from a discrepancy between the coordinate systems, however...
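
As a rough illustration of the sequential approach described above, the following Python sketch only builds the two Bismark command lines; it does not run them. It assumes single-end reads. The function name, directory names, and example paths are hypothetical, and the name of the unmapped FastQ file written by the first pass is passed in by the caller rather than guessed, so please check it against your actual output directory.

```python
import shlex
from pathlib import Path


def two_pass_commands(reads, diploid_index, fallback_index, out_dir, unmapped_name):
    """Build (but do not run) the two Bismark calls for a sequential alignment.

    reads          : single-end FastQ file (assumption: single-end data)
    diploid_index  : Bismark genome folder of the diploid (pat + mat) reference
    fallback_index : Bismark genome folder of one haplotype or a standard reference
    unmapped_name  : file name of the unmapped reads written by pass 1
                     (assumption: supplied by the user after inspecting pass 1 output)
    """
    out = Path(out_dir)
    pass1_dir = out / "pass1_diploid"    # hypothetical directory name
    pass2_dir = out / "pass2_fallback"   # hypothetical directory name

    # Pass 1: haplotype-specific reads align; unaligned and ambiguous reads
    # are written out as FastQ because of --unmapped.
    pass1 = ["bismark", "--genome", str(diploid_index), "--unmapped",
             "-o", str(pass1_dir), str(reads)]

    # Pass 2: re-align the leftover reads to a single copy of the genome.
    pass2 = ["bismark", "--genome", str(fallback_index),
             "-o", str(pass2_dir), str(pass1_dir / unmapped_name)]
    return pass1, pass2


if __name__ == "__main__":
    cmds = two_pass_commands("sample.fq.gz", "genomes/diploid", "genomes/paternal",
                             "results", "sample.fq.gz_unmapped_reads.fq.gz")
    for cmd in cmds:
        print(shlex.join(cmd))
```

Methylation calls from the second pass would be in the coordinates of whichever genome is used as the fallback, which is where the coordinate discrepancy mentioned above comes in.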

lizhe-gis · 2024-09-11T08:32:15Z

Dear Felix,

Thank you very much for your kind reply!

Indeed, I am also worried about the coordinate system :P Currently the only solution I can think of for the homologous regions is to map all reads separately to the paternal and the maternal reference genome. However, the results will then reflect haplotype-averaged methylation (akin to the effect of random assignment, just with double the coverage), so I need to be cautious when interpreting imprinted regions (e.g. be alert for ~50% methylated regions, although this could also be caused by cell culture heterogeneity, I suppose). Similarly, I am worried about the centromeric regions, where the sequences are highly repetitive but are known to differ in methylation status.

Best Regards,
Zhe
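
To make the "be alert for ~50% methylated regions" check concrete, here is a minimal Python sketch that flags cytosines with intermediate methylation from per-site counts. The tuple layout, the function names, and the thresholds (35-65%, minimum coverage of 10) are all hypothetical choices for illustration, not recommended values, and a flagged site could equally reflect allele-specific methylation or sample heterogeneity.

```python
def methylation_percent(meth, unmeth):
    """Percent methylation from methylated/unmethylated read counts, or None if uncovered."""
    total = meth + unmeth
    if total == 0:
        return None
    return 100.0 * meth / total


def flag_intermediate(sites, low=35.0, high=65.0, min_cov=10):
    """Return (chrom, pos, percent) for sites with intermediate methylation.

    sites   : iterable of (chrom, pos, meth_count, unmeth_count) tuples
              (hypothetical layout for this sketch)
    low/high: bounds of the 'intermediate' window in percent (arbitrary defaults)
    min_cov : minimum total read count for a site to be considered
    """
    flagged = []
    for chrom, pos, meth, unmeth in sites:
        if meth + unmeth < min_cov:
            continue  # too few reads for a meaningful percentage
        pct = methylation_percent(meth, unmeth)
        if low <= pct <= high:
            flagged.append((chrom, pos, pct))
    return flagged


if __name__ == "__main__":
    example = [("chr1", 100, 10, 10), ("chr1", 200, 19, 1), ("chr1", 300, 2, 2)]
    print(flag_intermediate(example))
```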
