Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

m6A methylation analysis and explanation of output files #276

Open
JhinJhinJhin opened this issue Oct 8, 2024 · 1 comment
Open

m6A methylation analysis and explanation of output files #276

JhinJhinJhin opened this issue Oct 8, 2024 · 1 comment
Labels
question Looking for clarification on inputs and/or outputs

Comments

@JhinJhinJhin
Copy link

methylation distribution on read

I want to analyze the RNA m6A modification. I selected m6A_DRACH model when dorado basecaller.
Then I use modkit for downstream analysis.
modkit pileup $bam \ ${id}_pileup.bed \ --ref $ref \ -t 12 --log-filepath ${id}_pileup.log

bgzip -@ 4 -k $bed
tabix -@ 4 -p bed ${bed}.gz
modkit motif bed DM8.1.fa DRACH 2 1> DM8_DRACH.bed
modkit extract calls -t 8 $bam extract_calls/SDL-1_ref_calls.tsv --ref $genome --include-bed DM8_DRACH.bed
I use modkit extract calls command to generate a table of read-level base modification calls.

read_id forward_read_position ref_position chrom mod_strand ref_strand ref_mod_strand fw_soft_clipped_start fw_soft_clipped_end read_length call_prob call_code base_qual ref_kmer query_kmer canonical_base modified_primary_base fail inferred within_alignment flag
ad2e3781-d1bf-4457-a6e5-e48638151a51 1177 901581 chr01 + - - 0 87 1299 0.9941406 - 24 TGTTT AAACA A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 1082 901676 chr01 + - - 0 87 1299 0.9980469 - 28 TGTCA TGACA A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 1008 901750 chr01 + - - 0 87 1299 0.9980469 - 23 GGTTT AAACC A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 955 901803 chr01 + - - 0 87 1299 0.9433594 - 12 TGTTC GAACA A A true false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 887 901881 chr01 + - - 0 87 1299 0.9980469 - 17 TGTCA TGACA A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 878 901890 chr01 + - - 0 87 1299 0.9980469 - 23 AGTTT AAACT A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 809 901960 chr01 + - - 0 87 1299 0.9980469 - 20 AGTCC GGACT A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 786 901983 chr01 + - - 0 87 1299 0.8808594 - 19 AGTCA TGACT A A true false true 16
c90cf7b9-674f-46d6-bfa4-936058b6ea0e 126 971421 chr01 + - - 0 68 1258 0.8222656 a 23 TGTTA TAACA A A true false true 16
c90cf7b9-674f-46d6-bfa4-936058b6ea0e 54 972330 chr01 + - - 0 68 1258 0.9980469 - 15 AGTTT AAACT A A false false true 16
c90cf7b9-674f-46d6-bfa4-936058b6ea0e 2 972382 chr01 + - - 0 68 1258 0.9980469 - 19 TGTCA TGACA A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 253 901581 chr01 + - - 2 80 340 0.9980469 - 19 TGTTT AAACA A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 176 970230 chr01 + - - 2 80 340 0.9980469 - 30 GGTCT AGACC A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 163 970243 chr01 + - - 2 80 340 0.9902344 - 31 TGTTA TAACA A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 89 970317 chr01 + - - 2 80 340 0.9980469 - 17 GGTTT AAACC A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 36 970370 chr01 + - - 2 80 340 0.9980469 - 27 TGTTC GAACA A A false false true 16

I would like to know what conditions satisfy the site that can be considered as a modified site on the read. Is it when 'fail' and 'inferred' are both false? And I’m not quite sure what ‘implicit canonical’ means either.

Detecting differential modification at single base positions

modkit dmr pair
-a SDL-1_pileup.bed.gz
-a SDT-1_pileup.bed.gz
-b SDL-2_pileup.bed.gz
-b SDT-1_pileup.bed.gz
-o SLSD_dmr/SLSD
--ref $genome
--base A
-t 8
-f
--log-filepath SLSD_dmr/dmr_multi.log

SLSD:

chr01 428377 428378 . -0.2595111954850844 a:0 1 a:0 1 a:0.00 a:0.00 0 0 1 0 1 0 50 50 1 0
chr01 428819 428820 . 0.25131442828090655 a:0 1 a:1 2 a:0.00 a:50.00 0 0.5 0.9321833846700683 -0.5 0.9321833846700683 -0.5 50 100 - -
chr01 428876 428877 . -0.29257296267155475 a:3 4 a:3 4 a:75.00 a:75.00 0.75 0.75 1 0 1 0 100 100 1,1 0,0
chr01 428877 428878 . -0.2595111954850844 a:1 1 a:1 1 a:100.00 a:100.00 1 1 1 0 1 0 50 50 1 0
chr01 428912 428913 . -0.3006875128982047 a:0 2 a:0 2 a:0.00 a:0.00 0 0 1 0 1 0 100 100 1,1 0,0
chr01 428921 428922 . -0.2595111954850844 a:0 1 a:0 1 a:0.00 a:0.00 0 0 1 0 1 0 50 50 1 0
chr01 428926 428927 . -0.27243263335211054 a:18 26 a:25 34 a:69.23 a:73.53 0.6923077 0.7352941 1 -0.042986425339366585 1 0.015837104072398134 100 100 1,1 -0.04029304029304026,0
chr01 428934 428935 . -0.33202943912385763 a:0 10 a:0 5 a:0.00 a:0.00 0 0 1 0 1 0 100 100 1,1 0,0
chr01 428944 428945 . 0.3431471740471608 a:45 45 a:45 46 a:100.00 a:97.83 1 0.9782609 1 0.021739130434782594 1 0.021739130434782594 100 100 1,1 0.0357142857142857,0
chr01 428951 428952 . 1.1360555327687223 a:11 36 a:5 36 a:30.56 a:13.89 0.30555555 0.1388889 0.1987868090862803 0.16666666666666669 0.17709312475070266 0.16666666666666669 100 100 0.10617958219713411,1 0.3000000chr01 429017 429018 . -0.2769449125416088 a:49 63 a:45 60 a:77.78 a:75.00 0.7777778 0.75 1 0.02777777777777779 1 0.04086021505376347 100 100 0.8874779988874301,1 0.07027649769585254,0
chr01 429027 429028 . -0.20933341083340906 a:37 54 a:38 52 a:68.52 a:73.08 0.6851852 0.7307692 1 -0.045584045584045496 1 -0.026353276353276334 100 100 0.781323132265617,1 -0.08333333333333337,0
chr01 429028 429029 . 0.24018299513881303 a:11 14 a:11 18 a:78.57 a:61.11 0.78571427 0.6111111 0.5489948392453672 0.17460317460317454 0.8185923129030523 0.10317460317460314 100 100 0.3876206189317763,1 0.3555555chr01 429029 429030 . 0.25131442828090655 a:0 1 a:1 2 a:0.00 a:50.00 0 0.5 0.9321833846700683 -0.5 0.9321833846700683 -0.5 50 100 - -
chr01 429077 429078 . -0.3295915950843096 a:44 61 a:47 64 a:72.13 a:73.44 0.72131145 0.734375 1 -0.01306352459016391 1 -0.017708333333333326 100 100 1,1 -0.025252525252525193,0
chr01 429078 429079 . -0.2776317365982791 a:0 1 a:0 2 a:0.00 a:0.00 0 0 1 0 1 0 50 100 - -
chr01 429192 429193 . -0.2584535893156925 a:14 52 a:15 49 a:26.92 a:30.61 0.26923078 0.30612245 1 -0.036891679748822626 1 -0.04326923076923078 100 100 0.8664138256652923,1 -0.10287081339712917,0
chr01 429196 429197 . 0.2908615565132848 a:5 16 a:2 14 a:31.25 a:14.29 0.3125 0.14285715 0.5291319892925755 0.16964285714285715 0.403690372946813 0.17857142857142858 100 100 0.21078940602940324,1 0.428571428571428chr01 429201 429202 . -0.3076466561820299 a:0 3 a:0 2 a:0.00 a:0.00 0 0 1 0 1 0 100 50 - -
chr01 429207 429208 . -0.3101721989665336 a:1 31 a:1 27 a:3.23 a:3.70 0.032258064 0.037037037 1 -0.004778972520908004 1 -0.005128205128205131 100 100 1,1 -0.024242424242424246,0
chr01 429218 429219 . -0.2994907503308539 a:24 27 a:30 33 a:88.89 a:90.91 0.8888889 0.90909094 1 -0.02020202020202022 1 0.016826923076923128 100 100 1,1 0,0
chr01 429223 429224 . -0.26578015463795346 a:1 15 a:1 10 a:6.67 a:10.00 0.06666667 0.1 1 -0.03333333333333334 1 -0.02857142857142858 100 100 1,1 0,0
chr01 429224 429225 . -0.3006875128982047 a:0 2 a:0 2 a:0.00 a:0.00 0 0 1 0 1 0 100 100 1,1 0,0
chr01 429229 429230 . 0.6238435756252585 a:0 17 a:1 11 a:0.00 a:9.09 0 0.09090909 1 -0.09090909090909091 1 -0.1 100 100 0.5762151737229898,1 -0.3333333333333333,0
chr01 429234 429235 . -0.1418632197203351 a:3 6 a:4 6 a:50.00 a:66.67 0.5 0.6666667 0.8376587524553095 -0.16666666666666663 0.837695883342009 -0.16666666666666669 100 100 0.7710310269619939,1 -0.25,0
chr01 429239 429240 . 0.0001373942800455552 a:678 701 a:700 730 a:96.72 a:95.89 0.9671897 0.9589041 1 0.0102679011129716 1 0.0102679011129716 100 100 1,1 0.020855904658721558,-0.00029547916871863755
chr01 429250 429251 . -0.27925811985412086 a:4 5 a:3 4 a:80.00 a:75.00 0.8 0.75 1 0.050000000000000044 0.7710310269619939 0.25 100 100 0.9321833846700683,1 0.5,0
chr01 429253 429254 . -0.2595111954850844 a:0 1 a:0 1 a:0.00 a:0.00 0 0 1 0 1 0 50 50 1 0
chr01 429264 429265 . 0.25131442828090655 a:1 2 a:0 1 a:50.00 a:0.00 0.5 0 0.9321833846700683 0.5 0.9321833846700683 0.5 50 50 0.9321833846700683 0.5
chr01 429330 429331 . -0.18012700383224 a:479 656 a:497 694 a:73.02 a:71.61 0.73018295 0.7161383 1 0.012090022653402976 1 0.022678026199152934 100 100 1,1 0.043030674504778044,0.001280409731113985

I don’t know which sites can be considered as sites with significant differential methylation . Can you give me an example?

@ArtRand ArtRand added the question Looking for clarification on inputs and/or outputs label Oct 9, 2024
@ArtRand
Copy link
Contributor

ArtRand commented Oct 9, 2024

Hello @JhinJhinJhin,

I apologize for the slow response, I've been OOO.

Your commands for pileup, extract calls, and motif bed look good to me.

I would like to know what conditions satisfy the site that can be considered as a modified site on the read. Is it when 'fail' and 'inferred' are both false? And I’m not quite sure what ‘implicit canonical’ means either.

When you use modkit extract calls the call_code column indicates the predicted base modification status for each position with base modification probabilities in each read. The code is specified in the modBAM (by the base modification model). In the case of m6A, a will be m6A and - will be canonical. As per the documentation the fail column indicates if the base modification call probability is below the pass threshold. The inferred column only means whether or not a canonical call was assumed to be canonical due to the absence of a base modification probability, see the SAM specification, page 7 for details. For your application, using the DRACH model, you don't need to worry about "implicit canonical" since any of the motif models (DRACH, CpG) should not emit these kinds of tags.

I don’t know which sites can be considered as sites with significant differential methylation . Can you give me an example?

As far as the scoring, please refer to the documentation on the models, for your application I suggest looking at the MAP-based p-value. You can sort the results either by the score column or the MAP-based p-value column (find the schema in the documentation). The next step depends on your biological question, you could, for example, look for transcripts with the most differentially methylated sites. You may also be interested in the entropy command. You could calculate the entropy on each transcript and then look at the differences between the two samples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Looking for clarification on inputs and/or outputs
Projects
None yet
Development

No branches or pull requests

2 participants