Normalized the DMR scores #261
Comments
Hello @ArtRand, thanks
Hello @Congnguyenn, I apologize for the delay in getting back to you.
Looking at your first set of plots, are the low-%change/high-scoring (circled) points particularly long TSS? Did you perform 5hmC calling?
I'm working on a method that will let users "count" the number of DMRs in regions, but I don't have anything yet. Your decision function seems reasonable to me. You could also try to build a null distribution for your problem; that is closer to the route I've been going down.
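To make the null-distribution suggestion concrete, here is a minimal sketch of an empirical p-value for a single DMR score. It assumes you already have a list of scores from a comparison where no real differences are expected (for example, replicates of the same sample); how those null scores are produced, and the function name `empirical_p_value`, are assumptions for illustration and not part of modkit.

```python
from typing import Sequence


def empirical_p_value(observed_score: float, null_scores: Sequence[float]) -> float:
    """Fraction of null scores at least as large as the observed score.

    The +1 in numerator and denominator keeps the estimate away from zero
    when the observed score exceeds every null score.
    """
    n_as_extreme = sum(1 for s in null_scores if s >= observed_score)
    return (n_as_extreme + 1) / (len(null_scores) + 1)


if __name__ == "__main__":
    # Hypothetical null scores, e.g. from comparing two replicates.
    null = [0.3, 1.2, 0.8, 2.5, 0.1, 4.0, 0.9]
    print(empirical_p_value(3.0, null))
```

A region would then be called differentially methylated only if its empirical p-value falls below a threshold of your choosing, ideally after correcting for the number of regions tested.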
Thank you, @ArtRand, for your response!
No, I set a fixed TSS size of 2000 bp, but the number of CpG sites within each TSS can vary. I only analyzed 5mC using the command
To my knowledge, higher depth increases confidence in the probability (because more observations lead to smaller variance in the posterior) and does not affect the score. Only the number of CpG sites and the ratio between methylated and unmethylated sites affect the score. I would also like to know whether you are working on functional annotation. The tool would need to consider the methylation status (methylated or unmethylated). I believe this would be a useful feature because it helps us to interpret the different
Hello @Congnguyenn,
It depends. Higher depth allows the model to be more confident that the observed methylation distribution is close to the true (latent) methylation distribution. The effect of higher coverage is that you'll get higher scores at smaller effect sizes (differences in methylation). If the effect size is very small or zero, increasing coverage will not increase the score. Practically speaking, however, you'll probably observe higher scores in regions with higher coverage, but I don't think that makes these scores "incorrect". I could see how you might want to sub-sample the regions of high coverage to be equivalent to the regions of lower coverage; this functionality isn't available in
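The coverage effect described above can be illustrated with a toy score. The sketch below is not modkit's implementation: it assumes a simple log Bayes factor over pooled modified/canonical call counts with a Beta(1, 1) prior, and every function name here is hypothetical. It also includes a small helper that sub-samples a region's calls to a target depth, which is one way to equalize coverage before comparing scores.

```python
import random
from math import lgamma


def log_beta(x: float, y: float) -> float:
    """Natural log of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_marginal(n_mod: int, n_canonical: int, prior: float = 1.0) -> float:
    """Log marginal likelihood of the calls under a Beta(prior, prior) prior."""
    return log_beta(n_mod + prior, n_canonical + prior) - log_beta(prior, prior)


def toy_score(a_mod: int, a_canonical: int, b_mod: int, b_canonical: int) -> float:
    """Log Bayes factor: separate rates for a and b versus one shared rate."""
    separate = log_marginal(a_mod, a_canonical) + log_marginal(b_mod, b_canonical)
    pooled = log_marginal(a_mod + b_mod, a_canonical + b_canonical)
    return separate - pooled


def downsample_counts(n_mod: int, n_canonical: int, target: int,
                      rng: random.Random) -> tuple:
    """Randomly keep `target` calls (without replacement) from a region."""
    if target >= n_mod + n_canonical:
        return n_mod, n_canonical
    kept = rng.sample([1] * n_mod + [0] * n_canonical, target)
    kept_mod = sum(kept)
    return kept_mod, target - kept_mod


if __name__ == "__main__":
    # Same effect sizes at three depths: 80% vs 20%, 55% vs 45%, 50% vs 50%.
    for depth in (20, 200, 2000):
        for pct_a in (80, 55, 50):
            a_mod = depth * pct_a // 100
            b_mod = depth - a_mod
            score = toy_score(a_mod, depth - a_mod, b_mod, depth - b_mod)
            print(f"depth={depth} a={pct_a}% b={100 - pct_a}% score={score:.2f}")
```

In this toy model the score grows with depth whenever the two rates differ, and stays below zero when they are identical, which matches the qualitative behaviour described in the comment.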
Hello @ArtRand,
Thank you for your tool; it helps me a lot.
I have a couple of questions about the DMR score. I am trying to find DMRs between 2 samples in fixed-size regions (TSS regions provided by a BED file, 2 kb/region). The command I used is as follows:
modkit dmr pair \
  -a \${methylbed[i]} \
  -b \${methylbed[j]} \
  --regions-bed \${filename_i}_\${filename_j}.${bedtype}.intersected \
  --min-valid-coverage ${cov_cutoff} \
  --ref ${reference} \
  --missing quiet \
  --base C \
  --threads ${task.cpus} \
  --header \
  --log-filepath \${filename_i}_\${filename_j}.dmr.log \
  --out-path \${filename_i}_\${filename_j}.dmr \
  --force
I found that the score column was not well correlated with difference_pct_modified = abs(a_pct_modified - b_pct_modified) * 100, as shown in this plot. Some points (TSSs) have extremely high scores (log2 scale) even though their difference_pct_modified values are not particularly high. This seems counterintuitive to me.
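For reference, the difference metric above can be computed per region as in the sketch below. It assumes each row of the DMR output has been parsed into a dict keyed by the header names, that `a_pct_modified` and `b_pct_modified` are fractions between 0 and 1 (as the multiplication by 100 implies), and that the score column is named `score`; the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def difference_pct_modified(a_pct_modified: float, b_pct_modified: float) -> float:
    """Absolute difference between the two samples, scaled to percent."""
    return abs(a_pct_modified - b_pct_modified) * 100


def score_vs_difference(rows: Iterable[Dict[str, str]]) -> List[Tuple[float, float]]:
    """Pair each region's score with its difference_pct_modified for plotting."""
    pairs = []
    for row in rows:
        diff = difference_pct_modified(float(row["a_pct_modified"]),
                                       float(row["b_pct_modified"]))
        pairs.append((float(row["score"]), diff))
    return pairs


if __name__ == "__main__":
    # Hypothetical rows, not real modkit output.
    example = [
        {"score": "12.5", "a_pct_modified": "0.75", "b_pct_modified": "0.25"},
        {"score": "40.0", "a_pct_modified": "0.5", "b_pct_modified": "0.375"},
    ]
    for score, diff in score_vs_difference(example):
        print(score, diff)
```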
In #191, you mentioned that "the score is, unfortunately, somewhat correlated with the number of potentially modified positions (CpGs in this case I believe) in the region."
So my questions are: