Does pileup always output a value for a covered base? #200

Ge0rges · 2024-06-06T22:45:55Z

Hi,

Just a simple question (for once), I wanted to ensure I understand the output of pileup correctly. For any given location/nucleotide, if there is no methylation but there is coverage of that base, will N canonical always be non-zero?

The text was updated successfully, but these errors were encountered:

ArtRand · 2024-06-07T03:56:14Z

Hello @Ge0rges,

Any position with valid coverage >0 will have a bedMethyl record.

For any given location/nucleotide, if there is no methylation but there is coverage of that base, will N canonical always be non-zero?

I would say "For any given location/nucleotide, if there is no methylation but there is coverage of that base and the modification call passes the confidence threshold, will N canonical always be non-zero?"

Let me know if you find this to not be the case.

Ge0rges · 2024-06-07T06:21:51Z

So just to double check, even if a base is not methylated but covered it could not show up if the modification call isn't above the threshold?

Would it be incorrect to treat covered bases below the threshold as unmethylated? I guess unknown is different that unmethylated...

ArtRand · 2024-06-07T16:33:46Z

@Ge0rges

A potential confusion is that modkit does not assume a base is canonical if the probability of modification is close to $\frac{1}{N_{\text{classes}}}$. A modification call can be canonical or any of the potential modification codes/states. So if a base is covered, but the call is canonical, there is a confidence/probability associated with that canonical call. If that canonical probability is below the pass threshold, it will be filtered out.

For example (take a simple 5mC/canonical example):

$$ P_{\text{5mC}} = 0.7, P_{\text{5hmC}} = 0.2, P_{\text{canonical}} = 1 - P_{\text{5mC}} + P_{\text{5hmC}} = 0.1, \text{Call} \Rightarrow \text{5mC} $$

However if the threshold is 0.9, then this call will fail not be considered canonical. There are more examples in the documentation.

If a read is filtered out, it adds a count to column 16 not $N_{\text{valid cov}}$. If a row has $N_{\text{valid cov}}$ of zero, it's not included in the output. You can see this easily when you run modkit pileup with --no-filtering and --filter-threshold 0.95 (or higher) and count the rows.

You can, however, force a default to canonical call by setting the thresholds specifically for each modified base --mod-threshold h:0.95 --mod-threshold m:0.95 --filter-threshold C:0.05 in this case what happens is the canonical probability (0.1) is still above the requirement for the base (0.05) and a canonical call is emitted. Using the thresholds this way is only advised when you have a very strong prior that bases should be canonical unless there is strong evidence of modification.

ArtRand added the question Looking for clarification on inputs and/or outputs label Jun 7, 2024

Ge0rges closed this as completed Jun 9, 2024

ArtRand mentioned this issue Jun 27, 2024

Problem interpretating table #217

Open

ArtRand mentioned this issue Jul 19, 2024

modkit results look off for CHG and CHH contexts - plant methylation #234

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Does pileup always output a value for a covered base? #200

Does pileup always output a value for a covered base? #200

Ge0rges commented Jun 6, 2024

ArtRand commented Jun 7, 2024

Ge0rges commented Jun 7, 2024

ArtRand commented Jun 7, 2024

Does pileup always output a value for a covered base? #200

Does pileup always output a value for a covered base? #200

Comments

Ge0rges commented Jun 6, 2024

ArtRand commented Jun 7, 2024

Ge0rges commented Jun 7, 2024

ArtRand commented Jun 7, 2024