Documentation clarification needed for --normalizeUsing option in bamCoverage #1311

Open
4 tasks done
kalavattam opened this issue May 6, 2024 · 0 comments
kalavattam commented May 6, 2024

Pre-check

Welcome to deepTools GitHub repository! Before opening the issue please check that the following requirements are met:

  • Search whether this issue (or a similar issue) has been solved before using the search tab above. Link the previous issue if appropriate below.

N/A.

  • Paste your deepTools version (deeptools --version) and your python version (python --version) below.
❯ deepTools --version
deepTools 3.5.5

❯ python --version
Python 3.10.14
  • Paste the full deepTools command that produces the issue below (ignore if you simply spotted the issue in the code/documentation).

N/A; the issue pertains to documentation.

  • Paste the output printed on screen from the command that produces the issue below (ignore if you simply spotted the issue in the code/documentation).

N/A; the issue pertains to documentation.



Description of issue

Hello deepTools team,

I've noticed a potential inconsistency in the bamCoverage documentation regarding the --normalizeUsing option. Specifically, the descriptions for the None and RPGC settings appear to be intertwined, which could lead to confusion about the behavior of these options.

Current documentation

--normalizeUsing
Possible choices: RPKM, CPM, BPM, RPGC, None

Use one of the entered methods to normalize the number of reads per bin. By default, no normalization is performed. RPKM = Reads Per Kilobase per Million mapped reads; CPM = Counts Per Million mapped reads, same as CPM in RNA-seq; BPM = Bins Per Million mapped reads, same as TPM in RNA-seq; RPGC = reads per genomic content (1x normalization); Mapped reads are considered after blacklist filtering (if applied). RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)). CPM (per bin) = number of reads per bin / number of mapped reads (in millions). BPM (per bin) = number of reads per bin / sum of all reads per bin (in millions). RPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage. None = the default and equivalent to not setting this option at all. This scaling factor, in turn, is determined from the sequencing depth: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. This option requires --effectiveGenomeSize. Each read is considered independently, if you want to only count one mate from a pair in paired-end data, then use the --samFlagInclude/--samFlagExclude options. (Default: None)

Issue

The description states that "None" is equivalent to not setting the normalization option, yet the very next sentences explain how a scaling factor is calculated, which applies only to the RPGC option. Read in order, this could incorrectly imply that None involves some form of normalization.

Suggested change

It would be clearer to move the scaling-factor explanation so that it directly follows the RPGC definition, and to state None afterwards so it is unambiguous that None means no normalization. Perhaps something like this:

... RPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage. This scaling factor is determined from the sequencing depth: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. This option requires --effectiveGenomeSize. None = the default and equivalent to not setting this option at all. Each read is considered independently, if you want to only count one mate from a pair in paired-end data, then use the --samFlagInclude/--samFlagExclude options. (Default: None)

This rearrangement would help clarify the documentation and ensure users correctly understand the behavior of each normalization option.
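
For illustration, here is a rough Python sketch of how I read the per-bin formulas quoted above. It is not deepTools code: the function name `normalize_bin` and all of its parameters are hypothetical, and the RPGC branch assumes the "inverse of the sequencing depth" reading, i.e. the bin count is divided by (total number of mapped reads * fragment length) / effective genome size. The point is only that the scaling-factor calculation lives in the RPGC branch, while None returns the count unchanged.

```python
def normalize_bin(reads_in_bin, method=None, *, mapped_reads=None,
                  bin_length_bp=None, sum_reads_all_bins=None,
                  fragment_length=None, effective_genome_size=None):
    """Hypothetical helper: apply one of the documented formulas to one bin."""
    if method is None or method == "None":
        # No normalization: the raw count is returned as-is.
        return float(reads_in_bin)
    if method == "CPM":
        return reads_in_bin / (mapped_reads / 1e6)
    if method == "RPKM":
        return reads_in_bin / ((mapped_reads / 1e6) * (bin_length_bp / 1e3))
    if method == "BPM":
        return reads_in_bin / (sum_reads_all_bins / 1e6)
    if method == "RPGC":
        # Assumed reading: sequencing depth from the quoted formula,
        # then scale by its inverse to reach 1x average coverage.
        depth = (mapped_reads * fragment_length) / effective_genome_size
        return reads_in_bin / depth
    raise ValueError(f"unknown method: {method!r}")


# Made-up numbers, chosen only to exercise each branch.
raw = normalize_bin(50)
rpgc = normalize_bin(50, "RPGC", mapped_reads=2_000_000,
                     fragment_length=200, effective_genome_size=12_000_000)
print(raw, rpgc)
```

With None, none of the depth-related arguments are needed or used, which is the behavior the reordered text is meant to make obvious.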

Thank you for considering this clarification to enhance the utility and user-friendliness of deepTools.

–Kris
