Documentation clarification needed for `--normalizeUsing` option in `bamCoverage` #1311

kalavattam · 2024-05-06T17:53:06Z

Pre-check

Welcome to deepTools GitHub repository! Before opening the issue please check that the following requirements are met:

Search whether this issue (or a similar issue) has been solved before using the search tab above. Link the previous issue if appropriate below.

N/A.

Paste your deepTools version (deeptools --version) and your python version (python --version) below.

❯ deepTools --version
deepTools 3.5.5

❯ python --version
Python 3.10.14

Paste the full deepTools command that produces the issue below (ignore if you simply spotted the issue in the code/documentation).

N/A; the issue pertains to documentation.

Paste the output printed on screen from the command that produces the issue below (ignore if you simply spotted the issue in the code/documentation).

N/A; the issue pertains to documentation.

Description of issue

Hello deepTools team,

I've noticed a potential inconsistency in the bamCoverage documentation regarding the --normalizeUsing option. Specifically, the descriptions for the None and RPGC settings appear to be intertwined, which could lead to confusion about the behavior of these options.

Current documentation

--normalizeUsing
Possible choices: RPKM, CPM, BPM, RPGC, None

Use one of the entered methods to normalize the number of reads per bin. By default, no normalization is performed. RPKM = Reads Per Kilobase per Million mapped reads; CPM = Counts Per Million mapped reads, same as CPM in RNA-seq; BPM = Bins Per Million mapped reads, same as TPM in RNA-seq; RPGC = reads per genomic content (1x normalization); Mapped reads are considered after blacklist filtering (if applied). RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)). CPM (per bin) = number of reads per bin / number of mapped reads (in millions). BPM (per bin) = number of reads per bin / sum of all reads per bin (in millions). RPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage. None = the default and equivalent to not setting this option at all. This scaling factor, in turn, is determined from the sequencing depth: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. This option requires --effectiveGenomeSize. Each read is considered independently, if you want to only count one mate from a pair in paired-end data, then use the --samFlagInclude/--samFlagExclude options. (Default: None)

Issue

The description states that None is equivalent to not setting the normalization option, yet the sentences that immediately follow explain how a scaling factor is calculated, and that explanation belongs to the RPGC option. A reader could wrongly infer that None applies some form of normalization.
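To make the intended reading concrete, here is a minimal Python sketch of the per-bin formulas as quoted above. It is not deepTools code: the function name `normalize_bins` and its parameters are hypothetical, and the RPGC branch assumes the count is divided by the sequencing depth (equivalently, multiplied by its inverse), which is how I read the description. The point is that None takes no part in the scaling-factor calculation.

```python
def normalize_bins(counts, method=None, *, mapped_reads=None, bin_length_bp=None,
                   fragment_length=None, effective_genome_size=None):
    """Apply one of the documented per-bin formulas to raw read counts.

    Hypothetical helper for illustration only; not part of deepTools.
    """
    if method is None or method == "None":
        # No normalization: the raw counts are returned unchanged.
        return list(counts)

    if method == "CPM":
        # Mapped reads, in millions.
        divisor = mapped_reads / 1e6
    elif method == "RPKM":
        # Mapped reads (millions) times bin length (kb).
        divisor = (mapped_reads / 1e6) * (bin_length_bp / 1e3)
    elif method == "BPM":
        # Sum of all reads per bin, in millions.
        divisor = sum(counts) / 1e6
    elif method == "RPGC":
        # Assumed reading: sequencing depth from the quoted formula;
        # dividing by it is the same as multiplying by its inverse.
        divisor = (mapped_reads * fragment_length) / effective_genome_size
    else:
        raise ValueError(f"unknown method: {method!r}")

    return [count / divisor for count in counts]


raw = [50, 150]
unchanged = normalize_bins(raw)  # method=None, so no scaling factor is involved
scaled = normalize_bins(raw, "RPGC", mapped_reads=10_000_000,
                        fragment_length=200, effective_genome_size=2_000_000_000)
```

Only the RPGC branch needs a fragment length and an effective genome size, which is why the scaling-factor explanation reads more naturally next to RPGC than after None.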

Suggested change

It would be clearer to move the detailed explanation of the scaling factor so that it directly follows the RPGC formula, and to reaffirm that None means no normalization at all. Perhaps something like this:

... RPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage. This scaling factor is determined from the sequencing depth: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. This option requires --effectiveGenomeSize. None = the default and equivalent to not setting this option at all. Each read is considered independently, if you want to only count one mate from a pair in paired-end data, then use the --samFlagInclude/--samFlagExclude options. (Default: None)

This rearrangement would help clarify the documentation and ensure users correctly understand the behavior of each normalization option.

Thank you for considering this clarification to enhance the utility and user-friendliness of deepTools.

–Kris
