
AddNorm

nchernia edited this page Dec 3, 2018 · 9 revisions

Juicer automatically calls addNorm as part of Pre. If you explicitly skipped calculating norms by passing the -n flag to Pre, you can use addNorm to normalize the .hic file afterward. You can also submit your own vector to be stored with the .hic file. Currently only one normalization vector can be added this way; it appears in the Juicebox normalization dropdown menu as "LOADED".

Add Norm

If you created a .hic file without normalization vectors (using the -n flag), or if you want to add genome-wide normalizations or skip normalizing the fragment-delimited matrices, use the addNorm command. It adds all the normalization vectors (coverage, square root coverage, and balanced).

  addNorm <input_HiC_file> [-w genome-wide-resolution] [-F] [-d]

Required:

  • <input_HiC_file>: File to normalize; this will delete any previous normalizations.

Optional:

  • -w <genome-wide resolution>: Smallest resolution at which to calculate genome-wide normalizations; e.g., if 10000, genome-wide normalizations will be calculated for 2.5Mb, 1Mb, 500Kb, 250Kb, 100Kb, 50Kb, 25Kb, and 10Kb but not for 5Kb. Genome-wide normalization can be very expensive in terms of memory; this flag allows a trade-off between memory use and the number of resolutions normalized genome-wide. If not set, or set to 0, no genome-wide normalizations are calculated.
  • -F: Do not calculate normalizations for fragment-delimited matrices.
  • -d: For genome-wide normalization, include intra-chromosomal matrices; by default, inter-only matrices are used.
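The -w rule can be illustrated with a short sketch. This mirrors the documented behavior only, not Juicer's own code; the resolution list is the default set named in the -w description, and the function name genome_wide_resolutions is hypothetical.

```python
# Default base-pair resolutions named in the -w description.
DEFAULT_RESOLUTIONS = [2_500_000, 1_000_000, 500_000, 250_000, 100_000,
                       50_000, 25_000, 10_000, 5_000]


def genome_wide_resolutions(w, resolutions=DEFAULT_RESOLUTIONS):
    """Resolutions that would get genome-wide normalization for a given -w.

    Hypothetical helper mirroring the documented rule: every bin size >= w is
    included; w unset (None) or 0 means no genome-wide normalizations.
    """
    if not w:
        return []
    return [r for r in resolutions if r >= w]


print(genome_wide_resolutions(10_000))
# [2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000]
```

Choosing a larger -w (e.g. 100000) shrinks the list to the coarse resolutions, which is where the memory savings come from.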

Add an additional vector as the norm (BETA)

You can also supply a text file that contains an additional precalculated normalization vector. Currently only one such vector is allowed per .hic file. It appears in the Juicebox normalization dropdown menu as "LOADED".

Usage

  addNorm <input_HiC_file> <input_normalization_vector>
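When scripting normalization, it can help to build the command line in one place. The sketch below assumes juicer_tools is launched as `java -jar <jar> addNorm ...`; the jar path, the file names, and the helper name add_norm_command are hypothetical. Because the documentation does not say whether -w, -F, or -d may be combined with a vector file, the helper refuses that combination rather than guessing.

```python
import shlex


def add_norm_command(hic_file, vector_file=None, gw_resolution=None,
                     no_frag=False, include_intra=False,
                     jar="juicer_tools.jar"):  # jar path is an assumption
    """Build the argv for addNorm, following the two documented usages."""
    cmd = ["java", "-jar", jar, "addNorm", hic_file]
    if vector_file is not None:
        # BETA form: addNorm <input_HiC_file> <input_normalization_vector>.
        # Combining it with the options below is undocumented, so refuse it.
        if gw_resolution or no_frag or include_intra:
            raise ValueError("options are not documented for the vector-file form")
        return cmd + [vector_file]
    if gw_resolution:
        cmd += ["-w", str(gw_resolution)]  # smallest genome-wide resolution
    if no_frag:
        cmd.append("-F")  # skip fragment-delimited matrices
    if include_intra:
        cmd.append("-d")  # include intra-chromosomal matrices genome-wide
    return cmd


cmd = add_norm_command("inter.hic", gw_resolution=10000, no_frag=True)
print(shlex.join(cmd))
# To run it (requires Java and the jar): subprocess.run(cmd, check=True)
```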

File format for normalization vector

The normalization vector file is a plain text file. Each chromosome-resolution combination starts with a header line beginning with the word "vector", followed by the name (ignored for now, but to be used in future versions), the chromosome, the resolution, and the unit ("BP" in the example below). The values follow, one per line.

For example:

vector  MyNewNorm  chr1    2048000 BP
0.317522718673
0.72654265741
0.29353335424
0.638594927778
0.778175426373
0.322731736798
.
0.866251628293
0.170958650245
0.33980309017
0.968321393859
0.20895742749
.
vector  MyNewNorm  chr2    2048000 BP
0.373096503455
0.352043644426
0.482002469483
0.222967309304
0.35294886705
0.732542404884
0.859455987572
0.696335078534
0.033424729077
0.743690271992
0.012562851558
0.179414542021
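A file in this format can be generated programmatically. The sketch below renders one block per (chromosome, resolution) pair; the function name format_norm_vectors is hypothetical, and separating the header fields with tabs is an assumption (the example above only shows whitespace between fields).

```python
def format_norm_vectors(name, vectors, unit="BP"):
    """Render normalization vectors in the addNorm text format.

    `vectors` maps (chromosome, resolution) -> list of floats, one per bin.
    Each block is a header "vector <name> <chromosome> <resolution> <unit>"
    followed by one value per line.
    """
    lines = []
    for (chrom, resolution), values in vectors.items():
        # Header fields are tab-separated here (an assumption).
        lines.append(f"vector\t{name}\t{chrom}\t{resolution}\t{unit}")
        lines.extend(str(v) for v in values)
    return "\n".join(lines) + "\n"


text = format_norm_vectors("MyNewNorm", {("chr1", 2048000): [0.5, 0.25]})
print(text, end="")
# To save it: open("my_norm.txt", "w").write(text)  (file name is hypothetical)
```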

The norm vector should have one entry per bin, i.e., the chromosome length divided by the bin size, with the final partial bin counted as a full entry. E.g., for chromosome 1 in hg19 at 10K resolution, 249250621/10000 = 24925.06, so there should be 24926 entries.
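Before calling addNorm, a vector file can be checked against the chromosome lengths. The sketch below parses the format into a dictionary and reports vectors whose length differs from the bin count. It counts bins as floor(length / bin size) + 1 so the final partial bin gets its own entry; we believe this matches how Juicer sizes per-chromosome vectors, but treat that as an assumption. The chrom_sizes dictionary and all function names are hypothetical.

```python
def expected_bins(chrom_length, bin_size):
    """Bins covering a chromosome, including the final partial bin (assumed rule)."""
    return chrom_length // bin_size + 1


def parse_norm_vectors(text):
    """Parse the addNorm vector format into {(chrom, resolution): [floats]}."""
    vectors, key = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "vector":
            # vector <name> <chromosome> <resolution> <unit>
            key = (fields[2], int(fields[3]))
            vectors[key] = []
        elif key is None:
            raise ValueError("value found before the first 'vector' header")
        else:
            vectors[key].append(float(fields[0]))
    return vectors


def check_lengths(vectors, chrom_sizes):
    """Return (chrom, resolution, found, expected) for each mismatched vector."""
    problems = []
    for (chrom, res), values in vectors.items():
        want = expected_bins(chrom_sizes[chrom], res)
        if len(values) != want:
            problems.append((chrom, res, len(values), want))
    return problems


print(expected_bins(249_250_621, 10_000))  # hg19 chr1 at 10 kb
```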