
Background_generation

Vivekanandan Ramalingam edited this page May 11, 2024 · 3 revisions

1.5 GC-matched negatives

Generate a BED file of non-peak regions that are GC-matched to the peaks (foreground). These regions are included in training to improve the model's accuracy on non-peak regions.
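To make "GC content" concrete, here is a minimal sketch that computes the GC fraction of a single sequence in plain shell. The toy sequence and the variable names are hypothetical and only illustrate the quantity being matched; this is not how the bpnet tools compute it internally.

```shell
# Hypothetical toy sequence; real runs use windows cut from the reference FASTA.
seq="ACGTGGCCAT"

# Count G and C bases (either case) by deleting every other character.
gc_count=$(printf '%s' "$seq" | tr -cd 'GCgc' | wc -c | tr -d ' ')
seq_len=${#seq}

# The shell only does integer arithmetic, so use awk for the fraction.
gc_frac=$(awk -v g="$gc_count" -v n="$seq_len" 'BEGIN { printf "%.2f", g / n }')

echo "GC fraction: $gc_frac"
```

A GC-matched negative is then a non-peak region whose GC fraction is close to that of a peak region.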

# bpnet-gc-reference - bin the entire genome and compute the GC content of each bin. You may choose to run this just once per genome for a given input sequence length and reuse the genomewide_gc_stride_flank_size.gc.bed output for other datasets

bpnet-gc-reference \
        --ref_fasta reference/hg38.genome.fa \
        --chrom_sizes reference/hg38.chrom.sizes \
        --out_prefix reference/genomewide_gc_stride_1000_flank_size_1057.gc.bed \
        --inputlen 2114 \
        --stride 1000
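
# bpnet-gc-background - select non-peak regions that are GC-matched to the peaks, using the genome-wide GC bed file generated above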
bpnet-gc-background \
        --peaks_bed ENCSR000EGM/data/peaks_inliers.bed \
        --out_dir ENCSR000EGM/data/ \
        --ref_gc_bed reference/genomewide_gc_stride_1000_flank_size_1057.gc.bed \
        --out_prefix ENCSR000EGM/data/gc_negatives.bed \
        --flank_size 1057 \
        --neg_to_pos_ratio_train 4
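
The two commands must agree on the stride, the flank size, and the name of the GC bed file. The values above are consistent with the flank size being half the input length (2114 / 2 = 1057). That relationship is an assumption here, since the page does not state it. Under that assumption, a small sketch that derives the shared values once, using hypothetical variable names:

```shell
# Values taken from the commands above.
INPUTLEN=2114
STRIDE=1000

# Assumption: flank size is half the input length (integer division).
FLANK_SIZE=$((INPUTLEN / 2))

# Build the GC bed name once so both commands refer to the same file.
GC_BED="reference/genomewide_gc_stride_${STRIDE}_flank_size_${FLANK_SIZE}.gc.bed"

echo "flank size: $FLANK_SIZE"
echo "GC bed:     $GC_BED"
```

These variables could then be passed as `--inputlen "$INPUTLEN"`, `--stride "$STRIDE"`, `--flank_size "$FLANK_SIZE"`, and `--ref_gc_bed "$GC_BED"` so the file name and parameters cannot drift apart.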