Background_generation
Vivekanandan Ramalingam edited this page May 11, 2024 · 3 revisions
Generate a BED file of non-peak regions that are GC-matched to the peaks (foreground). These regions are included in training to improve accuracy on the non-peak regions.
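To make "GC-matched" concrete: the GC content of a region is the fraction of its bases that are G or C, and a matched negative is a non-peak region whose GC fraction is close to that of a peak. The helper below is only an illustrative sketch of that quantity. The name `gc_fraction` is hypothetical, and it is not how the bpnet tools compute or bin GC content internally.

```shell
# Hypothetical helper (not part of bpnet): print the GC fraction of a DNA string.
gc_fraction() {
  # gsub returns the number of G/C characters it removed; n is the original length.
  printf '%s\n' "$1" | LC_ALL=C awk '{
    n = length($0)
    g = gsub(/[GCgc]/, "", $0)
    if (n > 0) printf "%.3f\n", g / n
  }'
}

gc_fraction ACGT   # prints 0.500
```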
# bpnet-gc-reference - compute GC content after binning the entire genome. You might choose to run this just once per genome for a given input sequence length and reuse the genomewide_gc_stride_flank_size.gc.bed output for other datasets.
bpnet-gc-reference \
--ref_fasta reference/hg38.genome.fa \
--chrom_sizes reference/hg38.chrom.sizes \
--out_prefix reference/genomewide_gc_stride_1000_flank_size_1057.gc.bed \
--inputlen 2114 \
--stride 1000
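Since the genome-wide GC reference can be reused across datasets, one option is to wrap the command above so it only runs when the file is missing. This is a minimal sketch, not part of bpnet: the function names `gc_reference_path` and `ensure_gc_reference` are hypothetical, and it assumes that the flank size is half the input length (2114 / 2 = 1057, as in the values on this page) and that the tool writes its output to the path given as `--out_prefix`.

```shell
# Hypothetical helper: build the GC reference file name from inputlen and stride.
# Assumption: flank_size = inputlen / 2, matching 2114 and 1057 on this page.
gc_reference_path() {
  local inputlen="$1"
  local stride="$2"
  local flank=$(( inputlen / 2 ))
  printf 'reference/genomewide_gc_stride_%s_flank_size_%s.gc.bed\n' "$stride" "$flank"
}

# Hypothetical wrapper: run bpnet-gc-reference only if the file is missing or empty.
ensure_gc_reference() {
  local inputlen="$1"
  local stride="$2"
  local out
  out="$(gc_reference_path "$inputlen" "$stride")"
  if [ -s "$out" ]; then
    echo "reusing $out"
  else
    bpnet-gc-reference \
      --ref_fasta reference/hg38.genome.fa \
      --chrom_sizes reference/hg38.chrom.sizes \
      --out_prefix "$out" \
      --inputlen "$inputlen" \
      --stride "$stride"
  fi
}
```

Calling `ensure_gc_reference 2114 1000` would then either reuse the existing file or regenerate it with the same flags as the command above.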
bpnet-gc-background \
--peaks_bed ENCSR000EGM/data/peaks_inliers.bed \
--out_dir ENCSR000EGM/data/ \
--ref_gc_bed reference/genomewide_gc_stride_1000_flank_size_1057.gc.bed \
--out_prefix ENCSR000EGM/data/gc_negatives.bed \
--flank_size 1057 \
--neg_to_pos_ratio_train 4
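After the negatives are generated, a quick sanity check is to compare the number of negative regions with the number of peaks. The sketch below is not part of bpnet; `bed_ratio` is a hypothetical helper that counts non-empty lines in two BED files. It reports the ratio over the whole files, whereas the `--neg_to_pos_ratio_train` flag name suggests the ratio may apply to the training split only, so treat the number as a rough check rather than an exact expectation.

```shell
# Hypothetical helper: print the number of negatives per peak for two BED files.
bed_ratio() {
  local neg_bed="$1"
  local pos_bed="$2"
  local n_neg n_pos
  # Count non-empty lines; "c + 0" prints 0 for an empty file.
  n_neg=$(awk 'NF { c++ } END { print c + 0 }' "$neg_bed")
  n_pos=$(awk 'NF { c++ } END { print c + 0 }' "$pos_bed")
  LC_ALL=C awk -v n="$n_neg" -v p="$n_pos" 'BEGIN {
    if (p == 0) print "NA"; else printf "%.2f\n", n / p
  }'
}
```

For example, `bed_ratio ENCSR000EGM/data/gc_negatives.bed ENCSR000EGM/data/peaks_inliers.bed`, assuming the negatives are written to the path passed as `--out_prefix`.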