Build indices from the GDC reference files

https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files

This repo is on biowulf at /data/CCBR_Pipeliner/db/PipeDB/GDC_refs

The snakemake workflow downloads references from Encode, Entrez, and GDC, adds viruses and decoys to the hg19 fasta, and executes renee build for the hg38 and hg19 genome versions specified in the config file.

The hg38 fasta files were downloaded from the GDC with virus and decoy sequences already added, while we added these sequences to the hg19 fasta from Encode using this snakemake workflow.

module load snakemake/7
snakemake -j 8
chmod -R a+r /data/CCBR_Pipeliner/db/PipeDB/GDC_refs

After the renee build jobs complete, copy the genome JSON files to the RENEE repo:

cp hg*/*.json /data/CCBR_Pipeliner/Pipelines/RENEE/renee-dev-sovacool/config/genomes/biowulf/

Make modified versions for FRCE:

cp hg*/*.json /data/CCBR_Pipeliner/Pipelines/RENEE/renee-dev-sovacool/config/genomes/frce/
sed -i "s|/data/CCBR_Pipeliner/db/PipeDB/GDC_refs/|/mnt/projects/CCBR-Pipelines/db/GDC_refs/|g" \
    config/genomes/frce/*

Copy the reference files to FRCE:

ssh 10.156.101.10
rsync -rLK --progress --ignore-existing --exclude=".*" \
    helix.nih.gov:/data/CCBR_Pipeliner/db/PipeDB/GDC_refs /mnt/projects/CCBR-Pipelines/db/
chmod -R a+r /mnt/projects/CCBR-Pipelines/db/GDC_refs/hg*
exit

Finally, contribute the changes to RENEE via a pull request.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Build indices from the GDC reference files

Files

README.md

Latest commit

History

README.md

File metadata and controls

Build indices from the GDC reference files