
This repo contains a Nextflow pipeline for colocalisation analysis of QTLs against GWASs.


eQTL-Catalogue/colocalisation

This pipeline runs colocalisation analysis using summary statistics from the eQTL Catalogue (or any summary statistics in the same format) and GWAS summary statistics in VCF format downloaded from the MRC OpenGWAS database.

If your GWAS summary statistics are not in VCF format, see the repository's quick conversion instructions.

To run on the University of Tartu HPC

  1. Clone the repo
git clone https://github.com/eQTL-Catalogue/colocalisation.git
  2. Open nextflow.config and set the parameters as needed:
  • gwas_ss_tsv = "${baseDir}/testdata_coloc/gwas_sumstats_all.tsv"
    // Path to the TSV file that lists the GWAS summary statistics files. The default points to the example in testdata_coloc/.
  • qtl_ss_tsv = "${baseDir}/testdata_coloc/eqtl_sumstats_tx.tsv"
    // Path to the TSV file that lists the QTL summary statistics files. The default points to the example in testdata_coloc/.
  • gwas_lift_chain = "/gpfs/hpc/projects/eQTLCatalogue/GRCh37_to_GRCh38/GRCh37_to_GRCh38.chain.gz"
    // Chain file used to lift GWAS variant coordinates over from GRCh37 to GRCh38. Leave it unchanged unless you know what you are doing.
  • hg38_ref_genome = "/gpfs/hpc/projects/genomic_references/annotations/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
    // GRCh38 reference genome (primary assembly FASTA). Leave it unchanged unless you know what you are doing.
  • outdir = './results_coloc_tx'
    // Output directory where the pipeline writes its results.
  • use_permutation = false
    // Determines what the 4th column of qtl_ss_tsv must contain. If true, it must be the permutation run result file of the qtl_subset. If false, it must be the lead_var_pairs file (credible sets) listing the specific molecular_trait_id and variant_id pairs to colocalise.
  • cis_window = 200000
    // Size of the cis window, in base pairs, within which colocalisation is performed. The default is ±200,000 bp.
  • n_batches = 10
    // Number of batches into which the lead_var_pairs file is split for processing. For example, if the file contains 42445 pairs and n_batches = 10, each job processes about 4245 pairs.
  3. Start a screen session
screen -S coloc_nf
  4. SSH to stage1
ssh stage1
  5. Load the required modules
module load java-1.8.0_40
module load singularity/3.5.3
module load nextflow
  6. Change to the directory where you cloned the repo
cd colocalisation/
  7. Run the pipeline
nextflow run main.nf -profile tartu_hpc
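The n_batches setting in step 2 amounts to a ceiling division of the number of lead_var_pairs by the number of batches. The following is a minimal POSIX shell sketch, not part of the pipeline, for estimating the per-job load before launching. `pairs_per_batch` is a hypothetical helper, and it assumes one pair per line; pass 1 as the third argument if your lead_var_pairs file has a header row.

```shell
# Hypothetical helper (not part of the pipeline): estimate how many pairs each job
# processes for a given n_batches.
# Assumption: one pair per line; pass header_lines=1 if the file has a header row.
pairs_per_batch() {
  local file="$1" n_batches="$2" header_lines="${3:-0}"
  local total
  # Count lines, decompressing on the fly if the file is gzipped
  case "$file" in
    *.gz) total=$(gzip -cd "$file" | wc -l) ;;
    *)    total=$(wc -l < "$file") ;;
  esac
  total=$(( total - header_lines ))
  # Ceiling division: every batch gets at most this many pairs; the last one may get fewer
  echo $(( (total + n_batches - 1) / n_batches ))
}
```

For example, `pairs_per_batch lead_var_pairs.tsv 10` (hypothetical filename) on a 42445-line file prints 4245, matching the n_batches example above. Nextflow also accepts parameter overrides on the command line, so a different batch count can be tried without editing nextflow.config, e.g. `nextflow run main.nf -profile tartu_hpc --n_batches 20`.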
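Because use_permutation determines what the 4th column of qtl_ss_tsv must hold (a permutation run result file when true, a lead_var_pairs file when false), it can help to confirm that every path in that column exists before submitting jobs. The sketch below is a hypothetical pre-flight check, not part of the pipeline. It assumes qtl_ss_tsv is tab-separated with one header row (pass 0 as the second argument otherwise) and that the listed paths resolve from the current directory.

```shell
# Hypothetical pre-flight check (not part of the pipeline): report entries in the
# 4th column of qtl_ss_tsv that do not point to an existing file.
# Assumptions: tab-separated file, one header row by default, paths resolvable from $PWD.
check_qtl_col4() {
  local tsv="$1" header_lines="${2:-1}"
  awk -F'\t' -v skip="$header_lines" 'NR > skip && $4 != "" { print $4 }' "$tsv" |
  {
    missing=0
    while IFS= read -r path; do
      if [ ! -f "$path" ]; then
        echo "missing: $path" >&2
        missing=$((missing + 1))
      fi
    done
    # Non-zero exit status if any listed file is absent
    [ "$missing" -eq 0 ]
  }
}
```

Running `check_qtl_col4 testdata_coloc/eqtl_sumstats_tx.tsv` prints one `missing:` line per absent file and returns a non-zero status if there are any; it only checks existence, not whether each file is the permutation or lead_var_pairs format that use_permutation expects.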

Credits

eQTL-Catalogue/colocalisation was originally written by Nurlan Kerimov under the supervision of Kaur Alasoo.