Pipelines for federated genomic analysis

This section provides demonstrator implementations of pipelines for federated genomic analysis. They follow the general framework for federated analysis adopted by CINECA WP4.

Background

The general approach for each use case is to split the analysis pipeline into two parts:

Part A of the pipeline can, in principle, be run in the environment of each participating cohort. It reduces private, individual-level data to intermediate summary-level products.
The summary-level products from the different cohorts are collected at a central location.
Part B of the pipeline aggregates the summary-level products into the final scientific product, which is made available to the end user.
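The split described above can be sketched in plain Python. This is only an illustration of the idea, not the CINECA implementation (the actual pipelines are Nextflow workflows). The record layout, the function names `part_a_summarise` and `part_b_aggregate`, and the choice of allele frequency as the "final scientific product" are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical individual-level record: (variant_id, alternate-allele count 0/1/2).
Genotype = Tuple[str, int]
# Hypothetical summary-level product: variant_id -> (alt alleles, total alleles).
Summary = Dict[str, Tuple[int, int]]


def part_a_summarise(genotypes: Iterable[Genotype]) -> Summary:
    """Part A: runs inside a cohort and emits only per-variant allele counts."""
    alt: Counter = Counter()
    total: Counter = Counter()
    for variant, alt_count in genotypes:
        alt[variant] += alt_count
        total[variant] += 2  # assumption: diploid samples, two alleles each
    return {variant: (alt[variant], total[variant]) for variant in total}


def part_b_aggregate(summaries: List[Summary]) -> Dict[str, float]:
    """Part B: runs centrally and combines cohort summaries into frequencies."""
    alt: Counter = Counter()
    total: Counter = Counter()
    for summary in summaries:
        for variant, (alt_count, allele_count) in summary.items():
            alt[variant] += alt_count
            total[variant] += allele_count
    return {variant: alt[variant] / total[variant] for variant in total}


# Toy data for two cohorts; individual-level records never leave Part A.
cohort_1 = [("rs1", 0), ("rs1", 1), ("rs2", 2)]
cohort_2 = [("rs1", 2), ("rs2", 0)]

collected = [part_a_summarise(cohort_1), part_a_summarise(cohort_2)]
frequencies = part_b_aggregate(collected)
```

Only the dictionaries returned by `part_a_summarise` cross the cohort boundary, which is the property the two-part design is meant to guarantee.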

The following figure illustrates this split:

Both Part A and Part B of each pipeline must be implemented as Nextflow workflows.

Environments for running the pipelines

Thanks to Nextflow's versatility, pipelines designed in this way support a variety of scenarios and can run in most existing computing environments. See the instructions for running them on:

TESK
SLURM
LSF

See also the additional technical considerations regarding Nextflow, specifically how to build it from source if necessary.

Note on portability

True portability is not feasible without some form of containerisation layer. The pipelines developed by CINECA WP4 use Docker images to supply their dependencies. Each environment that runs these pipelines therefore needs to support either Docker or Singularity (to which Docker images can be transparently converted).
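As a minimal sketch of what this requirement means in practice, a launcher script could check which container runtime is available before starting a pipeline. The helper name `container_flag` is hypothetical and not part of the CINECA repository; `-with-docker` and `-with-singularity` are Nextflow command-line options, and using them without an image argument assumes the container image is set in the pipeline configuration.

```python
import shutil
from typing import Callable, Optional


def container_flag(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the Nextflow option matching the container runtime found on PATH.

    `which` is injectable so the lookup can be replaced in tests; by default
    it is shutil.which, which returns None when an executable is not found.
    """
    if which("docker"):
        return "-with-docker"
    if which("singularity"):
        return "-with-singularity"
    raise RuntimeError("Neither Docker nor Singularity was found on PATH")
```

The returned option would then be appended to a command such as `nextflow run main.nf`, where `main.nf` stands in for whichever workflow is being launched.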

Demonstrators

Example implementations of the proposed approach include:

4.3.1. Joint cohort genotyping
4.3.3. eQTL analysis
