This section provides the demonstrator technical implementations of pipelines for federated genomic analysis. It follows the general framework for federated analysis adopted by CINECA WP4.
The general approach for each use case is to split the analysis pipeline into two parts:
- Part A of the pipeline can in principle be run in the environments of the appropriate cohorts and reduces private, individual level data to intermediate summary level products.
- The results of the analysis from different cohorts are collected at a central location.
- Part B of the pipeline aggregates the summary level products into the final scientific product, which is made available to the end user.
This can be demonstrated with the following figure:
Part A and Part B of the pipelines have to be workflows written in Nextflow.
Thanks to Nextflow versatility, the pipelines designed in this way are able to support a variety of scenarios and can run in most existing computing environments. Please see the instructions for running them on:
See also additional technical considerations regarding Nextlow, specifically about how to build it from source if necessary.
In the modern world, true portability is not feasible without including some sort of containerisation layer. The pipelines developed by CINECA WP4 use Docker images for supplying their dependencies. Hence, each environment attempting to run these pipelines will need to support either Docker or Singularity (to which Docker images can be transparently converted).
Example implementations of the proposed approach include: