nf-core · jemten · Feb 14, 2024 · Feb 13, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -64,10 +64,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Updated modules from nf-core [#412](https://github.com/nf-core/raredisease/pull/412)
 - If present, remove duplicate entries in probands and upd_children in the meta. [#420](https://github.com/nf-core/raredisease/pull/420)
 - Fixes vep starting as many instances as the square of the number of scatters. [#405](https://github.com/nf-core/raredisease/pull/405)
-- Replaced the logic where we added an arbitrary substring to keep file names unique after alignment which we then removed using a split operator, with a simple copy operation. [#425](https://github.com/nf-core/raredisease/pull/425/files)
+- Replaced the logic where we added an arbitrary substring to keep file names unique after alignment which we then removed using a split operator, with a simple copy operation. [#425](https://github.com/nf-core/raredisease/pull/425)
 - Preventing a crash of rhocall annotate in the case of running four individuals whereof two are affected.
 - Fixed memory qualifier in gatk4 germlinecnvcaller and postprocessgermlinecnvcalls
 - Fixed wrong process names when outputting versions in `ALIGN_SENTIEON` and `CALL_SNV`.
+- Fixed gens subworkflow [#515](https://github.com/nf-core/raredisease/pull/515)
 
 ### `Updated`
 

diff --git a/CITATIONS.md b/CITATIONS.md
@@ -60,6 +60,8 @@
 
   > Magnusson M, Hughes T, Glabilloy, Bitdeli Chef. genmod: Version 3.7.3. Published online November 15, 2018. doi:10.5281/ZENODO.3841142
 
+- [Gens](https://github.com/Clinical-Genomics-Lund/gens)
+
 - [GLnexus](https://academic.oup.com/bioinformatics/article/36/24/5582/6064144)
 
   > Yun T, Li H, Chang PC, Lin MF, Carroll A, McLean CY. Accurate, scalable cohort variant calls using DeepVariant and GLnexus. Robinson P, ed. Bioinformatics. 2021;36(24):5582-5589. doi:10.1093/bioinformatics/btaa1081

diff --git a/conf/modules/gens.config b/conf/modules/gens.config
@@ -16,17 +16,27 @@
 //
 
 process {
-    if (params.gens_switch) {
+    if (!params.skip_gens && params.analysis_type != "wes") {
         withName: '.*GENS:.*' {
             publishDir = [
-                path: { "${params.outdir}/gens" },
-                mode: params.publish_dir_mode,
-                saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+                enabled: false
             ]
         }
 
         withName: '.*GENS:COLLECTREADCOUNTS' {
-            ext.args = '--interval-merging-rule OVERLAPPING_ONLY'
+            ext.args = { [
+                '--interval-merging-rule OVERLAPPING_ONLY',
+                '--format HDF5'
+                ].join(' ') }
+        }
+
+        withName: '.*GENS:GENS_GENERATE' {
+            ext.prefix = { "${meta.id}_gens" }
+            publishDir = [
+                path: { "${params.outdir}/gens" },
+                mode: params.publish_dir_mode,
+                saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+            ]
         }
     }
 }
diff --git a/docs/output.md b/docs/output.md
@@ -72,6 +72,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
       - [Calling mobile elements](#calling-mobile-elements)
       - [Annotating mobile elements](#annotating-mobile-elements)
     - [Variant evaluation](#variant-evaluation)
+    - [Gens](#gens)
     - [Pipeline information](#pipeline-information)
 
 ### Alignment
@@ -594,6 +595,22 @@ Provided a truth set, SNVs can be evaluated using RTG Tools' vcfeval engine. Out
 
 </details>
 
+### Gens
+
+The sequencing data can be prepared for visualization of CNVs in [Gens](https://github.com/Clinical-Genomics-Lund/gens). This subworkflow is turned off by default. You can activate it by supplying the option `--skip_gens false`. You can read more about how to set up Gens [here](https://github.com/Clinical-Genomics-Lund/gens).
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `gens/`
+
+  - `<sample_id>_gens.baf.bed.gz`: contains sample b-allele frequencies in bed format.
+  - `<sample_id>_gens.baf.bed.gz.tbi`: index of the \*baf.bed.gz file.
+  - `<sample_id>_gens.cov.bed.gz`: contains sample coverage in bed format.
+  - `<sample_id>_gens.cov.bed.gz.tbi`: index of the \*cov.bed.gz file.
+
+</details>
+
 ### Pipeline information
 
 [Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.

diff --git a/docs/usage.md b/docs/usage.md
@@ -23,6 +23,7 @@ Table of contents:
       - [9. Mitochondrial annotation](#9-mitochondrial-annotation)
       - [10. Mobile element annoation](#10-mobile-element-annotation)
       - [11. Variant evaluation](#11-variant-evaluation)
+      - [12. Prepare data for CNV visualisation in Gens](#12-prepare-data-for-cnv-visualisation-in-gens)
     - [Run the pipeline](#run-the-pipeline)
       - [Direct input in CLI](#direct-input-in-cli)
       - [Import from a config file (recommended)](#import-from-a-config-file-recommended)
@@ -298,6 +299,21 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl
 <sup>1</sup> This parameter is set to false by default, set it to true if if you'd like to run the evaluation subworkflow
 <sup>2</sup> A CSV file that describes the truth VCF files used by RTG Tools' vcfeval for evaluating SNVs. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/rtg_example.csv). The file contains four columns `samplename,vcf,bedregions,evaluationregions` where samplename is the user assigned samplename in the input samplesheet, vcf is the path to the truth vcf file, bedregions and evaluationregions are the path to the bed files that are supposed to be passed through --bed_regions and --evaluation_regions options of vcfeval.
 
+##### 12. Prepare data for CNV visualisation in Gens
+
+Optionally, the read data can be prepared for CNV visualisation in [Gens](https://github.com/Clinical-Genomics-Lund/gens). This subworkflow is turned off by default. You can activate it by supplying the option `--skip_gens false`.
+
+| Mandatory                      | Optional |
+| ------------------------------ | -------- |
+| gens_pon_female<sup>1</sup>    |          |
+| gens_pon_male<sup>1</sup>      |          |
+| gens_interval_list<sup>2</sup> |          |
+| gens_gnomad_pos<sup>3</sup>    |          |
+
+<sup>1</sup> Instructions on how to generate the panel of normals can be found [here](https://github.com/Clinical-Genomics-Lund/gens?tab=readme-ov-file#create-pon)<br>
+<sup>2</sup> Interval list for CollectReadCounts. Instructions on how to generate the interval list file can be found [here](https://github.com/Clinical-Genomics-Lund/gens?tab=readme-ov-file#create-pon)<br>
+<sup>3</sup> File containing SNVs to be used for the B-allele frequency calculations. The developers of Gens use SNVs in gnomAD with an allele frequency above 5%.
+
 #### Run the pipeline
 
 You can directly supply the parameters in the command line (CLI) or use a config file from which the pipeline can import the parameters.
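
For readers of the new section 12, here is a minimal sketch of how the Gens parameters might be set together in a Nextflow config `params` block. The parameter names come from this PR. The file name, every path and the file extensions are placeholders I made up, not files shipped with the pipeline. Note that `conf/modules/gens.config` in this PR only applies the Gens process settings when `analysis_type` is not `"wes"`.

```groovy
// gens_params.config (hypothetical file name; all paths are placeholders)
params {
    // Gens preprocessing is skipped by default (skip_gens = true), so switch it on
    skip_gens          = false

    // The Gens module config in this PR is only applied when analysis_type != "wes"
    analysis_type      = 'wgs'

    // Sex-specific panels of normals used by DenoiseReadCounts
    gens_pon_female    = '/path/to/female_pon.hdf5'
    gens_pon_male      = '/path/to/male_pon.hdf5'

    // Binning intervals used by CollectReadCounts
    gens_interval_list = '/path/to/gens.interval_list'

    // SNV positions used for the B-allele frequency calculation
    gens_gnomad_pos    = '/path/to/gnomad_positions.txt'
}
```

The same keys could equally be passed on the command line, for example `--skip_gens false --gens_pon_female <path>`.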

diff --git a/main.nf b/main.nf
@@ -24,7 +24,8 @@ params.call_interval                  = WorkflowMain.getGenomeAttribute(params,
 params.cadd_resources                 = WorkflowMain.getGenomeAttribute(params, 'cadd_resources')
 params.gcnvcaller_model               = WorkflowMain.getGenomeAttribute(params, 'gcnvcaller_model')
 params.gens_interval_list             = WorkflowMain.getGenomeAttribute(params, 'gens_interval_list')
-params.gens_pon                       = WorkflowMain.getGenomeAttribute(params, 'gens_pon')
+params.gens_pon_female                = WorkflowMain.getGenomeAttribute(params, 'gens_pon_female')
+params.gens_pon_male                  = WorkflowMain.getGenomeAttribute(params, 'gens_pon_male')
 params.gens_gnomad_pos                = WorkflowMain.getGenomeAttribute(params, 'gens_gnomad_pos')
 params.gnomad_af                      = WorkflowMain.getGenomeAttribute(params, 'gnomad_af')
 params.gnomad_af_idx                  = WorkflowMain.getGenomeAttribute(params, 'gnomad_af_idx')

diff --git a/modules/local/gens/main.nf b/modules/local/gens/main.nf
@@ -2,24 +2,30 @@ process GENS {
     tag "$meta.id"
     label 'process_medium'
 
-    container 'docker.io/raysloks/gens_preproc:1.0.1'
+    container 'docker.io/clinicalgenomics/gens_preproc:1.0.11'
 
     input:
     tuple val(meta), path(read_counts)
-    path  vcf
+    tuple val(meta2), path(gvcf)
     path  gnomad_positions
 
     output:
-    tuple val(meta), path('*.cov.bed.gz'), emit: cov
-    tuple val(meta), path('*.baf.bed.gz'), emit: baf
-    path  "versions.yml"                 , emit: versions
+    tuple val(meta), path('*.cov.bed.gz')    , emit: cov
+    tuple val(meta), path('*.cov.bed.gz.tbi'), emit: cov_index
+    tuple val(meta), path('*.baf.bed.gz')    , emit: baf
+    tuple val(meta), path('*.baf.bed.gz.tbi'), emit: baf_index
+    path  "versions.yml"                     , emit: versions
 
     script:
+    // Exit if running this module with -profile conda / -profile mamba
+    if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
+        error "The gens pre-processing module does not support Conda. Please use Docker / Singularity / Podman instead."
+    }
     def prefix = task.ext.prefix ?: "${meta.id}"
     """
     generate_gens_data.pl \\
         $read_counts \\
-        $vcf \\
+        $gvcf \\
         $prefix \\
         $gnomad_positions
 

diff --git a/nextflow.config b/nextflow.config
@@ -28,8 +28,11 @@ params {
     skip_eklipse               = false
     skip_fastp                 = false
     skip_fastqc                = false
+    skip_gens                  = true
     skip_germlinecnvcaller     = false
     skip_haplocheck            = false
+    skip_me_annotation         = false
+    skip_mt_annotation         = false
     skip_qualimap              = false
     skip_snv_annotation        = false
     skip_sv_annotation         = false
@@ -38,7 +41,6 @@ params {
     skip_mt_subsample          = false
     skip_vcf2cytosure          = true
     skip_vep_filter            = false
-    gens_switch                = false
     cadd_resources             = null
     platform                   = 'illumina'
 

diff --git a/nextflow_schema.json b/nextflow_schema.json
@@ -135,13 +135,22 @@
                     "help_text": "This file contains the binning intervals used for CollectReadCounts.",
                     "hidden": true
                 },
-                "gens_pon": {
+                "gens_pon_female": {
                     "type": "string",
                     "exists": true,
                     "format": "file-path",
                     "fa_icon": "fas fa-file",
-                    "description": "Path to panel of normals for Gens.",
-                    "help_text": "The panel used to run DenoiseReadCounts.",
+                    "description": "Path to female panel of normals for Gens.",
+                    "help_text": "The female panel used to run DenoiseReadCounts.",
+                    "hidden": true
+                },
+                "gens_pon_male": {
+                    "type": "string",
+                    "exists": true,
+                    "format": "file-path",
+                    "fa_icon": "fas fa-file",
+                    "description": "Path to male panel of normals for Gens.",
+                    "help_text": "The male panel used to run DenoiseReadCounts.",
                     "hidden": true
                 },
                 "gnomad_af": {
@@ -445,11 +454,6 @@
                     "fa_icon": "fas fa-align-center",
                     "enum": ["wgs", "wes", "mito"]
                 },
-                "gens_switch": {
-                    "type": "boolean",
-                    "description": "Specifies whether or not to run gens preprocessing subworkflow.",
-                    "fa_icon": "fas fa-toggle-on"
-                },
                 "platform": {
                     "type": "string",
                     "default": "illumina",
@@ -489,6 +493,11 @@
                     "description": "Specifies whether or not to skip haplocheck.",
                     "fa_icon": "fas fa-toggle-on"
                 },
+                "skip_gens": {
+                    "type": "boolean",
+                    "description": "Specifies whether or not to skip the Gens preprocessing subworkflow.",
+                    "fa_icon": "fas fa-toggle-on"
+                },
                 "skip_germlinecnvcaller": {
                     "type": "boolean",
                     "description": "Specifies whether or not to skip CNV calling using GATK's GermlineCNVCaller",

diff --git a/subworkflows/local/call_snv.nf b/subworkflows/local/call_snv.nf
@@ -33,11 +33,15 @@ workflow CALL_SNV {
         ch_pcr_indel_model    // channel: [optional] [ val(sentieon_dnascope_pcr_indel_model) ]
 
     main:
-        ch_versions     = Channel.empty()
-        ch_deepvar_vcf  = Channel.empty()
-        ch_deepvar_tbi  = Channel.empty()
-        ch_sentieon_vcf = Channel.empty()
-        ch_sentieon_tbi = Channel.empty()
+        ch_versions      = Channel.empty()
+        ch_deepvar_vcf   = Channel.empty()
+        ch_deepvar_tbi   = Channel.empty()
+        ch_deepvar_gvcf  = Channel.empty()
+        ch_deepvar_gtbi  = Channel.empty()
+        ch_sentieon_vcf  = Channel.empty()
+        ch_sentieon_tbi  = Channel.empty()
+        ch_sentieon_gvcf = Channel.empty()
+        ch_sentieon_gtbi = Channel.empty()
 
         if (params.variant_caller.equals("deepvariant")) {
             CALL_SNV_DEEPVARIANT (      // triggered only when params.variant_caller is set as deepvariant
@@ -50,6 +54,8 @@ workflow CALL_SNV {
             )
             ch_deepvar_vcf = CALL_SNV_DEEPVARIANT.out.vcf
             ch_deepvar_tbi = CALL_SNV_DEEPVARIANT.out.tabix
+            ch_deepvar_gvcf = CALL_SNV_DEEPVARIANT.out.gvcf
+            ch_deepvar_gtbi = CALL_SNV_DEEPVARIANT.out.gvcf_tabix
             ch_versions    = ch_versions.mix(CALL_SNV_DEEPVARIANT.out.versions)
         } else if (params.variant_caller.equals("sentieon")) {
             CALL_SNV_SENTIEON(         // triggered only when params.variant_caller is set as sentieon
@@ -67,11 +73,15 @@ workflow CALL_SNV {
             )
             ch_sentieon_vcf = CALL_SNV_SENTIEON.out.vcf
             ch_sentieon_tbi = CALL_SNV_SENTIEON.out.tabix
+            ch_sentieon_gvcf = CALL_SNV_SENTIEON.out.gvcf
+            ch_sentieon_gtbi = CALL_SNV_SENTIEON.out.gtbi
             ch_versions    = ch_versions.mix(CALL_SNV_SENTIEON.out.versions)
         }
 
-        ch_vcf       = Channel.empty().mix(ch_deepvar_vcf, ch_sentieon_vcf)
-        ch_tabix     = Channel.empty().mix(ch_deepvar_tbi, ch_sentieon_tbi)
+        ch_vcf    = Channel.empty().mix(ch_deepvar_vcf, ch_sentieon_vcf)
+        ch_tabix  = Channel.empty().mix(ch_deepvar_tbi, ch_sentieon_tbi)
+        ch_gvcf   = Channel.empty().mix(ch_deepvar_gvcf, ch_sentieon_gvcf)
+        ch_gtabix = Channel.empty().mix(ch_deepvar_gtbi, ch_sentieon_gtbi)
 
         ch_vcf
             .join(ch_tabix, failOnMismatch:true, failOnDuplicate:true)
@@ -120,6 +130,8 @@ workflow CALL_SNV {
         genome_vcf       = ch_genome_vcf                // channel: [ val(meta), path(vcf) ]
         genome_tabix     = ch_genome_tabix              // channel: [ val(meta), path(tbi) ]
         genome_vcf_tabix = ch_genome_vcf_tabix          // channel: [ val(meta), path(vcf), path(tbi) ]
+        genome_gvcf      = ch_gvcf                      // channel: [ val(meta), path(gvcf) ]
+        genome_gtabix    = ch_gtabix                    // channel: [ val(meta), path(gtbi) ]
         mt_vcf           = POSTPROCESS_MT_CALLS.out.vcf // channel: [ val(meta), path(vcf) ]
         mt_tabix         = POSTPROCESS_MT_CALLS.out.tbi // channel: [ val(meta), path(tbi) ]
         versions         = ch_versions                  // channel: [ path(versions.yml) ]

diff --git a/subworkflows/local/gens.nf b/subworkflows/local/gens.nf
@@ -2,33 +2,67 @@
 // A preprocessing workflow for Gens
 //
 
-include { GATK4_COLLECTREADCOUNTS as COLLECTREADCOUNTS } from '../../modules/nf-core/gatk4/collectreadcounts/main'
-include { GATK4_DENOISEREADCOUNTS as DENOISEREADCOUNTS } from '../../modules/nf-core/gatk4/denoisereadcounts/main'
-include { GENS as GENS_GENERATE                        } from '../../modules/local/gens/main'
+include { GATK4_COLLECTREADCOUNTS as COLLECTREADCOUNTS        } from '../../modules/nf-core/gatk4/collectreadcounts/main'
+include { GATK4_DENOISEREADCOUNTS as DENOISEREADCOUNTS_FEMALE } from '../../modules/nf-core/gatk4/denoisereadcounts/main'
+include { GATK4_DENOISEREADCOUNTS as DENOISEREADCOUNTS_MALE   } from '../../modules/nf-core/gatk4/denoisereadcounts/main'
+include { GENS as GENS_GENERATE                               } from '../../modules/local/gens/main'
 
 workflow GENS {
     take:
-        ch_bam_bai            // channel: [mandatory] [ val(meta), path(bam), path(bai) ]
-        ch_vcf                // channel: [mandatory] [ val(meta), path(vcf) ]
-        ch_genome_fasta       // channel: [mandatory] [ val(meta), path(fasta) ]
-        ch_genome_fai         // channel: [mandatory] [ val(meta), path(fai) ]
-        ch_interval_list      // channel: [mandatory] [ path(interval_list) ]
-        ch_pon                // channel: [mandatory] [ path(pon) ]
-        ch_gnomad_pos         // channel: [mandatory] [ path(gnomad_pos) ]
-        ch_case_info          // channel: [mandatory] [ val(case_info) ]
-        ch_genome_dictionary  // channel: [mandatory] [ val(meta), path(dict) ]
+        ch_bam_bai           // channel: [mandatory] [ val(meta), path(bam), path(bai) ]
+        ch_gvcf              // channel: [mandatory] [ val(meta), path(gvcf) ]
+        ch_genome_fasta      // channel: [mandatory] [ val(meta), path(fasta) ]
+        ch_genome_fai        // channel: [mandatory] [ val(meta), path(fai) ]
+        ch_interval_list     // channel: [mandatory] [ path(interval_list) ]
+        ch_pon_female        // channel: [mandatory] [ path(pon) ]
+        ch_pon_male          // channel: [mandatory] [ path(pon) ]
+        ch_gnomad_pos        // channel: [mandatory] [ path(gnomad_pos) ]
+        ch_case_info         // channel: [mandatory] [ val(case_info) ]
+        ch_genome_dictionary // channel: [mandatory] [ val(meta), path(dict) ]
 
     main:
         ch_versions = Channel.empty()
 
-        COLLECTREADCOUNTS (ch_bam_bai, ch_genome_fasta, ch_genome_fai, ch_sequence_dictionary, ch_interval_list)
+        ch_bam_bai
+            .combine(ch_interval_list)
+            .set { ch_bam_bai_intervals }
 
-        DENOISEREADCOUNTS (COLLECTREADCOUNTS.out.read_counts, ch_pon)
+        COLLECTREADCOUNTS (
+            ch_bam_bai_intervals,
+            ch_genome_fasta,
+            ch_genome_fai,
+            ch_genome_dictionary
+        )
 
-        GENS_GENERATE (DENOISEREADCOUNTS.out.standardized_read_counts, ch_vcf.map { meta, vcf -> vcf }, ch_gnomad_pos)
+        COLLECTREADCOUNTS.out.hdf5
+            .branch { meta, counts ->
+                female: meta.sex.equals(2) || meta.sex.equals(0)
+                male: meta.sex.equals(1)
+            }
+            .set { ch_denoisereadcounts_in }
+
+        DENOISEREADCOUNTS_FEMALE (
+            ch_denoisereadcounts_in.female,
+            ch_pon_female
+        )
+
+        DENOISEREADCOUNTS_MALE (
+            ch_denoisereadcounts_in.male,
+            ch_pon_male
+        )
+        DENOISEREADCOUNTS_FEMALE.out.standardized
+            .mix(DENOISEREADCOUNTS_MALE.out.standardized)
+            .set { ch_denoisereadcounts_out }
+
+        GENS_GENERATE (
+            ch_denoisereadcounts_out,
+            ch_gvcf,
+            ch_gnomad_pos
+        )
 
         ch_versions = ch_versions.mix(COLLECTREADCOUNTS.out.versions.first())
-        ch_versions = ch_versions.mix(DENOISEREADCOUNTS.out.versions.first())
+        ch_versions = ch_versions.mix(DENOISEREADCOUNTS_FEMALE.out.versions.first())
+        ch_versions = ch_versions.mix(DENOISEREADCOUNTS_MALE.out.versions.first())
         ch_versions = ch_versions.mix(GENS_GENERATE.out.versions.first())
 
     emit: