updating gens subworkflow #515

Merged
merged 9 commits into from
Feb 14, 2024
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -64,10 +64,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Updated modules from nf-core [#412](https://github.com/nf-core/raredisease/pull/412)
- If present, remove duplicate entries in probands and upd_children in the meta. [#420](https://github.com/nf-core/raredisease/pull/420)
- Fixes vep starting as many instances as the square of the number of scatters. [#405](https://github.com/nf-core/raredisease/pull/405)
- Replaced the logic where we added an arbitrary substring to keep file names unique after alignment which we then removed using a split operator, with a simple copy operation. [#425](https://github.com/nf-core/raredisease/pull/425/files)
- Replaced the logic where we added an arbitrary substring to keep file names unique after alignment which we then removed using a split operator, with a simple copy operation. [#425](https://github.com/nf-core/raredisease/pull/425)
- Preventing a crash of rhocall annotate in the case of running four individuals whereof two are affected.
- Fixed memory qualifier in gatk4 germlinecnvcaller and postprocessgermlinecnvcalls
- Fixed wrong process names when outputting versions in `ALIGN_SENTIEON` and `CALL_SNV`.
- Fixed gens subworkflow [#515](https://github.com/nf-core/raredisease/pull/515)

### `Updated`

2 changes: 2 additions & 0 deletions CITATIONS.md
@@ -60,6 +60,8 @@

> Magnusson M, Hughes T, Glabilloy, Bitdeli Chef. genmod: Version 3.7.3. Published online November 15, 2018. doi:10.5281/ZENODO.3841142

- [Gens](https://github.com/Clinical-Genomics-Lund/gens)

- [GLnexus](https://academic.oup.com/bioinformatics/article/36/24/5582/6064144)

> Yun T, Li H, Chang PC, Lin MF, Carroll A, McLean CY. Accurate, scalable cohort variant calls using DeepVariant and GLnexus. Robinson P, ed. Bioinformatics. 2021;36(24):5582-5589. doi:10.1093/bioinformatics/btaa1081
20 changes: 15 additions & 5 deletions conf/modules/gens.config
@@ -16,17 +16,27 @@
//

process {
if (params.gens_switch) {
if (!params.skip_gens && params.analysis_type != "wes") {
withName: '.*GENS:.*' {
publishDir = [
path: { "${params.outdir}/gens" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
enabled: false
]
}

withName: '.*GENS:COLLECTREADCOUNTS' {
ext.args = '--interval-merging-rule OVERLAPPING_ONLY'
ext.args = { [
'--interval-merging-rule OVERLAPPING_ONLY',
'--format HDF5'
].join(' ') }
}

withName: '.*GENS:GENS_GENERATE' {
ext.prefix = { "${meta.id}_gens" }
publishDir = [
path: { "${params.outdir}/gens" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}
}
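
The list-and-join form of `ext.args` for `COLLECTREADCOUNTS` evaluates to one space-separated argument string. A minimal plain-Groovy sketch of that evaluation, outside Nextflow (the variable name `collectReadCountsArgs` is only illustrative):

```groovy
// Same two options as in the COLLECTREADCOUNTS block of conf/modules/gens.config.
def collectReadCountsArgs = [
    '--interval-merging-rule OVERLAPPING_ONLY',
    '--format HDF5'
].join(' ')

// The joined string is what the module would receive as task.ext.args.
assert collectReadCountsArgs == '--interval-merging-rule OVERLAPPING_ONLY --format HDF5'
println collectReadCountsArgs
```

Requesting HDF5 output here lines up with the subworkflow consuming `COLLECTREADCOUNTS.out.hdf5`.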
17 changes: 17 additions & 0 deletions docs/output.md
@@ -72,6 +72,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Calling mobile elements](#calling-mobile-elements)
- [Annotating mobile elements](#annotating-mobile-elements)
- [Variant evaluation](#variant-evaluation)
- [Gens](#gens)
- [Pipeline information](#pipeline-information)

### Alignment
@@ -594,6 +595,22 @@ Provided a truth set, SNVs can be evaluated using RTG Tools' vcfeval engine. Out

</details>

### Gens

The sequencing data can be prepared for visualization of CNVs in [Gens](https://github.com/Clinical-Genomics-Lund/gens). This subworkflow is turned off by default. You can activate it by supplying the option `--skip_gens false`. You can read more about how to set up Gens [here](https://github.com/Clinical-Genomics-Lund/gens).

<details markdown="1">
<summary>Output files</summary>

- `gens/`

- `<sample_id>_gens.baf.bed.gz`: contains sample b-allele frequencies in bed format.
- `<sample_id>_gens.baf.bed.gz.tbi`: index of the \*baf.bed.gz file.
- `<sample_id>_gens.cov.bed.gz`: contains sample coverage in bed format.
- `<sample_id>_gens.cov.bed.gz.tbi`: index of the \*cov.bed.gz file.

</details>

### Pipeline information

[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
16 changes: 16 additions & 0 deletions docs/usage.md
@@ -23,6 +23,7 @@ Table of contents:
- [9. Mitochondrial annotation](#9-mitochondrial-annotation)
- [10. Mobile element annotation](#10-mobile-element-annotation)
- [11. Variant evaluation](#11-variant-evaluation)
- [12. Prepare data for CNV visualisation in Gens](#12-prepare-data-for-cnv-visualisation-in-gens)
- [Run the pipeline](#run-the-pipeline)
- [Direct input in CLI](#direct-input-in-cli)
- [Import from a config file (recommended)](#import-from-a-config-file-recommended)
@@ -298,6 +299,21 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl
<sup>1</sup> This parameter is set to false by default, set it to true if you'd like to run the evaluation subworkflow
<sup>2</sup> A CSV file that describes the truth VCF files used by RTG Tools' vcfeval for evaluating SNVs. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/rtg_example.csv). The file contains four columns `samplename,vcf,bedregions,evaluationregions` where samplename is the user assigned samplename in the input samplesheet, vcf is the path to the truth vcf file, bedregions and evaluationregions are the path to the bed files that are supposed to be passed through --bed_regions and --evaluation_regions options of vcfeval.

##### 12. Prepare data for CNV visualisation in Gens

Optionally the read data can be prepared for CNV visualization in [Gens](https://github.com/Clinical-Genomics-Lund/gens). This subworkflow is turned off by default. You can activate it by supplying the option `--skip_gens false`.

| Mandatory | Optional |
| ------------------------------ | -------- |
| gens_pon_female<sup>1</sup> | |
| gens_pon_male<sup>1</sup> | |
| gens_interval_list<sup>2</sup> | |
| gens_gnomad_pos<sup>3</sup> | |

<sup>1</sup> Instructions on how to generate the panel of normals can be found [here](https://github.com/Clinical-Genomics-Lund/gens?tab=readme-ov-file#create-pon)<br>
<sup>2</sup> Interval list for CollectReadCounts. Instructions on how to generate the interval list file can be found [here](https://github.com/Clinical-Genomics-Lund/gens?tab=readme-ov-file#create-pon)<br>
<sup>3</sup> File containing SNVs to be used for the B-allele frequency calculations. The developers of Gens use SNVs in gnomAD with an allele frequency above 5%.
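
As a rough illustration, the parameters in the table above could be supplied together in a custom Nextflow config passed with `-c`. This is only a sketch: the parameter names come from this pull request, but every path below is a hypothetical placeholder, and the Gens process settings in `conf/modules/gens.config` apply only when `analysis_type` is not `wes`.

```groovy
// Sketch of a user config enabling the Gens subworkflow.
// All file paths are hypothetical placeholders; replace them with your own files.
params {
    skip_gens          = false                            // subworkflow is skipped by default
    gens_pon_female    = '/path/to/female_pon.hdf5'       // panel of normals for female samples
    gens_pon_male      = '/path/to/male_pon.hdf5'         // panel of normals for male samples
    gens_interval_list = '/path/to/gens.interval_list'    // bins for CollectReadCounts
    gens_gnomad_pos    = '/path/to/gnomad_positions.txt'  // SNV positions for B-allele frequencies
}
```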

#### Run the pipeline

You can directly supply the parameters in the command line (CLI) or use a config file from which the pipeline can import the parameters.
3 changes: 2 additions & 1 deletion main.nf
@@ -24,7 +24,8 @@ params.call_interval = WorkflowMain.getGenomeAttribute(params,
params.cadd_resources = WorkflowMain.getGenomeAttribute(params, 'cadd_resources')
params.gcnvcaller_model = WorkflowMain.getGenomeAttribute(params, 'gcnvcaller_model')
params.gens_interval_list = WorkflowMain.getGenomeAttribute(params, 'gens_interval_list')
params.gens_pon = WorkflowMain.getGenomeAttribute(params, 'gens_pon')
params.gens_pon_female = WorkflowMain.getGenomeAttribute(params, 'gens_pon_female')
params.gens_pon_male = WorkflowMain.getGenomeAttribute(params, 'gens_pon_male')
params.gens_gnomad_pos = WorkflowMain.getGenomeAttribute(params, 'gens_gnomad_pos')
params.gnomad_af = WorkflowMain.getGenomeAttribute(params, 'gnomad_af')
params.gnomad_af_idx = WorkflowMain.getGenomeAttribute(params, 'gnomad_af_idx')
18 changes: 12 additions & 6 deletions modules/local/gens/main.nf
@@ -2,24 +2,30 @@ process GENS {
tag "$meta.id"
label 'process_medium'

container 'docker.io/raysloks/gens_preproc:1.0.1'
container 'docker.io/clinicalgenomics/gens_preproc:1.0.11'

input:
tuple val(meta), path(read_counts)
path vcf
tuple val(meta2), path(gvcf)
path gnomad_positions

output:
tuple val(meta), path('*.cov.bed.gz'), emit: cov
tuple val(meta), path('*.baf.bed.gz'), emit: baf
path "versions.yml" , emit: versions
tuple val(meta), path('*.cov.bed.gz') , emit: cov
tuple val(meta), path('*.cov.bed.gz.tbi'), emit: cov_index
tuple val(meta), path('*.baf.bed.gz') , emit: baf
tuple val(meta), path('*.baf.bed.gz.tbi'), emit: baf_index
path "versions.yml" , emit: versions

script:
// Exit if running this module with -profile conda / -profile mamba
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
error "The gens pre-processing module does not support Conda. Please use Docker / Singularity / Podman instead."
}
def prefix = task.ext.prefix ?: "${meta.id}"
"""
generate_gens_data.pl \\
$read_counts \\
$vcf \\
$gvcf \\
$prefix \\
$gnomad_positions

4 changes: 3 additions & 1 deletion nextflow.config
@@ -28,8 +28,11 @@ params {
skip_eklipse = false
skip_fastp = false
skip_fastqc = false
skip_gens = true
skip_germlinecnvcaller = false
skip_haplocheck = false
skip_me_annotation = false
skip_mt_annotation = false
skip_qualimap = false
skip_snv_annotation = false
skip_sv_annotation = false
@@ -38,7 +41,6 @@ params {
skip_mt_subsample = false
skip_vcf2cytosure = true
skip_vep_filter = false
gens_switch = false
cadd_resources = null
platform = 'illumina'

25 changes: 17 additions & 8 deletions nextflow_schema.json
@@ -135,13 +135,22 @@
"help_text": "This file contains the binning intervals used for CollectReadCounts.",
"hidden": true
},
"gens_pon": {
"gens_pon_female": {
"type": "string",
"exists": true,
"format": "file-path",
"fa_icon": "fas fa-file",
"description": "Path to panel of normals for Gens.",
"help_text": "The panel used to run DenoiseReadCounts.",
"description": "Path to female panel of normals for Gens.",
"help_text": "The female panel used to run DenoiseReadCounts.",
"hidden": true
},
"gens_pon_male": {
"type": "string",
"exists": true,
"format": "file-path",
"fa_icon": "fas fa-file",
"description": "Path to male panel of normals for Gens.",
"help_text": "The male panel used to run DenoiseReadCounts.",
"hidden": true
},
"gnomad_af": {
@@ -445,11 +454,6 @@
"fa_icon": "fas fa-align-center",
"enum": ["wgs", "wes", "mito"]
},
"gens_switch": {
"type": "boolean",
"description": "Specifies whether or not to run gens preprocessing subworkflow.",
"fa_icon": "fas fa-toggle-on"
},
"platform": {
"type": "string",
"default": "illumina",
@@ -489,6 +493,11 @@
"description": "Specifies whether or not to skip haplocheck.",
"fa_icon": "fas fa-toggle-on"
},
"skip_gens": {
"type": "boolean",
"description": "Specifies whether or not to skip gens preprocessing subworkflow.",
"fa_icon": "fas fa-toggle-on"
},
"skip_germlinecnvcaller": {
"type": "boolean",
"description": "Specifies whether or not to skip CNV calling using GATK's GermlineCNVCaller",
26 changes: 19 additions & 7 deletions subworkflows/local/call_snv.nf
@@ -33,11 +33,15 @@ workflow CALL_SNV {
ch_pcr_indel_model // channel: [optional] [ val(sentieon_dnascope_pcr_indel_model) ]

main:
ch_versions = Channel.empty()
ch_deepvar_vcf = Channel.empty()
ch_deepvar_tbi = Channel.empty()
ch_sentieon_vcf = Channel.empty()
ch_sentieon_tbi = Channel.empty()
ch_versions = Channel.empty()
ch_deepvar_vcf = Channel.empty()
ch_deepvar_tbi = Channel.empty()
ch_deepvar_gvcf = Channel.empty()
ch_deepvar_gtbi = Channel.empty()
ch_sentieon_vcf = Channel.empty()
ch_sentieon_tbi = Channel.empty()
ch_sentieon_gvcf = Channel.empty()
ch_sentieon_gtbi = Channel.empty()

if (params.variant_caller.equals("deepvariant")) {
CALL_SNV_DEEPVARIANT ( // triggered only when params.variant_caller is set as deepvariant
@@ -50,6 +54,8 @@ workflow CALL_SNV {
)
ch_deepvar_vcf = CALL_SNV_DEEPVARIANT.out.vcf
ch_deepvar_tbi = CALL_SNV_DEEPVARIANT.out.tabix
ch_deepvar_gvcf = CALL_SNV_DEEPVARIANT.out.gvcf
ch_deepvar_gtbi = CALL_SNV_DEEPVARIANT.out.gvcf_tabix
ch_versions = ch_versions.mix(CALL_SNV_DEEPVARIANT.out.versions)
} else if (params.variant_caller.equals("sentieon")) {
CALL_SNV_SENTIEON( // triggered only when params.variant_caller is set as sentieon
@@ -67,11 +73,15 @@ workflow CALL_SNV {
)
ch_sentieon_vcf = CALL_SNV_SENTIEON.out.vcf
ch_sentieon_tbi = CALL_SNV_SENTIEON.out.tabix
ch_sentieon_gvcf = CALL_SNV_SENTIEON.out.gvcf
ch_sentieon_gtbi = CALL_SNV_SENTIEON.out.gtbi
ch_versions = ch_versions.mix(CALL_SNV_SENTIEON.out.versions)
}

ch_vcf = Channel.empty().mix(ch_deepvar_vcf, ch_sentieon_vcf)
ch_tabix = Channel.empty().mix(ch_deepvar_tbi, ch_sentieon_tbi)
ch_vcf = Channel.empty().mix(ch_deepvar_vcf, ch_sentieon_vcf)
ch_tabix = Channel.empty().mix(ch_deepvar_tbi, ch_sentieon_tbi)
ch_gvcf = Channel.empty().mix(ch_deepvar_gvcf, ch_sentieon_gvcf)
ch_gtabix = Channel.empty().mix(ch_deepvar_gtbi, ch_sentieon_gtbi)

ch_vcf
.join(ch_tabix, failOnMismatch:true, failOnDuplicate:true)
@@ -120,6 +130,8 @@ workflow CALL_SNV {
genome_vcf = ch_genome_vcf // channel: [ val(meta), path(vcf) ]
genome_tabix = ch_genome_tabix // channel: [ val(meta), path(tbi) ]
genome_vcf_tabix = ch_genome_vcf_tabix // channel: [ val(meta), path(vcf), path(tbi) ]
genome_gvcf = ch_gvcf // channel: [ val(meta), path(gvcf) ]
genome_gtabix = ch_gtabix // channel: [ val(meta), path(gtbi) ]
mt_vcf = POSTPROCESS_MT_CALLS.out.vcf // channel: [ val(meta), path(vcf) ]
mt_tabix = POSTPROCESS_MT_CALLS.out.tbi // channel: [ val(meta), path(tbi) ]
versions = ch_versions // channel: [ path(versions.yml) ]
66 changes: 50 additions & 16 deletions subworkflows/local/gens.nf
@@ -2,33 +2,67 @@
// A preprocessing workflow for Gens
//

include { GATK4_COLLECTREADCOUNTS as COLLECTREADCOUNTS } from '../../modules/nf-core/gatk4/collectreadcounts/main'
include { GATK4_DENOISEREADCOUNTS as DENOISEREADCOUNTS } from '../../modules/nf-core/gatk4/denoisereadcounts/main'
include { GENS as GENS_GENERATE } from '../../modules/local/gens/main'
include { GATK4_COLLECTREADCOUNTS as COLLECTREADCOUNTS } from '../../modules/nf-core/gatk4/collectreadcounts/main'
include { GATK4_DENOISEREADCOUNTS as DENOISEREADCOUNTS_FEMALE } from '../../modules/nf-core/gatk4/denoisereadcounts/main'
include { GATK4_DENOISEREADCOUNTS as DENOISEREADCOUNTS_MALE } from '../../modules/nf-core/gatk4/denoisereadcounts/main'
include { GENS as GENS_GENERATE } from '../../modules/local/gens/main'

workflow GENS {
take:
ch_bam_bai // channel: [mandatory] [ val(meta), path(bam), path(bai) ]
ch_vcf // channel: [mandatory] [ val(meta), path(vcf) ]
ch_genome_fasta // channel: [mandatory] [ val(meta), path(fasta) ]
ch_genome_fai // channel: [mandatory] [ val(meta), path(fai) ]
ch_interval_list // channel: [mandatory] [ path(interval_list) ]
ch_pon // channel: [mandatory] [ path(pon) ]
ch_gnomad_pos // channel: [mandatory] [ path(gnomad_pos) ]
ch_case_info // channel: [mandatory] [ val(case_info) ]
ch_genome_dictionary // channel: [mandatory] [ val(meta), path(dict) ]
ch_bam_bai // channel: [mandatory] [ val(meta), path(bam), path(bai) ]
ch_gvcf // channel: [mandatory] [ val(meta), path(gvcf) ]
ch_genome_fasta // channel: [mandatory] [ val(meta), path(fasta) ]
ch_genome_fai // channel: [mandatory] [ val(meta), path(fai) ]
ch_interval_list // channel: [mandatory] [ path(interval_list) ]
ch_pon_female // channel: [mandatory] [ path(pon) ]
ch_pon_male // channel: [mandatory] [ path(pon) ]
ch_gnomad_pos // channel: [mandatory] [ path(gnomad_pos) ]
ch_case_info // channel: [mandatory] [ val(case_info) ]
ch_genome_dictionary // channel: [mandatory] [ val(meta), path(dict) ]

main:
ch_versions = Channel.empty()

COLLECTREADCOUNTS (ch_bam_bai, ch_genome_fasta, ch_genome_fai, ch_sequence_dictionary, ch_interval_list)
ch_bam_bai
.combine(ch_interval_list)
.set { ch_bam_bai_intervals }

DENOISEREADCOUNTS (COLLECTREADCOUNTS.out.read_counts, ch_pon)
COLLECTREADCOUNTS (
ch_bam_bai_intervals,
ch_genome_fasta,
ch_genome_fai,
ch_genome_dictionary
)

GENS_GENERATE (DENOISEREADCOUNTS.out.standardized_read_counts, ch_vcf.map { meta, vcf -> vcf }, ch_gnomad_pos)
COLLECTREADCOUNTS.out.hdf5
.branch { meta, counts ->
female: meta.sex.equals(2) || meta.sex.equals(0)
male: meta.sex.equals(1)
}
.set { ch_denoisereadcounts_in }

DENOISEREADCOUNTS_FEMALE (
ch_denoisereadcounts_in.female,
ch_pon_female
)

DENOISEREADCOUNTS_MALE (
ch_denoisereadcounts_in.male,
ch_pon_male
)
DENOISEREADCOUNTS_FEMALE.out.standardized
.mix(DENOISEREADCOUNTS_MALE.out.standardized)
.set { ch_denoisereadcounts_out }

GENS_GENERATE (
ch_denoisereadcounts_out,
ch_gvcf,
ch_gnomad_pos
)

ch_versions = ch_versions.mix(COLLECTREADCOUNTS.out.versions.first())
ch_versions = ch_versions.mix(DENOISEREADCOUNTS.out.versions.first())
ch_versions = ch_versions.mix(DENOISEREADCOUNTS_FEMALE.out.versions.first())
ch_versions = ch_versions.mix(DENOISEREADCOUNTS_MALE.out.versions.first())
ch_versions = ch_versions.mix(GENS_GENERATE.out.versions.first())

emit:
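
The sex-based routing done by the `branch` operator in this subworkflow can be sketched in plain Groovy. This is a simplified stand-in, not pipeline code: `ponBranch` and the sample maps are hypothetical, and integer sex codes are assumed (1 for male, 2 for female, 0 for unknown), which is what the `equals` checks in the diff suggest.

```groovy
// Hypothetical helper (not part of the pipeline) mirroring the branch criteria:
// sex 2 or 0 goes to the female panel of normals, sex 1 to the male panel.
def ponBranch = { Map meta ->
    if (meta.sex.equals(2) || meta.sex.equals(0)) {
        return 'female'
    }
    if (meta.sex.equals(1)) {
        return 'male'
    }
    return null // matches neither condition
}

// Hypothetical sample metadata with integer sex codes.
def samples = [
    [id: 'mother',  sex: 2],
    [id: 'father',  sex: 1],
    [id: 'proband', sex: 0],
]

// Map each sample id to the branch it would be routed to.
def routed = samples.collectEntries { [(it.id): ponBranch(it)] }
assert routed == [mother: 'female', father: 'male', proband: 'female']
```

Samples of unknown sex (0) are therefore denoised against the female panel. In the actual `branch` operator, an item that matches neither condition is not emitted on either output.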