Add parameter to supply variant consequence files #510

Merged 4 commits on Feb 5, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- New workflow for annotating mobile elements [#483](https://github.com/nf-core/raredisease/pull/483)
- Added a functionality to subsample mitochondrial alignment, and a new parameter `skip_mt_subsample` to skip the subworkflow [#508](https://github.com/nf-core/raredisease/pull/508).
- Chromograph to plot coverage across chromosomes [#507](https://github.com/nf-core/raredisease/pull/507)
+ - Added two new parameters `variant_consequences_snv` and `variant_consequences_sv` to supply variant consequence files for annotating SNVs and SVs. [#509](https://github.com/nf-core/raredisease/pull/509)

### `Changed`

41 changes: 0 additions & 41 deletions assets/variant_consequences_v2.txt

This file was deleted.

4 changes: 3 additions & 1 deletion conf/test.config
@@ -43,7 +43,7 @@ params {
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
- mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
+ mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
mobile_element_svdb_annotations = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/svdb_querydb_files.csv"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_mt = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
@@ -55,6 +55,8 @@ params {
vcfanno_lua = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_functions.lua"
vcfanno_resources = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_resources.txt"
vcfanno_toml = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_config.toml"
+ variant_consequences_snv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/variant_consequences_v2.txt"
+ variant_consequences_sv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/variant_consequences_v2.txt"
vep_cache = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vep_cache_and_plugins.tar.gz"
vep_filters = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/hgnc.txt"
vep_cache_version = 107
4 changes: 3 additions & 1 deletion conf/test_one_sample.config
@@ -43,7 +43,7 @@ params {
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
- mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
+ mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
mobile_element_svdb_annotations = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/svdb_querydb_files.csv"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_mt = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
@@ -55,6 +55,8 @@ params {
vcfanno_lua = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_functions.lua"
vcfanno_resources = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_resources.txt"
vcfanno_toml = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_config.toml"
+ variant_consequences_snv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/variant_consequences_v2.txt"
+ variant_consequences_sv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/variant_consequences_v2.txt"
vep_cache = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vep_cache_and_plugins.tar.gz"
vep_filters = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/hgnc.txt"
vep_cache_version = 107
50 changes: 28 additions & 22 deletions docs/usage.md
@@ -221,15 +221,16 @@ The mandatory and optional parameters for each category are tabulated below.

##### 7. SNV annotation & Ranking

- | Mandatory | Optional |
- | ----------------------------- | ------------------------------ |
- | genome<sup>1</sup> | reduced_penetrance<sup>7</sup> |
- | vcfanno_resources<sup>2</sup> | vcfanno_lua |
- | vcfanno_toml<sup>3</sup> | vep_filters<sup>8</sup> |
- | vep_cache_version | cadd_resources<sup>9</sup> |
- | vep_cache<sup>4</sup> | vep_plugin_files<sup>10</sup> |
- | gnomad_af<sup>5</sup> | |
- | score_config_snv<sup>6</sup> | |
+ | Mandatory | Optional |
+ | ------------------------------------ | ------------------------------ |
+ | genome<sup>1</sup> | reduced_penetrance<sup>8</sup> |
+ | vcfanno_resources<sup>2</sup> | vcfanno_lua |
+ | vcfanno_toml<sup>3</sup> | vep_filters<sup>9</sup> |
+ | vep_cache_version | cadd_resources<sup>10</sup> |
+ | vep_cache<sup>4</sup> | vep_plugin_files<sup>11</sup> |
+ | gnomad_af<sup>5</sup> | |
+ | score_config_snv<sup>6</sup> | |
+ | variant_consequences_snv<sup>7</sup> | |

<sup>1</sup>Genome version is used by VEP. You have the option to choose between GRCh37 and GRCh38.<br />
<sup>2</sup>Path to VCF files and their indices used by vcfanno. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/vcfanno_resources.txt).<br />
@@ -240,10 +241,11 @@ See example cache [here](https://raw.githubusercontent.com/nf-core/test-datasets
<sup>5</sup> GnomAD VCF files can be downloaded from [here](https://gnomad.broadinstitute.org/downloads). The option `gnomad_af` expects a tab-delimited file with
no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/gnomad_reformated.tab.gz).<br />
<sup>6</sup>Used by GENMOD for ranking the variants. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/rank_model_snv.ini).<br />
- <sup>7</sup>Used by GENMOD while modeling the variants. Contains a list of loci that show [reduced penetrance](https://medlineplus.gov/genetics/understanding/inheritance/penetranceexpressivity/) in people. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/reduced_penetrance.tsv).<br />
- <sup>8</sup> This file contains a list of candidate genes (with [HGNC](https://www.genenames.org/) IDs) that is used to split the variants into candidate variants and research variants. Research variants contain all the variants, while candidate variants are a subset of research variants and are associated with candidate genes. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/hgnc.txt). Not required if --skip_vep_filter is set to true.<br />
- <sup>9</sup>Path to a folder containing cadd annotations. Equivalent of the data/annotations/ folder described [here](https://github.com/kircherlab/CADD-scripts/#manual-installation), and it is used to calculate CADD scores for small indels. <br />
- <sup>10</sup>A CSV file that describes the files used by VEP's named and custom plugins. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/vep_files.csv). <br />
+ <sup>7</sup>File containing a list of SO terms ordered by severity, from most severe to least severe, used for annotating genomic and mitochondrial SNVs. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/variant_consequences_v2.txt). You can learn more about these terms [here](https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html).<br />
+ <sup>8</sup>Used by GENMOD while modeling the variants. Contains a list of loci that show [reduced penetrance](https://medlineplus.gov/genetics/understanding/inheritance/penetranceexpressivity/) in people. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/reduced_penetrance.tsv).<br />
+ <sup>9</sup> This file contains a list of candidate genes (with [HGNC](https://www.genenames.org/) IDs) that is used to split the variants into candidate variants and research variants. Research variants contain all the variants, while candidate variants are a subset of research variants and are associated with candidate genes. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/hgnc.txt). Not required if --skip_vep_filter is set to true.<br />
+ <sup>10</sup>Path to a folder containing cadd annotations. Equivalent of the data/annotations/ folder described [here](https://github.com/kircherlab/CADD-scripts/#manual-installation), and it is used to calculate CADD scores for small indels. <br />
+ <sup>11</sup>A CSV file that describes the files used by VEP's named and custom plugins. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/vep_files.csv). <br />

> NB: We use CADD only to annotate small indels. To annotate SNVs with precomputed CADD scores, pass the file containing CADD scores as a resource to vcfanno instead. Files containing the precomputed CADD scores for SNVs can be downloaded from [here](https://cadd.gs.washington.edu/download) (description: "All possible SNVs of GRCh3<7/8>/hg3<7/8>")

@@ -256,20 +258,23 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl
| vep_cache_version | vep_filters |
| vep_cache | vep_plugin_files |
| score_config_sv | |
+ | variant_consequences_sv<sup>2</sup> | |

<sup>1</sup> A CSV file that describes the databases (VCFs or BEDPEs) used by SVDB for annotating structural variants. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/svdb_querydb_files.csv). Information about the column headers can be found [here](https://github.com/J35P312/SVDB#Query).<br />
+ <sup>2</sup> File containing a list of SO terms ordered by severity, from most severe to least severe, used for annotating genomic SVs. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/variant_consequences_v2.txt). You can learn more about these terms [here](https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html).
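
To illustrate how a severity-ordered list of SO terms can be used, here is a minimal Python sketch. It assumes one SO term per line with the most severe term first; the function names `load_severity_ranks` and `most_severe` are hypothetical and are not part of nf-core/raredisease, and the pipeline's own handling of these files is not shown in this diff.

```python
from typing import Dict, Iterable, Optional


def load_severity_ranks(lines: Iterable[str]) -> Dict[str, int]:
    """Map each SO term to its rank; rank 0 is the most severe.

    Assumes one term per line; blank lines and '#' comments are skipped
    (an assumption about the file layout, not confirmed by the PR).
    """
    ranks: Dict[str, int] = {}
    for line in lines:
        term = line.strip()
        if not term or term.startswith("#"):
            continue
        # Keep the first occurrence if a term is listed twice.
        ranks.setdefault(term, len(ranks))
    return ranks


def most_severe(consequences: Iterable[str], ranks: Dict[str, int]) -> Optional[str]:
    """Return the consequence with the lowest rank, or None if none are ranked."""
    ranked = [c for c in consequences if c in ranks]
    return min(ranked, key=ranks.__getitem__) if ranked else None


# Hypothetical four-term excerpt; the real sample file is linked in the footnote above.
example_terms = ["transcript_ablation", "stop_gained", "missense_variant", "intron_variant"]
ranks = load_severity_ranks(example_terms)
print(most_severe(["intron_variant", "missense_variant"], ranks))  # prints "missense_variant"
```

Because the ranking comes entirely from line order, supplying a different file through `variant_consequences_snv` or `variant_consequences_sv` would change which consequence counts as most severe without any code change.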

##### 9. Mitochondrial annotation

- | Mandatory | Optional |
- | ----------------- | ---------------- |
- | genome | vep_filters |
- | mito_name | vep_plugin_files |
- | vcfanno_resources | |
- | vcfanno_toml | |
- | vep_cache_version | |
- | vep_cache | |
- | score_config_mt | |
+ | Mandatory | Optional |
+ | ------------------------ | ---------------- |
+ | genome | vep_filters |
+ | mito_name | vep_plugin_files |
+ | vcfanno_resources | |
+ | vcfanno_toml | |
+ | vep_cache_version | |
+ | vep_cache | |
+ | score_config_mt | |
+ | variant_consequences_snv | |

##### 10. Mobile element annotation

@@ -279,6 +284,7 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl
| mobile_element_svdb_annotations<sup>1</sup> | |
| vep_cache_version | |
| vep_cache | |
+ | variant_consequences_sv | |

<sup>1</sup> A CSV file that describes the databases (VCFs) used by SVDB for annotating mobile elements with allele frequencies. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/svdb_querydb_files.csv).

2 changes: 2 additions & 0 deletions main.nf
@@ -47,6 +47,8 @@ params.sdf = WorkflowMain.getGenomeAttribute(params,
params.svdb_query_dbs = WorkflowMain.getGenomeAttribute(params, 'svdb_query_dbs')
params.target_bed = WorkflowMain.getGenomeAttribute(params, 'target_bed')
params.variant_catalog = WorkflowMain.getGenomeAttribute(params, 'variant_catalog')
+ params.variant_consequences_snv = WorkflowMain.getGenomeAttribute(params, 'variant_consequences_snv')
+ params.variant_consequences_sv = WorkflowMain.getGenomeAttribute(params, 'variant_consequences_sv')
params.vep_filters = WorkflowMain.getGenomeAttribute(params, 'vep_filters')
params.vcf2cytosure_blacklist = WorkflowMain.getGenomeAttribute(params, 'vcf2cytosure_blacklist')
params.vcfanno_resources = WorkflowMain.getGenomeAttribute(params, 'vcfanno_resources')
12 changes: 12 additions & 0 deletions nextflow_schema.json
@@ -625,6 +625,18 @@
"fa_icon": "fas fa-user-cog",
"description": "Options used to facilitate the annotation of the variants.",
"properties": {
+ "variant_consequences_snv": {
+ "type": "string",
+ "description": "File containing a list of SO terms ordered by severity, from most severe to least severe, used for annotating genomic and mitochondrial SNVs.",
+ "help_text": "For more information check https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html",
+ "fa_icon": "fas fa-file-csv"
+ },
+ "variant_consequences_sv": {
+ "type": "string",
+ "description": "File containing a list of SO terms ordered by severity, from most severe to least severe, used for annotating genomic SVs.",
+ "help_text": "For more information check https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html",
+ "fa_icon": "fas fa-file-csv"
+ },
"vep_cache_version": {
"type": "integer",
"default": 110,
21 changes: 12 additions & 9 deletions workflows/raredisease.nf
@@ -40,19 +40,19 @@ if (params.run_rtgvcfeval) {

if (!params.skip_snv_annotation) {
mandatoryParams += ["genome", "vcfanno_resources", "vcfanno_toml", "vep_cache", "vep_cache_version",
- "gnomad_af", "score_config_snv"]
+ "gnomad_af", "score_config_snv", "variant_consequences_snv"]
}

if (!params.skip_sv_annotation) {
- mandatoryParams += ["genome", "vep_cache", "vep_cache_version", "score_config_sv"]
+ mandatoryParams += ["genome", "vep_cache", "vep_cache_version", "score_config_sv", "variant_consequences_sv"]
if (!params.svdb_query_bedpedbs && !params.svdb_query_dbs) {
println("params.svdb_query_bedpedbs or params.svdb_query_dbs should be set.")
missingParamsCount += 1
}
}

if (!params.skip_mt_annotation) {
- mandatoryParams += ["genome", "mito_name", "vcfanno_resources", "vcfanno_toml", "vep_cache_version", "vep_cache"]
+ mandatoryParams += ["genome", "mito_name", "vcfanno_resources", "vcfanno_toml", "vep_cache_version", "vep_cache", "variant_consequences_snv"]
}

if (params.analysis_type.equals("wes")) {
Expand All @@ -72,7 +72,7 @@ if (!params.skip_vep_filter) {
}

if (!params.skip_me_annotation) {
- mandatoryParams += ["mobile_element_svdb_annotations"]
+ mandatoryParams += ["mobile_element_svdb_annotations", "variant_consequences_sv"]
}

for (param in mandatoryParams.unique()) {
@@ -288,7 +288,10 @@ workflow RAREDISEASE {
ch_target_intervals = ch_references.target_intervals
ch_variant_catalog = params.variant_catalog ? Channel.fromPath(params.variant_catalog).map { it -> [[id:it[0].simpleName],it]}.collect()
: Channel.value([[],[]])
- ch_variant_consequences = Channel.fromPath("$projectDir/assets/variant_consequences_v2.txt", checkIfExists: true).collect()
+ ch_variant_consequences_snv = params.variant_consequences_snv ? Channel.fromPath(params.variant_consequences_snv).collect()
+ : Channel.value([])
+ ch_variant_consequences_sv = params.variant_consequences_sv ? Channel.fromPath(params.variant_consequences_sv).collect()
+ : Channel.value([])
ch_vcfanno_resources = params.vcfanno_resources ? Channel.fromPath(params.vcfanno_resources).splitText().map{it -> it.trim()}.collect()
: Channel.value([])
ch_vcf2cytosure_blacklist = params.vcf2cytosure_blacklist ? Channel.fromPath(params.vcf2cytosure_blacklist).collect()
@@ -490,7 +493,7 @@ workflow RAREDISEASE {

ANN_CSQ_PLI_SV (
GENERATE_CLINICAL_SET_SV.out.vcf,
- ch_variant_consequences
+ ch_variant_consequences_sv
)
ch_versions = ch_versions.mix(ANN_CSQ_PLI_SV.out.versions)

@@ -535,7 +538,7 @@ workflow RAREDISEASE {

ANN_CSQ_PLI_SNV (
GENERATE_CLINICAL_SET_SNV.out.vcf,
- ch_variant_consequences
+ ch_variant_consequences_snv
)
ch_versions = ch_versions.mix(ANN_CSQ_PLI_SNV.out.versions)

@@ -577,7 +580,7 @@ workflow RAREDISEASE {

ANN_CSQ_PLI_MT(
GENERATE_CLINICAL_SET_MT.out.vcf,
- ch_variant_consequences
+ ch_variant_consequences_snv
)
ch_versions = ch_versions.mix(ANN_CSQ_PLI_MT.out.versions)

@@ -663,7 +666,7 @@ workflow RAREDISEASE {
ch_genome_fasta,
ch_genome_dictionary,
ch_vep_cache,
- ch_variant_consequences,
+ ch_variant_consequences_sv,
ch_vep_filters,
params.genome,
params.vep_cache_version,
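
The mandatory-parameter logic changed in `workflows/raredisease.nf` above can be sketched roughly as follows. This is a simplified Python illustration, not the pipeline's Groovy code: it covers only a subset of the parameters, the function name `missing_mandatory_params` is hypothetical, and how the pipeline reports missing parameters is not shown in this diff.

```python
from typing import Any, Dict, List


def missing_mandatory_params(params: Dict[str, Any]) -> List[str]:
    """Return the mandatory parameter names that are unset or empty.

    Abbreviated: only a few of the names from the diff are listed per step.
    """
    mandatory: List[str] = []
    if not params.get("skip_snv_annotation"):
        mandatory += ["score_config_snv", "variant_consequences_snv"]
    if not params.get("skip_sv_annotation"):
        mandatory += ["score_config_sv", "variant_consequences_sv"]
    if not params.get("skip_mt_annotation"):
        mandatory += ["mito_name", "variant_consequences_snv"]
    # dict.fromkeys drops duplicates while keeping first-seen order,
    # playing the role of the unique() call in the workflow.
    unique = list(dict.fromkeys(mandatory))
    return [name for name in unique if not params.get(name)]


# SV annotation is skipped; the SNV consequence file was not supplied.
example = {
    "skip_sv_annotation": True,
    "score_config_snv": "rank_model_snv.ini",
    "mito_name": "chrM",
}
print(missing_mandatory_params(example))  # prints ['variant_consequences_snv']
```

Since `variant_consequences_snv` is required by both the SNV and the mitochondrial steps, de-duplicating the list means a missing file is reported once rather than once per step.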