Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ANCOMBC #753

Merged
merged 21 commits into from
Jun 25, 2024
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- [#751](https://github.com/nf-core/ampliseq/pull/751) - Added version R08-RS214 of curated GTDB 16S taxonomy: `sbdi-gtdb=R08-RS214-1` or `sbdi-gtdb` as parameter to `--dada_ref_taxonomy`
- [#752](https://github.com/nf-core/ampliseq/pull/752) - Added version R09-RS220 of GTDB 16S taxonomy: `gtdb=R09-RS220` or `gtdb` as parameter to `--dada_ref_taxonomy`
- [#753](https://github.com/nf-core/ampliseq/pull/753) - ANCOM-BC via QIIME2 can be used with `--ancombc`, `--ancombc_formula`, and `--ancombc_args`

### `Changed`

- [#749](https://github.com/nf-core/ampliseq/pull/749) - Create barplot also when no metadata is given
- [#753](https://github.com/nf-core/ampliseq/pull/753) - ANCOM via QIIME2 is not run anymore by default but on request whith `--ancom`, therefore `--skip_ancom` was removed

### `Fixed`

Expand All @@ -24,6 +26,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Removed`

- [#753](https://github.com/nf-core/ampliseq/pull/753) - `--skip_ancom` was removed

## nf-core/ampliseq version 2.9.0 - 2024-04-03

### `Added`
Expand Down
4 changes: 4 additions & 0 deletions CITATIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -135,6 +135,10 @@

> Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis. 2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. PMID: 26028277; PMCID: PMC4450248.

- [ANCOM-BC](https://pubmed.ncbi.nlm.nih.gov/32665548/)

> Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7. PMID: 32665548; PMCID: PMC7360769.

- [Adonis](https://doi.org/10.1111/j.1442-9993.2001.01070.pp.x) and [VEGAN](https://CRAN.R-project.org/package=vegan)

> Marti J Anderson. A new method for non-parametric multivariate analysis of variance. Austral ecology, 26(1):32–46, 2001.
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,6 @@ By default, the pipeline currently performs the following:
- Phylogenetic placement ([EPA-NG](https://github.com/Pbdas/epa-ng))
- Taxonomical classification using DADA2; alternatives are [SINTAX](https://doi.org/10.1101/074161), [Kraken2](https://doi.org/10.1186/s13059-019-1891-0), and [QIIME2](https://www.nature.com/articles/s41587-019-0209-9)
- Excludes unwanted taxa, produces absolute and relative feature/taxa count tables and plots, plots alpha rarefaction curves, computes alpha and beta diversity indices and plots thereof ([QIIME2](https://www.nature.com/articles/s41587-019-0209-9))
- Calls differentially abundant taxa ([ANCOM](https://www.ncbi.nlm.nih.gov/pubmed/26028277))
- Creates phyloseq R objects ([Phyloseq](https://www.bioconductor.org/packages/release/bioc/html/phyloseq.html))
- Pipeline QC summaries ([MultiQC](https://multiqc.info/))
- Pipeline summary report ([R Markdown](https://github.com/rstudio/rmarkdown))
Expand Down Expand Up @@ -74,6 +73,7 @@ nextflow run nf-core/ampliseq \

> [!TIP]
> By default the taxonomic assignment will be performed with DADA2 on SILVA database, but there are various tools and databases readily available, see [taxonomic classification documentation](https://nf-co.re/ampliseq/usage#taxonomic-classification).
> Differential abundance testing with ([ANCOM](https://www.ncbi.nlm.nih.gov/pubmed/26028277)) or ([ANCOM-BC](https://www.ncbi.nlm.nih.gov/pubmed/32665548)) when opting in.

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
Expand Down
57 changes: 54 additions & 3 deletions assets/report_template.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,6 @@ params:
barplot: FALSE
abundance_tables: FALSE
alpha_rarefaction: FALSE
ancom: FALSE
trunclenf: ""
trunclenr: ""
max_ee: ""
Expand Down Expand Up @@ -102,6 +101,9 @@ params:
diversity_indices_alpha: FALSE
diversity_indices_beta: FALSE
diversity_indices_adonis: ""
ancom: FALSE
ancombc: FALSE
ancombc_formula: FALSE
picrust_pathways: FALSE
sbdi: FALSE
phyloseq: FALSE
Expand Down Expand Up @@ -1531,11 +1533,57 @@ Test results are in separate folders following the scheme `Category-{treatment}-

ancom <- sort( unlist( strsplit( params$ancom,"," ) ) )
for (folder in ancom) {
ancom_path <- paste0("qiime2/ancom/",folder)
ancom_path <- paste0("qiime2/",folder)
cat("\n- [",ancom_path,"/index.html](../",ancom_path,"/index.html)\n", sep="")
}
```

<!-- Subsection on ANCOMBC results -->

```{r, results='asis'}
any_ancombc <- !isFALSE(params$ancombc) || !isFALSE(params$ancombc_formula)
```

```{r, eval = !isFALSE(params$any_ancombc), results='asis'}
cat(paste0("
## ANCOM-BC

[Analysis of Composition of Microbiomes with Bias Correction (ANCOM-BC)](https://www.ncbi.nlm.nih.gov/pubmed/32665548)
is applied to identify features that are differentially
abundant across sample groups.
Comparisons between groups of samples is performed for specific metadata that can be found in folder
"))

if ( !isFALSE(params$ancombc) && !isFALSE(params$ancombc_formula) ) {
cat("[qiime2/ancombc/](../qiime2/ancombc/) and [qiime2/ancombc_formula/](../qiime2/ancombc_formula/)")
} else if ( !isFALSE(params$ancombc) ) {
cat("[qiime2/ancombc/](../qiime2/ancombc/)")
} else if ( !isFALSE(params$ancombc_formula) ) {
cat("[qiime2/ancombc_formula/](../qiime2/ancombc_formula/)")
}
cat(".")

cat(paste0("
Test results are in separate folders following the scheme `Category-{treatment}-{taxonomic level}`:
"))
```

```{r, eval = !isFALSE(params$ancombc), results='asis'}
ancombc <- sort( unlist( strsplit( params$ancombc,"," ) ) )
for (folder in ancombc) {
ancombc_path <- paste0("qiime2/",folder)
cat("\n- [",ancombc_path,"/index.html](../",ancombc_path,"/index.html)\n", sep="")
}
```

```{r, eval = !isFALSE(params$ancombc_formula), results='asis'}
ancombc_formula <- sort( unlist( strsplit( params$ancombc_formula,"," ) ) )
for (folder in ancombc_formula) {
ancombc_formula_path <- paste0("qiime2/",folder)
cat("\n- [",ancombc_formula_path,"/index.html](../",ancombc_formula_path,"/index.html)\n", sep="")
}
```

<!-- Section on PICRUSt2 results -->

```{r, eval = !isFALSE(params$picrust_pathways), results='asis'}
Expand Down Expand Up @@ -1767,7 +1815,7 @@ if ( as.integer(qiime2_filtertaxa_rm) > 0 ) {
}
```
```{r, eval = !isFALSE(params$val_used_taxonomy), results='asis'}
if (!isFALSE(params$barplot) || !isFALSE(params$alpha_rarefaction) || !isFALSE(params$diversity_indices_beta) || !isFALSE(params$ancom)) {
if (!isFALSE(params$barplot) || !isFALSE(params$alpha_rarefaction) || !isFALSE(params$diversity_indices_beta) || !isFALSE(params$ancom) || !isFALSE(any_ancombc)) {
qiime_final <- c("Within QIIME2, the final microbial community data was")
if (!isFALSE(params$barplot)) {
qiime_final <- c(qiime_final,"visualized in a barplot")
Expand All @@ -1782,6 +1830,9 @@ if (!isFALSE(params$barplot) || !isFALSE(params$alpha_rarefaction) || !isFALSE(p
if (!isFALSE(params$ancom)) {
qiime_final <- c(qiime_final,"used to find differential abundant taxa with ANCOM ([Mandal et al., 2015](https://pubmed.ncbi.nlm.nih.gov/26028277/))")
}
if (!isFALSE(any_ancombc)) {
qiime_final <- c(qiime_final,"used to find differential abundant taxa with ANCOM-BC ([Lin and Peddada, 2020](https://pubmed.ncbi.nlm.nih.gov/32665548/))")
}
cat(paste(qiime_final[1],qiime_final[2]))
if (length(qiime_final) >= 3) {
for (x in 3:length(qiime_final)) {
Expand Down
40 changes: 40 additions & 0 deletions conf/modules.config
Original file line number Diff line number Diff line change
Expand Up @@ -969,6 +969,46 @@ process {
]
}

withName: 'QIIME2_ANCOMBC_TAX|QIIME2_ANCOMBC_ASV' {
// additional arguments for "qiime composition ancombc", deviating from default: --p-lib-cut (0), --p-conserve (--p-no-conserve)
ext.args = { params.ancombc_args ?: '--p-prv-cut 0.1 --p-lib-cut 500 --p-alpha 0.05 --p-conserve' }
// additional arguments for "qiime composition da-barplot", default is false
ext.args2 = '--p-effect-size-threshold 1 --p-significance-threshold 0.05 --p-label-limit 1000'
publishDir = [
[
path: { "${params.outdir}/qiime2/ancombc" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') || filename.endsWith('.qzv') || filename.endsWith('.qza') ? null : filename }
],
[
path: { "${params.outdir}/qiime2/ancombc/qza_qzv" },
mode: params.publish_dir_mode,
pattern: "*{.qza,.qzv}",
enabled: params.save_intermediates
]
]
}

withName: 'ANCOMBC_FORMULA_TAX|ANCOMBC_FORMULA_ASV' {
// additional arguments for "qiime composition ancombc", deviating from default: --p-lib-cut (0), --p-conserve (--p-no-conserve)
ext.args = { params.ancombc_args ?: '--p-prv-cut 0.1 --p-lib-cut 500 --p-alpha 0.05 --p-conserve' }
// additional arguments for "qiime composition da-barplot", default is false
ext.args2 = '--p-effect-size-threshold 1 --p-significance-threshold 0.05 --p-label-limit 1000'
publishDir = [
[
path: { "${params.outdir}/qiime2/ancombc_formula" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') || filename.endsWith('.qzv') || filename.endsWith('.qza') ? null : filename }
],
[
path: { "${params.outdir}/qiime2/ancombc_formula/qza_qzv" },
mode: params.publish_dir_mode,
pattern: "*{.qza,.qzv}",
enabled: params.save_intermediates
]
]
}

withName: PICRUST {
ext.args = "-t epa-ng --remove_intermediate"
publishDir = [
Expand Down
4 changes: 4 additions & 0 deletions conf/test.config
Original file line number Diff line number Diff line change
Expand Up @@ -47,4 +47,8 @@ params {
diversity_rarefaction_depth = 500

vsearch_cluster = true

// Test ANCOMBC
ancombc = true
ancombc_formula = "treatment1"
}
1 change: 1 addition & 0 deletions conf/test_failed.config
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ params {
ignore_failed_trimming = true
ignore_empty_input_files = true
ignore_failed_filtering = true
ancombc = true

//this is to remove low abundance ASVs to reduce runtime of downstream processes
min_samples = 2
Expand Down
4 changes: 4 additions & 0 deletions conf/test_full.config
Original file line number Diff line number Diff line change
Expand Up @@ -35,4 +35,8 @@ params {

//run adonis
qiime_adonis_formula = "habitat"

//run ANCOM & ANCOMBC
ancom = true
ancombc = true
}
3 changes: 3 additions & 0 deletions conf/test_multiregion.config
Original file line number Diff line number Diff line change
Expand Up @@ -31,4 +31,7 @@ params {
// Reduce runtimes
skip_alpha_rarefaction = true
tax_agglom_max = 3

// Run ANCOM
ancom = true
}
7 changes: 3 additions & 4 deletions conf/test_pplace.config
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,10 @@ params {
min_frequency = 10

// pplace
pplace_tree = "https://github.com/nf-core/test-datasets/raw/phyloplace/testdata/cyanos_16s.newick"
pplace_aln = "https://github.com/nf-core/test-datasets/raw/phyloplace/testdata/cyanos_16s.alnfna"
pplace_tree = params.pipelines_testdata_base_path + "phyloplace/testdata/cyanos_16s.newick"
pplace_aln = params.pipelines_testdata_base_path + "phyloplace/testdata/cyanos_16s.alnfna"
pplace_model = "GTR+F+I+I+R3"
pplace_taxonomy = "https://github.com/nf-core/test-datasets/raw/phyloplace/testdata/cyanos_16s.taxonomy.tsv"
pplace_taxonomy = params.pipelines_testdata_base_path + "phyloplace/testdata/cyanos_16s.taxonomy.tsv"
pplace_name = "test_pplace"

// Adjust taxonomic levels
Expand All @@ -46,5 +46,4 @@ params {
// Skip some steps to reduce runtime
skip_alpha_rarefaction = true
skip_fastqc = true
skip_ancom = true
}
1 change: 1 addition & 0 deletions conf/test_sintax.config
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ params {

//restrict ANCOM analysis to higher taxonomic levels
tax_agglom_max = 4
ancom = true

sbdiexport = true

Expand Down
Binary file modified docs/images/ampliseq_workflow.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
6 changes: 3 additions & 3 deletions docs/images/ampliseq_workflow.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
30 changes: 27 additions & 3 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Barplot](#barplot) - Interactive barplot
- [Alpha diversity rarefaction curves](#alpha-diversity-rarefaction-curves) - Rarefaction curves for quality control
- [Diversity analysis](#diversity-analysis) - High level overview with different diversity indices
- [ANCOM](#ancom) - Differential abundance analysis
- [Differential abundance analysis](#differential-abundance-analysis) - Calling differentially abundant features with ANCOM or ANCOM-BC
- [PICRUSt2](#picrust2) - Predict the functional potential of a bacterial community
- [SBDI export](#sbdi-export) - Swedish Biodiversity Infrastructure (SBDI) submission file
- [Phyloseq](#phyloseq) - Phyloseq R objects
Expand Down Expand Up @@ -544,11 +544,13 @@ Furthermore, ADONIS permutation-based statistical test in vegan-R determine whet

</details>

#### ANCOM
#### Differential abundance analysis

##### ANCOM

Analysis of Composition of Microbiomes ([ANCOM](https://www.ncbi.nlm.nih.gov/pubmed/26028277)) is applied to identify features that are differentially abundant across sample groups. A key assumption made by ANCOM is that few taxa (less than about 25%) will be differentially abundant between groups otherwise the method will be inaccurate. Parameter `--ancom_sample_min_count` sets the minimum sample counts to retain a sample for ANCOM analysis.

ANCOM is applied to each suitable or specified metadata column for 5 taxonomic levels (2-6).
On request (`--ancom`), ANCOM is applied to each suitable or specified metadata column for 5 taxonomic levels (2-6).

<details markdown="1">
<summary>Output files</summary>
Expand All @@ -560,6 +562,28 @@ ANCOM is applied to each suitable or specified metadata column for 5 taxonomic l

</details>

##### ANCOM-BC

Analysis of Composition of Microbiomes with Bias Correction ([ANCOM-BC](https://www.ncbi.nlm.nih.gov/pubmed/32665548)) is applied to identify features that are differentially abundant across sample groups. Parameter `--ancom_sample_min_count` sets the minimum sample counts to retain a sample for ANCOM-BC analysis.

On request (`--ancombc`), ANCOM-BC is applied to each suitable or specified metadata column for 5 taxonomic levels (2-6). Independently, multiple comma separated formula can be submitted to ANCOM-BC by `--ancombc_formula`. Settings for ANCOM-BC can be modified with `--ancombc_args`.

<details markdown="1">
<summary>Output files</summary>

- `qiime2/ancombc/` or `qiime2/ancombc_formula/`
- `da_barplot/Category-<formula>-<taxonomic level>/`
- `index.html`: Links to interactive plots.
- `<formula><treatment>-ancombc-barplot.html`: Interactive plots.
- `differentials/Category-<formula>-<taxonomic level>/`
- `index.html`: Visualised table of statistical results.
- `*.csv*`: Comma-separated tables of statistical results.
- formula: metadata category / formula that was tested
- taxonomic level: level-2 (phylum), level-3 (class), level-4 (order), level-5 (family), level-6 (genus), ASV
- treatment: Changes for that treatment group

</details>

### PICRUSt2

PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is a software for predicting functional abundances based only on marker gene sequences. On demand (`--picrust`), Enzyme Classification numbers (EC), KEGG orthologs (KO) and MetaCyc ontology predictions will be made for each sample.
Expand Down
5 changes: 5 additions & 0 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
- [Taxonomic classification](#taxonomic-classification)
- [Multiple region analysis with Sidle](#multiple-region-analysis-with-sidle)
- [Metadata](#metadata)
- [Differential abundance analysis](#differential-abundance-analysis)
- [Updating the pipeline](#updating-the-pipeline)
- [Reproducibility](#reproducibility)
- [Core Nextflow arguments](#core-nextflow-arguments)
Expand Down Expand Up @@ -306,6 +307,10 @@ Sample identifiers should be 36 characters long or less, and also contain only A

The columns which are to be assessed can be specified by `--metadata_category`. If `--metadata_category` isn't specified than all columns that fit the specification are automatically chosen.

### Differential abundance analysis

Differential abundance analysis for relative abundance from microbial community analysis are plagued by multiple issues that arent fully solved yet. But some approaches seem promising, for example Analysis of Composition of Microbiomes with Bias Correction ([ANCOM-BC](https://pubmed.ncbi.nlm.nih.gov/32665548/)). [ANCOM](https://pubmed.ncbi.nlm.nih.gov/26028277/) and ANCOM-BC are integrated into the pipeline, but only executed on request via `--ancom` and `--ancombc`, more details in the [nf-core/ampliseq website parameter documentation](https://nf-co.re/ampliseq/parameters/#differential-abundance-analysis).
d4straub marked this conversation as resolved.
Show resolved Hide resolved

### Updating the pipeline

When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
Expand Down
Loading
Loading