Update docker for new viz rules #1

Merged on May 30, 2024 (27 commits)
Commits
e7750ab
fix: create docker for deseq
kelly-sovacool Apr 12, 2024
3a51354
fix: use deseq container
kelly-sovacool Apr 12, 2024
1d69353
fix: biocmanager is on cran
kelly-sovacool Apr 17, 2024
a63ce3f
fix: use one container for all R scripts & R Markdowns
kelly-sovacool Apr 17, 2024
2255b88
chore: use mamba install instead of mamba env install
kelly-sovacool Apr 17, 2024
7ff6150
chore: temporarily remove elbow to solve pkg conflicts
kelly-sovacool Apr 17, 2024
f093244
fix: install ELBOW manually
kelly-sovacool Apr 17, 2024
3593322
chore: fix wget command in docker
kelly-sovacool Apr 17, 2024
10c3fcd
docs: update changelog for 129
kelly-sovacool Apr 17, 2024
7820ee1
chore: Merge branch 'main' into deseq-docker
kelly-sovacool Apr 18, 2024
c47737e
fix: load the tidyverse
kelly-sovacool Apr 18, 2024
f045d50
fix: correct and simplify singularity args
kelly-sovacool Apr 18, 2024
8c16205
fix: need xfun >= 0.43 in deseq rule
kelly-sovacool Apr 18, 2024
3d44d1a
refactor: explicitly set scriptsdir param to default in configfile
kelly-sovacool Apr 18, 2024
3cba50d
fix: use container for deseq rule
kelly-sovacool Apr 18, 2024
e784efe
fix: scripts path for pipeline home
kelly-sovacool Apr 18, 2024
220d9ea
ci: don't run docker auto on PRs
kelly-sovacool Apr 18, 2024
b99882b
feat: add parameter to make go_enrichment optional
kelly-sovacool May 29, 2024
4e2b72c
Merge pull request #129 from CCBR/deseq-docker
kelly-sovacool May 29, 2024
3ad6798
docs: update changelog
kelly-sovacool May 29, 2024
8d9a3be
feat: make the rose rule optional
kelly-sovacool May 29, 2024
06cf017
Merge branch 'main' into corr-docker
kelly-sovacool May 29, 2024
23ef8ce
refactor: use docker container for new R rules
kelly-sovacool May 29, 2024
29a79d5
fix: ComplexHeatmap is on bioc, not cran
kelly-sovacool May 29, 2024
2a635f3
ci: use mamba env create
kelly-sovacool May 29, 2024
106d0c8
Merge pull request #133 from CCBR/refactor-go-enrich
kelly-sovacool May 29, 2024
43265d8
chore: Merge branch 'main' into corr-docker
kelly-sovacool May 29, 2024
5 changes: 0 additions & 5 deletions .github/workflows/docker-auto.yml
@@ -6,11 +6,6 @@ on:
 - main
 paths:
 - "docker/**"
-pull_request:
-branches:
-- main
-paths:
-- "docker/**"

 jobs:
 generate-matrix:
Expand Down
8 changes: 5 additions & 3 deletions .test/config_lint.yaml
@@ -11,10 +11,12 @@ samplemanifest: "/opt2/.test/samples.test_lintr.tsv"
 # User parameters
 #####################################################################################
 # run sample contrasts
-run_contrasts: "Y" # Y or N
+run_contrasts: true
 contrasts: "/opt2/.test/contrasts.test.tsv" # run_contrasts needs to be "Y"
-contrasts_fdr_cutoff: "0.05"
-contrasts_lfc_cutoff: "0.59" # FC of 1.5
+contrasts_fdr_cutoff: 0.05
+contrasts_lfc_cutoff: 0.59 # FC of 1.5
+run_go_enrichment: true
+run_rose: true

 # reference
 genome: "hg38" # currently supports hg38, hg19 and mm10. Custom genome can be added with appropriate additions to "reference" section below.
Expand Down
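Note on the quoted-to-native type change above: in Python any non-empty string is truthy, so once the Snakefile tests `config["run_contrasts"]` directly, a leftover `"N"` from an older config file would switch the step on. The sketch below shows one way a guard could look. `as_bool` is a hypothetical helper for illustration and is not part of this PR.

```python
def as_bool(value, key="run_contrasts"):
    """Return a real bool, accepting the legacy "Y"/"N" strings (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().upper() in {"Y", "N"}:
        return value.strip().upper() == "Y"
    raise ValueError(f"{key} must be true or false, got {value!r}")


# A plain truthiness check would treat the legacy "N" as enabled:
print(bool("N"))      # True
# The guard maps legacy values to the intended booleans:
print(as_bool("N"))   # False
print(as_bool(True))  # True
```

Rejecting anything other than a bool or the two legacy strings makes a mistyped value fail early instead of silently enabling a rule.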
23 changes: 15 additions & 8 deletions CHANGELOG.md
@@ -1,13 +1,20 @@
 ## CARLISLE development version
-- Bug fixes (#127, @epehrsson)
-- Removes single-sample group check for DESeq.
-- Increases memory for DESeq.
-- Ensures control replicate number is an integer.
-- Fixes FDR cutoff misassigned to log2FC cutoff.
-- Fixes `no_dedup` variable names in library normalization scripts.
-- Adds rules cov_correlation, homer_enrich, combine_homer, count_peaks
-- Adds peak caller to MACS2 peak xls filename
+
+- Bug fixes: (#127, @epehrsson)
+- Removes single-sample group check for DESeq.
+- Increases memory for DESeq.
+- Ensures control replicate number is an integer.
+- Fixes FDR cutoff misassigned to log2FC cutoff.
+- Fixes `no_dedup` variable names in library normalization scripts.
+- Containerize rules that require R (`deseq`, `go_enrichment`, and `spikein_assessment`) to fix installation issues with common R library path. (#129, @kelly-sovacool)
+- The `Rlib_dir` and `Rpkg_config` config options have been removed as they are no longer needed.
+- New visualizations: (#132, @epehrsson)
+- New rules `cov_correlation`, `homer_enrich`, `combine_homer`, `count_peaks`
+- Add peak caller to MACS2 peak xls filename
+- New parameters in the config file to make certain rules optional: (#133, @kelly-sovacool)
+- GO enrichment is controlled by `run_go_enrichment` (default: `false`)
+- ROSE is controlled by `run_rose` (default: `false`)

 ## CARLISLE v2.5.0
 - Refactors R packages to a common source location (#118, @slsevilla)
 - Adds a --force flag to allow for re-initialization of a workdir (#97, @slsevilla)
Expand Down
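The two optional-rule parameters in the changelog amount to a simple gate. According to the Snakefile change in this PR, GO enrichment outputs are requested only when both `run_contrasts` and `run_go_enrichment` are true, and ROSE outputs only when `run_rose` is true. A minimal sketch of that logic follows. `optional_steps` is a hypothetical name, and falling back to `False` through `.get` is an assumption made here (the Snakefile indexes the keys directly).

```python
def optional_steps(config):
    """List the optional annotation steps a config would enable (illustrative only)."""
    steps = []
    # GO enrichment depends on contrasts, so both flags must be true.
    if config.get("run_contrasts", False) and config.get("run_go_enrichment", False):
        steps.append("go_enrichment")
    # ROSE is gated by its own flag only.
    if config.get("run_rose", False):
        steps.append("rose")
    return steps


defaults = {"run_contrasts": True, "run_go_enrichment": False, "run_rose": False}
print(optional_steps(defaults))  # []

everything = {"run_contrasts": True, "run_go_enrichment": True, "run_rose": True}
print(optional_steps(everything))  # ['go_enrichment', 'rose']
```

With the shipped defaults neither long-running step is scheduled, which matches the `default: false` wording in the changelog.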
11 changes: 6 additions & 5 deletions carlisle
@@ -29,8 +29,6 @@ tools_specific_yaml="tools_biowulf.yaml"
 # these are copied into the WORKDIR
 ESSENTIAL_FILES="config/config.yaml config/samples.tsv config/contrasts.tsv config/fqscreen_config.conf config/multiqc_config.yaml config/rpackages.csv"
 ESSENTIAL_FOLDERS="workflow/scripts annotation"
-# set extra singularity bindings
-EXTRA_SINGULARITY_BINDS="-B /data/CCBR_Pipeliner/,/lscratch"

 # ## setting PIPELINE_HOME
 PIPELINE_HOME=$(readlink -f $(dirname "$0"))
@@ -144,7 +142,10 @@ function check_essential_files() {
 function set_singularity_binds(){
 # this functions tries find what folders to bind
 # biowulf specific
-echo "$PIPELINE_HOME" > ${WORKDIR}/tmp1
+# set extra singularity bindings
+EXTRA_SINGULARITY_BINDS="/lscratch"
+
+echo "$PIPELINE_HOME" >> ${WORKDIR}/tmp1
 echo "$WORKDIR" >> ${WORKDIR}/tmp1
 grep -o '\/.*' <(cat ${WORKDIR}/config/config.yaml ${WORKDIR}/config/samples.tsv)|tr '\t' '\n'|grep -v ' \|\/\/'|sort|uniq >> ${WORKDIR}/tmp1
 grep gpfs ${WORKDIR}/tmp1|awk -F'/' -v OFS='/' '{print $1,$2,$3,$4,$5}' |sort|uniq > ${WORKDIR}/tmp2
@@ -153,7 +154,8 @@ function set_singularity_binds(){
 binds=$(cat ${WORKDIR}/tmp2 ${WORKDIR}/tmp3 ${WORKDIR}/tmp4|sort|uniq |tr '\n' ',')
 rm -f ${WORKDIR}/tmp?
 binds=$(echo $binds|awk '{print substr($1,1,length($1)-1)}')
-SINGULARITY_BINDS="-B $EXTRA_SINGULARITY_BINDS,$binds"
+
+SINGULARITY_BINDS=" -B $EXTRA_SINGULARITY_BINDS,$binds "
 }

 function rescript(){
@@ -168,7 +170,6 @@ function runcheck(){
 check_essential_files
 module load $PYTHON_VERSION
 module load $SNAKEMAKE_VERSION
-# SINGULARITY_BINDS="$EXTRA_SINGULARITY_BINDS -B ${PIPELINE_HOME}:${PIPELINE_HOME} -B ${WORKDIR}:${WORKDIR}"
 }

 function controlcheck(){
Expand Down
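After this change, `set_singularity_binds` ends with a single `-B` flag whose value is a comma-separated list that starts with `/lscratch`. The sketch below restates only that final join step in Python for readability. It does not reproduce the grep/awk path discovery in the script, and `build_bind_flag` is a hypothetical name.

```python
def build_bind_flag(paths, extra="/lscratch"):
    """Join unique bind paths into one singularity -B argument (simplified sketch)."""
    # Drop empty entries and duplicates, then sort for a stable order.
    unique = sorted({p for p in paths if p})
    # The extra bind comes first, mirroring "$EXTRA_SINGULARITY_BINDS,$binds".
    return " -B " + ",".join([extra] + unique) + " "


flag = build_bind_flag(["/data/work", "/data/pipeline", "/data/work"])
print(repr(flag))  # ' -B /lscratch,/data/pipeline,/data/work '
```

Keeping `-B` out of `EXTRA_SINGULARITY_BINDS` and adding it once at the end avoids the doubled `-B -B` that the old pair of assignments would have produced.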
25 changes: 18 additions & 7 deletions config/config.yaml
@@ -4,17 +4,27 @@
 # The working dir... output will be in the results subfolder of the workdir
 workdir: "WORKDIR"

+# scripts directory
+# by default, use the scripts copied to the working directory.
+# alternatively, use the scripts from the pipeline source.
+scriptsdir: "WORKDIR/scripts"
+#scriptsdir: "PIPELINE_HOME/workflow/scripts"
+
 # tab delimited samples file .. see samplefile for format details
 samplemanifest: "WORKDIR/config/samples.tsv"

 #####################################################################################
 # User parameters
 #####################################################################################
 # run sample contrasts
-run_contrasts: "Y" # Y or N
-contrasts: "WORKDIR/config/contrasts.tsv" # run_contrasts needs to be "Y"
-contrasts_fdr_cutoff: "0.05"
-contrasts_lfc_cutoff: "0.59" # FC of 1.5
+run_contrasts: true # true or false, no quotes
+contrasts: "WORKDIR/config/contrasts.tsv" # run_contrasts needs to be `true`
+contrasts_fdr_cutoff: 0.05
+contrasts_lfc_cutoff: 0.59 # FC of 1.5
+
+# these steps are long-running. use `true` if you would like to run them
+run_go_enrichment: false
+run_rose: false

 # reference
 genome: "hg38" # currently supports hg38, hg19 and mm10. Custom genome can be added with appropriate additions to "reference" section below.
@@ -150,7 +160,8 @@ spikein_reference:
 adapters: "PIPELINE_HOME/resources/other/adapters.fa"

 #####################################################################################
-# R Packages
+# CONTAINERS
 #####################################################################################
-Rlib_dir: "/data/CCBR_Pipeliner/db/PipeDB/Rlibrary_4.3_carlisle/"
-Rpkg_config: "WORKDIR/config/rpackages.csv"
+containers:
+base: "docker://nciccbr/ccbr_ubuntu_base_20.04:v6"
+carlisle_r: "docker://nciccbr/carlisle_r:v2"
33 changes: 0 additions & 33 deletions config/rpackages.csv

This file was deleted.

30 changes: 30 additions & 0 deletions docker/carlisle_r/Dockerfile
@@ -0,0 +1,30 @@
FROM nciccbr/ccbr_ubuntu_base_20.04:v6

# build time variables
ARG BUILD_DATE="000000"
ENV BUILD_DATE=${BUILD_DATE}
ARG BUILD_TAG="000000"
ENV BUILD_TAG=${BUILD_TAG}
ARG REPONAME="000000"
ENV REPONAME=${REPONAME}

# install conda packages
COPY environment.yml /data2/
ENV CONDA_ENV=carlisle
RUN mamba env create -n ${CONDA_ENV} -f /data2/environment.yml && \
echo "conda activate ${CONDA_ENV}" > ~/.bashrc
ENV PATH="/opt2/conda/envs/${CONDA_ENV}/bin:$PATH"
ENV R_LIBS_USER=/opt2/conda/lib/R/library/

# install ELBOW manually, fails with mamba
RUN wget --no-check-certificate https://bioconductor.riken.jp/packages/3.4/bioc/src/contrib/ELBOW_1.10.0.tar.gz && \
R -e 'install.packages("ELBOW_1.10.0.tar.gz", repos = NULL, type="source", INSTALL_opts = "--no-lock")'

# Save Dockerfile in the docker
COPY Dockerfile /opt2/Dockerfile_${REPONAME}.${BUILD_TAG}
RUN chmod a+r /opt2/Dockerfile_${REPONAME}.${BUILD_TAG}

# cleanup
WORKDIR /data2
RUN apt-get clean && apt-get purge \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
38 changes: 38 additions & 0 deletions docker/carlisle_r/environment.yml
@@ -0,0 +1,38 @@
channels:
- bioconda
- conda-forge
- r
dependencies:
- bioconductor-bsgenome.hsapiens.ncbi.t2t.chm13v2.0
- bioconductor-chipenrich
- bioconductor-chipseeker
- bioconductor-ComplexHeatmap
- bioconductor-deseq2
- bioconductor-edger
- bioconductor-enhancedvolcano
- bioconductor-genomicfeatures
- bioconductor-htsfilter
- bioconductor-org.Hs.eg.db
- bioconductor-org.Mm.eg.db
- bioconductor-rtracklayer
- bioconductor-txdb.hsapiens.ucsc.hg19.knowngene
- bioconductor-TxDb.Hsapiens.UCSC.hg38.knownGene
- bioconductor-TxDb.Mmusculus.UCSC.mm10.knownGene
- deeptools
- r-argparse
- r-circlize
- r-DT
- r-ggfortify
- r-ggvenn
- r-htmltools
- r-latticeextra
- r-openxlsx
- r-pander
- r-pdp
- r-plotly
- r-plyr
- r-rcolorbrewer
- r-reshape2
- r-tidyverse
- r-xfun>=0.43
- r-yaml
4 changes: 4 additions & 0 deletions docker/carlisle_r/meta.yml
@@ -0,0 +1,4 @@
dockerhub_namespace: nciccbr
image_name: carlisle_r
version: v2
container: "$(dockerhub_namespace)/$(image_name):$(version)"
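The `container` field in `meta.yml` is a template whose `$(name)` placeholders refer to the other keys in the file. How the CI workflow expands it is not shown in this diff, so the sketch below is only one assumed way to resolve it, using a regular expression. `expand_container` is a hypothetical name.

```python
import re


def expand_container(meta):
    """Substitute $(key) placeholders in meta["container"] with the other fields."""
    return re.sub(r"\$\((\w+)\)", lambda m: meta[m.group(1)], meta["container"])


# Values copied from docker/carlisle_r/meta.yml in this PR.
meta = {
    "dockerhub_namespace": "nciccbr",
    "image_name": "carlisle_r",
    "version": "v2",
    "container": "$(dockerhub_namespace)/$(image_name):$(version)",
}
print(expand_container(meta))  # nciccbr/carlisle_r:v2
```

The expanded name matches the image referenced as `docker://nciccbr/carlisle_r:v2` under `containers` in `config/config.yaml`.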
4 changes: 2 additions & 2 deletions docs/user-guide/output.md
@@ -10,9 +10,9 @@ The following directories are created under the WORKDIR/results directory:
 - contrasts: this directory includes the contrasts for each line listed in the contrast manifest
 - peak_caller: this directory includes all peak calls from each peak_caller (SEACR, MACS2, GOPEAKS) for each sample
 - annotation
-- go_enrichment: this directory includes gene set enrichment pathway predictions
+- go_enrichment: this directory includes gene set enrichment pathway predictions when `run_go_enrichment` is set to `true` in the config file.
 - homer: this directory includes the annotation output from HOMER
-- rose: this directory includes the annotation output from ROSE
+- rose: this directory includes the annotation output from ROSE when `run_rose` is set to `true` in the config file.
 - qc: this directory includes MULTIQC reports and spike-in control reports (when applicable)

 ```
Expand Down
4 changes: 2 additions & 2 deletions docs/user-guide/preparing-files.md
@@ -30,7 +30,7 @@ The pipeline allows for the use of a species specific spike-in control, or the u

 For example for ecoli spike-in:
 ```
-run_contrasts: "Y"
+run_contrasts: true
 norm_method: "spikein"
 spikein_genome: "ecoli"
 spikein_reference:
@@ -41,7 +41,7 @@ spikein_reference:

 For example for drosophila spike-in:
 ```
-run_contrasts: "Y"
+run_contrasts: true
 norm_method: "spikein"
 spikein_genome: "drosophila"
 spikein_reference:
Expand Down
23 changes: 12 additions & 11 deletions workflow/Snakefile
@@ -97,7 +97,7 @@ def run_qc(wildcards):

 def run_contrasts(wildcards):
 files=[]
-if config["run_contrasts"] == "Y":
+if config["run_contrasts"]:
 files.append(join(RESULTSDIR,"replicate_sample.tsv"))

 # inputs for matrix
@@ -166,20 +166,21 @@ def get_combined(wildcards):

 def get_rose(wildcards):
 files=[]
-if ("macs2_narrow" in PEAKTYPE) or ("macs2_broad" in PEAKTYPE):
-anno_m=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","rose","{treatment_control_list}.{dupstatus}.{peak_caller_type}.{s_dist}","{treatment_control_list}_AllStitched.table.super.summits.bed"),peak_caller="macs2",qthresholds=QTRESHOLDS,treatment_control_list=TREATMENT_LIST_M,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_M,s_dist=S_DISTANCE),
-files.extend(anno_m)
-if ("gopeaks_narrow" in PEAKTYPE) or ("gopeaks_broad" in PEAKTYPE):
-anno_g=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","rose","{treatment_control_list}.{dupstatus}.{peak_caller_type}.{s_dist}","{treatment_control_list}_AllStitched.table.super.summits.bed"),peak_caller="gopeaks",qthresholds=QTRESHOLDS,treatment_control_list=TREATMENT_LIST_SG,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_G,s_dist=S_DISTANCE),
-files.extend(anno_g)
-if ("seacr_stringent" in PEAKTYPE) or ("seacr_relaxed" in PEAKTYPE):
-anno_s=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","rose","{treatment_control_list}.{dupstatus}.{peak_caller_type}.{s_dist}","{treatment_control_list}_AllStitched.table.super.summits.bed"),peak_caller="seacr",qthresholds=QTRESHOLDS,treatment_control_list=TREATMENT_LIST_SG,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_S,s_dist=S_DISTANCE),
-files.extend(anno_s)
+if config['run_rose']:
+if ("macs2_narrow" in PEAKTYPE) or ("macs2_broad" in PEAKTYPE):
+anno_m=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","rose","{treatment_control_list}.{dupstatus}.{peak_caller_type}.{s_dist}","{treatment_control_list}_AllStitched.table.super.summits.bed"),peak_caller="macs2",qthresholds=QTRESHOLDS,treatment_control_list=TREATMENT_LIST_M,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_M,s_dist=S_DISTANCE),
+files.extend(anno_m)
+if ("gopeaks_narrow" in PEAKTYPE) or ("gopeaks_broad" in PEAKTYPE):
+anno_g=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","rose","{treatment_control_list}.{dupstatus}.{peak_caller_type}.{s_dist}","{treatment_control_list}_AllStitched.table.super.summits.bed"),peak_caller="gopeaks",qthresholds=QTRESHOLDS,treatment_control_list=TREATMENT_LIST_SG,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_G,s_dist=S_DISTANCE),
+files.extend(anno_g)
+if ("seacr_stringent" in PEAKTYPE) or ("seacr_relaxed" in PEAKTYPE):
+anno_s=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","rose","{treatment_control_list}.{dupstatus}.{peak_caller_type}.{s_dist}","{treatment_control_list}_AllStitched.table.super.summits.bed"),peak_caller="seacr",qthresholds=QTRESHOLDS,treatment_control_list=TREATMENT_LIST_SG,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_S,s_dist=S_DISTANCE),
+files.extend(anno_s)
 return files

 def get_enrichment(wildcards):
 files=[]
-if config["run_contrasts"] == "Y":
+if config["run_contrasts"] and config['run_go_enrichment']:
 if (GENOME == "hg19") or (GENOME == "hg38"):
 if ("macs2_narrow" in PEAKTYPE) or ("macs2_broad" in PEAKTYPE):
 t=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","go_enrichment","{contrast_list}.{dupstatus}.txt"),peak_caller="macs2",qthresholds=QTRESHOLDS,contrast_list=CONTRAST_LIST,dupstatus=DUPSTATUS)
Expand Down
4 changes: 1 addition & 3 deletions workflow/rules/align.smk
@@ -487,9 +487,7 @@ rule cov_correlation:
 params:
 rscript=join(SCRIPTSDIR,"_plot_correlation.R"),
 dupstatus="{dupstatus}"
-envmodules:
-TOOLS["deeptools"],
-TOOLS["R"]
+container: config['containers']['carlisle_r']
 threads: getthreads("cov_correlation")
 shell:
 """
Expand Down
15 changes: 4 additions & 11 deletions workflow/rules/annotations.smk
@@ -73,8 +73,7 @@ rule homer_enrich:
 peak_mode="{peak_caller_type}",
 dupstatus="{dupstatus}",
 rscript=join(SCRIPTSDIR,"_plot_feature_enrichment.R")
-envmodules:
-TOOLS["R"]
+container: config['containers']['carlisle_r']
 shell:
 """
 Rscript {params.rscript} {params.annotation_dir} {params.peak_mode} {params.dupstatus} {output.enrich_png}
@@ -89,8 +88,7 @@ rule combine_homer:
 xls_file = join(RESULTSDIR,"peaks","{qthresholds}","macs2","peak_output","{treatment_control_list}.{dupstatus}.{peak_caller_type}.peaks.xls")
 output:
 combined=join(RESULTSDIR,"peaks","{qthresholds}","macs2","annotation","homer","{treatment_control_list}.{dupstatus}.{peak_caller_type}.annotation_qvalue.xlsx")
-envmodules:
-TOOLS["R"]
+container: config['containers']['carlisle_r']
 params:
 rscript=join(SCRIPTSDIR,"_combine_macs2_homer.R")
 shell:
@@ -294,7 +292,7 @@ rule rose:
 echo "Less than 5 usable peaks detected (N=${{num_of_peaks}})" > {output.super_summit}
 fi
 """
-if config["run_contrasts"] == "Y":
+if config["run_contrasts"]:
 rule create_contrast_peakcaller_files:
 """
 Reads in all of the output from Rules create_contrast_data_files which match the same peaktype and merges them together
@@ -324,18 +322,15 @@ if config["run_contrasts"] == "Y":
 rscript_wrapper=join(SCRIPTSDIR,"_go_enrichment_wrapper.R"),
 rmd=join(SCRIPTSDIR,"_go_enrichment.Rmd"),
 carlisle_functions=join(SCRIPTSDIR,"_carlisle_functions.R"),
-Rlib_dir=config["Rlib_dir"],
-Rpkg_config=config["Rpkg_config"],
 rscript_diff=join(SCRIPTSDIR,"_diff_markdown_wrapper.R"),
 rscript_functions=join(SCRIPTSDIR,"_carlisle_functions.R"),
 output_dir = join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","go_enrichment"),
 species = config["genome"],
 geneset_id = GENESET_ID,
 dedup_status = "{dupstatus}"
-envmodules:
-TOOLS["R"],
 output:
 html=join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","go_enrichment","{contrast_list}.{dupstatus}.go_enrichment.html"),
+container: config['containers']['carlisle_r']
 shell:
 """
 set -exo pipefail
@@ -348,8 +343,6 @@ if config["run_contrasts"] == "Y":
 Rscript {params.rscript_wrapper} \\
 --rmd {params.rmd} \\
 --carlisle_functions {params.carlisle_functions} \\
---Rlib_dir {params.Rlib_dir} \\
---Rpkg_config {params.Rpkg_config} \\
 --output_dir {params.output_dir} \\
 --report {output.html} \\
 --peak_list "$clean_sample_list" \\
Expand Down