Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new rules visualizing CARLISLE results #132

Merged
merged 23 commits into from
Jun 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
5fcfbaa
Removed temp output
May 15, 2024
23a8b0e
Created rules cov_correlation, count_peaks
May 20, 2024
89ca040
Added rule homer_enrich to plot enrichment over features
May 21, 2024
fc456af
Added rule combine_homer, requiring change to MACS2 peak xls name
May 22, 2024
b9bcf72
Added rule combine_homer, requiring change to MACS2 peak xls name
May 22, 2024
fa4fdaa
Changed script names, updated changelog
May 22, 2024
4dffd29
Fixed bug assigning colors to many samples
May 23, 2024
e5a33ed
Resolve merge commits
May 23, 2024
e751bc7
Reverted temp output
May 23, 2024
06cf017
Merge branch 'main' into corr-docker
kelly-sovacool May 29, 2024
23ef8ce
refactor: use docker container for new R rules
kelly-sovacool May 29, 2024
29a79d5
fix: ComplexHeatmap is on bioc, not cran
kelly-sovacool May 29, 2024
2a635f3
ci: use mamba env create
kelly-sovacool May 29, 2024
43265d8
chore: Merge branch 'main' into corr-docker
kelly-sovacool May 29, 2024
7b4e33e
Merge pull request #1 from CCBR/corr-docker
epehrsson May 30, 2024
63cd62f
Update workflow/rules/annotations.smk
epehrsson May 31, 2024
c0c15b0
chore: Merge branch 'main' into corr-merge
kelly-sovacool May 31, 2024
0ad396b
Merge pull request #2 from CCBR/corr-merge
epehrsson May 31, 2024
d747b54
fix: ignore "#" comment lines in peaks files
kelly-sovacool May 31, 2024
89aed8b
fix: only load needed packages
kelly-sovacool Jun 7, 2024
338b23c
feat: write both tsv & excel format
kelly-sovacool Jun 10, 2024
48431e6
Merge pull request #3 from CCBR/fix-combine-homer
epehrsson Jun 11, 2024
f6f93e1
Added Excel back to combine_homer output
Jun 12, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 5 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,19 @@
## CARLISLE development version

- Bug fixes (#127, @epehrsson)
- Bug fixes: (#127, @epehrsson)
- Removes single-sample group check for DESeq.
- Increases memory for DESeq.
- Ensures control replicate number is an integer.
- Fixes FDR cutoff misassigned to log2FC cutoff.
- Fixes `no_dedup` variable names in library normalization scripts.
- Containerize rules that require R (`deseq`, `go_enrichment`, and `spikein_assessment`) to fix installation issues with common R library path. (#129, @kelly-sovacool)
- The `Rlib_dir` and `Rpkg_config` config options have been removed as they are no longer needed.
- New visualizations: (#132, @epehrsson)
- New rules `cov_correlation`, `homer_enrich`, `combine_homer`, `count_peaks`
- Add peak caller to MACS2 peak xls filename
- New parameters in the config file to make certain rules optional: (#133, @kelly-sovacool)
- GO enrichment is controlled by `run_go_enrichment` (default: `false`)
- rose is controlled by `run_rose` (default: `false`)
- ROSE is controlled by `run_rose` (default: `false`)
- Fig bug that added nonexistent directories to the singularity bind paths. (#135, @kelly-sovacool)

## CARLISLE v2.5.0
Expand Down
2 changes: 1 addition & 1 deletion config/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -164,4 +164,4 @@ adapters: "PIPELINE_HOME/resources/other/adapters.fa"
#####################################################################################
containers:
base: "docker://nciccbr/ccbr_ubuntu_base_20.04:v6"
carlisle_r: "docker://nciccbr/carlisle_r:v1"
carlisle_r: "docker://nciccbr/carlisle_r:v2"
12 changes: 6 additions & 6 deletions docker/carlisle_r/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,13 @@ ARG REPONAME="000000"
ENV REPONAME=${REPONAME}

# install conda packages
COPY packages.txt /data2/
RUN mamba install \
--no-channel-priority \
-c bioconda -c conda-forge -c r \
--file /data2/packages.txt
ENV PATH="/opt2/conda/bin:$PATH"
COPY environment.yml /data2/
ENV CONDA_ENV=carlisle
RUN mamba env create -n ${CONDA_ENV} -f /data2/environment.yml && \
echo "conda activate ${CONDA_ENV}" > ~/.bashrc
ENV PATH="/opt2/conda/envs/${CONDA_ENV}/bin:$PATH"
ENV R_LIBS_USER=/opt2/conda/lib/R/library/

# install ELBOW manually, fails with mamba
RUN wget --no-check-certificate https://bioconductor.riken.jp/packages/3.4/bioc/src/contrib/ELBOW_1.10.0.tar.gz && \
R -e 'install.packages("ELBOW_1.10.0.tar.gz", repos = NULL, type="source", INSTALL_opts = "--no-lock")'
Expand Down
38 changes: 38 additions & 0 deletions docker/carlisle_r/environment.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
channels:
- bioconda
- conda-forge
- r
dependencies:
- bioconductor-bsgenome.hsapiens.ncbi.t2t.chm13v2.0
- bioconductor-chipenrich
- bioconductor-chipseeker
- bioconductor-ComplexHeatmap
- bioconductor-deseq2
- bioconductor-edger
- bioconductor-enhancedvolcano
- bioconductor-genomicfeatures
- bioconductor-htsfilter
- bioconductor-org.Hs.eg.db
- bioconductor-org.Mm.eg.db
- bioconductor-rtracklayer
- bioconductor-txdb.hsapiens.ucsc.hg19.knowngene
- bioconductor-TxDb.Hsapiens.UCSC.hg38.knownGene
- bioconductor-TxDb.Mmusculus.UCSC.mm10.knownGene
- deeptools
- r-argparse
- r-circlize
- r-DT
- r-ggfortify
- r-ggvenn
- r-htmltools
- r-latticeextra
- r-openxlsx
- r-pander
- r-pdp
- r-plotly
- r-plyr
- r-rcolorbrewer
- r-reshape2
- r-tidyverse
- r-xfun>=0.43
- r-yaml
2 changes: 1 addition & 1 deletion docker/carlisle_r/meta.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
dockerhub_namespace: nciccbr
image_name: carlisle_r
version: v1
version: v2
container: "$(dockerhub_namespace)/$(image_name):$(version)"
28 changes: 0 additions & 28 deletions docker/carlisle_r/packages.txt

This file was deleted.

2 changes: 2 additions & 0 deletions resources/cluster_biowulf.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -78,3 +78,5 @@ DESeq:
time: 00-02:00:00
threads: 2
###################################################################
cov_correlation:
threads: 4
32 changes: 31 additions & 1 deletion workflow/Snakefile
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,26 @@ def get_motifs(wildcards):
files.extend(anno_s)
return files

def get_enrich(wildcards):
files=[]
if ("macs2_narrow" in PEAKTYPE) or ("macs2_broad" in PEAKTYPE):
anno_m=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","homer","enrichment.{dupstatus}.{peak_caller_type}.png"),peak_caller="macs2",qthresholds=QTRESHOLDS,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_M),
files.extend(anno_m)
if ("gopeaks_narrow" in PEAKTYPE) or ("gopeaks_broad" in PEAKTYPE):
anno_g=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","homer","enrichment.{dupstatus}.{peak_caller_type}.png"),peak_caller="gopeaks",qthresholds=QTRESHOLDS,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_G),
files.extend(anno_g)
if ("seacr_stringent" in PEAKTYPE) or ("seacr_relaxed" in PEAKTYPE):
anno_s=expand(join(RESULTSDIR,"peaks","{qthresholds}","{peak_caller}","annotation","homer","enrichment.{dupstatus}.{peak_caller_type}.png"),peak_caller="seacr",qthresholds=QTRESHOLDS,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_S),
files.extend(anno_s)
return files

def get_combined(wildcards):
files = []
if ("macs2_narrow" in PEAKTYPE) or ("macs2_broad" in PEAKTYPE):
combined_m = expand(join(RESULTSDIR,"peaks","{qthresholds}","macs2","annotation","homer","{treatment_control_list}.{dupstatus}.{peak_caller_type}.annotation_qvalue.tsv"),qthresholds=QTRESHOLDS,dupstatus=DUPSTATUS,peak_caller_type=PEAKTYPE_M,treatment_control_list=TREATMENT_LIST_M)
files.extend(combined_m)
return files

def get_rose(wildcards):
files=[]
if config['run_rose']:
Expand Down Expand Up @@ -200,6 +220,12 @@ rule all:
# ALIGN / deeptools_heatmap
unpack(run_deeptools_heatmap),

# ALIGN / deeptools_corr
expand(join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonCorr.tab"),dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","all.{dupstatus}.PCA.tab"),dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","all.{dupstatus}.Pearson_heatmap.png"),dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonPCA.png"),dupstatus=DUPSTATUS),

# ALIGN / create_library_norm_scales
unpack(run_library_norm),

Expand All @@ -213,6 +239,8 @@ rule all:
unpack(run_macs2),
unpack(run_seacr),
unpack(run_gopeaks),
join(RESULTSDIR,"peaks","Peak counts.xlsx"),
join(RESULTSDIR,"peaks","all.peaks.txt"),

# QC
unpack(run_qc),
Expand All @@ -222,6 +250,8 @@ rule all:

# ANNOTATION / findMotif, rose, create_contrast_peakcaller_files, go_enrichment
unpack(get_motifs),
unpack(get_enrich),
unpack(get_combined),
unpack(get_rose),
unpack(get_enrichment)

Expand Down Expand Up @@ -292,4 +322,4 @@ onerror:
expand(join(RESULTSDIR,"deeptools","clean", "{group}.{dupstatus}.metagene.mat.gz"),group=[a+b for a in TREATMENTS+["all_samples"] for b in ["", ".prot"]],dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","clean", "{group}.{dupstatus}.TSS.mat.gz"),group=[a+b for a in TREATMENTS+["all_samples"] for b in ["", ".prot"]],dupstatus=DUPSTATUS),
expand(join(RESULTSDIR,"deeptools","clean", "{group}.{dupstatus}.geneinfo.bed"),group=[a+b for a in TREATMENTS+["all_samples"] for b in ["", ".prot"]],dupstatus=DUPSTATUS),
"""
"""
46 changes: 45 additions & 1 deletion workflow/rules/align.smk
Original file line number Diff line number Diff line change
Expand Up @@ -468,4 +468,48 @@ rule deeptools_heatmap:
plotHeatmap -m {input.TSSmat} -out {output.TSSheat} --colorMap 'RdBu_r' --zMin auto --zMax auto --yAxisLabel 'average RPGC' --regionsLabel 'genes' --legendLocation 'none'
plotProfile -m {input.metamat} -out {output.metaline} --plotHeight 15 --plotWidth 15 --perGroup --yAxisLabel 'average RPGC' --plotType 'se' --legendLocation upper-right
plotProfile -m {input.TSSmat} -out {output.TSSline} --plotHeight 15 --plotWidth 15 --perGroup --yAxisLabel 'average RPGC' --plotType 'se' --legendLocation upper-left
"""
"""

rule cov_correlation:
"""
Create replicate correlation plots from filtered BAM files
"""
input:
bams=expand(join(RESULTSDIR,"bam","{replicate}.{{dupstatus}}.bam"),replicate=REPLICATES),
align_table=join(RESULTSDIR,"alignment_stats","alignment_stats.tsv")
output:
counts=join(RESULTSDIR,"deeptools","all.{dupstatus}.readCounts.npz"),
pearson_corr=join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonCorr.tab"),
pearson_plot=join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonCorr.png"),
pca=join(RESULTSDIR,"deeptools","all.{dupstatus}.PCA.tab"),
hc=join(RESULTSDIR,"deeptools","all.{dupstatus}.Pearson_heatmap.png"),
pca_format=join(RESULTSDIR,"deeptools","all.{dupstatus}.PearsonPCA.png")
params:
rscript=join(SCRIPTSDIR,"_plot_correlation.R"),
dupstatus="{dupstatus}"
container: config['containers']['carlisle_r']
threads: getthreads("cov_correlation")
shell:
"""
# Calculate genome-wide coverage
multiBamSummary bins \
--bamfiles {input.bams} \
--smartLabels \
-out {output.counts} \
-p {threads}

# Plot heatmap - Pearson
plotCorrelation \
-in {output.counts} \
--corMethod pearson --skipZeros \
--whatToPlot heatmap --plotNumbers \
-o {output.pearson_plot} \
--removeOutliers \
--outFileCorMatrix {output.pearson_corr}

# Plot PCA
plotPCA -in {output.counts} --outFileNameData {output.pca}

# Plot heatmap and PCA (formatted)
Rscript {params.rscript} {output.pearson_corr} {output.pca} {input.align_table} {params.dupstatus} {output.hc} {output.pca_format}
"""
Loading