Merge pull request #351 from ENCODE-DCC/dev

v2.0.2

leepc12 authored Nov 16, 2021
2 parents 4f8e9bd + 219f033 commit cd1508c
Showing 2 changed files with 44 additions and 27 deletions.
49 changes: 33 additions & 16 deletions README.md
@@ -3,14 +3,14 @@
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.156534.svg)](https://doi.org/10.5281/zenodo.156534)[![CircleCI](https://circleci.com/gh/ENCODE-DCC/atac-seq-pipeline/tree/master.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/atac-seq-pipeline/tree/master)


## Download new Caper>=2.0
## Download new Caper>=2.1

A new version of Caper is out. You need to update your Caper installation to work with the latest ENCODE ATAC-seq pipeline.
```bash
$ pip install caper --upgrade
```
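To confirm that the upgrade took effect, you can check the installed version (a quick sanity check using standard pip):

```bash
# Caper >= 2.1 is required for this pipeline
$ pip show caper | grep Version
```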

## Local/HPC users and new Caper>=2.0
## Local/HPC users and new Caper>=2.1

There are many changes to the local/HPC backends: `local`, `slurm`, `sge`, `pbs` and the newly added `lsf`. Make a backup of your current Caper configuration file `~/.caper/default.conf` and run `caper init`. Local/HPC users need to reset/initialize Caper's configuration file according to their chosen backend, then edit the configuration file and follow the instructions in it; a sketch of these steps appears after the next code block.
```bash
@@ -75,9 +75,20 @@ The ATAC-seq pipeline protocol specification is [here](https://docs.google.com/d
$ bash scripts/install_conda_env.sh
```
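A sketch of the Caper backup-and-reinitialize step described above, assuming `caper init` accepts the backend name as an argument (see Caper's README for the exact syntax):

```bash
# Back up the old configuration, then re-initialize for your backend
$ cp ~/.caper/default.conf ~/.caper/default.conf.bak
$ caper init slurm   # or: local, sge, pbs, lsf
# Edit ~/.caper/default.conf and follow the instructions in it
```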

## Test run

You can use URIs (`s3://`, `gs://` and `http(s)://`) in Caper's command lines and in the input JSON file; Caper will automatically download/localize such files. Input JSON file URL: https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json
## Input JSON file specification

> **IMPORTANT**: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE, ESPECIALLY FOR AUTODETECTING/DEFINING ADAPTERS.

An input JSON file specifies all the input parameters and files necessary for successfully running this pipeline, including the paths to the genome reference files and the raw-data FASTQ files. Make sure to specify absolute paths rather than relative paths in your input JSON files. A minimal sketch follows the specification links below.

1) [Input JSON file specification (short)](docs/input_short.md)
2) [Input JSON file specification (long)](docs/input.md)
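For illustration only, a minimal paired-end input JSON might look like the sketch below (written as a bash heredoc to match the other examples; field names such as `atac.fastqs_rep1_R1` and `atac.auto_detect_adapter` are taken from the specifications linked above — verify them there before use):

```bash
# Hypothetical minimal input JSON for one paired-end replicate;
# see docs/input.md for the authoritative field names.
$ cat > input.json << 'EOF'
{
    "atac.title": "Example run",
    "atac.genome_tsv": "/abs/path/to/hg38.tsv",
    "atac.fastqs_rep1_R1": ["/abs/path/to/rep1_R1.fastq.gz"],
    "atac.fastqs_rep1_R2": ["/abs/path/to/rep1_R2.fastq.gz"],
    "atac.paired_end": true,
    "atac.auto_detect_adapter": true
}
EOF
```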


## Running on local computer/HPCs

You can use URIs (`s3://`, `gs://` and `http(s)://`) in Caper's command lines and in the input JSON file; Caper will automatically download/localize such files. Input JSON file example: https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json

Depending on your chosen Caper platform, run Caper directly or submit the Caper command line as a job to your cluster. You can choose other environments such as `--singularity` or `--docker` instead of `--conda`, but you must define one of these environments.

@@ -89,6 +100,12 @@ The following are just examples. Please read [Caper's README](https://github.co
# Or submit it as a leader job (with long/enough resources) to SLURM (Stanford Sherlock) with Singularity
# It will fail if you directly run the leader job on login nodes
$ sbatch -p [SLURM_PARTITION] -J [WORKFLOW_NAME] --export=ALL --mem 4G -t 4-0 --wrap "caper run atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json --singularity"

# Check status of your leader job
$ squeue -u $USER | grep [WORKFLOW_NAME]

# Cancel the leader job to cancel all of its child jobs
$ scancel [JOB_ID]
```
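Standard SLURM accounting tools also work for inspecting the leader job; for instance (`sacct` is stock SLURM, not part of this pipeline):

```bash
# Show the leader job's state and elapsed time
$ sacct -j [JOB_ID] --format=JobID,JobName,State,Elapsed
```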


@@ -99,7 +116,7 @@ You can run this pipeline on [truwl.com](https://truwl.com/). This provides a we
If you do not run the pipeline on Truwl, you can still share your use-case/job on the platform by getting in touch at [[email protected]](mailto:[email protected]) and providing your inputs.json file.


## Running a pipeline on Terra/Anvil (using Dockstore)
## Running on Terra/Anvil (using Dockstore)

Visit our pipeline repo on [Dockstore](https://dockstore.org/workflows/github.com/ENCODE-DCC/atac-seq-pipeline). Click on `Terra` or `Anvil`. Follow Terra's instructions to create a workspace on Terra and add Terra's billing bot to your Google Cloud account.

@@ -108,30 +125,30 @@ Download this [test input JSON for Terra](https://storage.googleapis.com/encode-
If you want to use your own input JSON file, then make sure that all files in the input JSON are on a Google Cloud Storage bucket (`gs://`). URLs will not work.


## Running a pipeline on DNAnexus (using Dockstore)
## Running on DNAnexus (using Dockstore)

Sign up for a new account on [DNAnexus](https://platform.dnanexus.com/) and create a new project on either AWS or Azure. Visit our pipeline repo on [Dockstore](https://dockstore.org/workflows/github.com/ENCODE-DCC/atac-seq-pipeline). Click on `DNAnexus`. Choose a destination directory in your DNAnexus project. Click on `Submit` and visit DNAnexus. This submits a conversion job; you can check its status under `Monitor` in the DNAnexus UI.

Once the conversion is done, download one of the following input JSON files, according to the platform (AWS or Azure) of your DNAnexus project (e.g. with `wget`, as sketched below the list):
- AWS: https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_dx.json
- Azure: https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_dx_azure.json
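For instance, the AWS variant can be fetched from the command line (`wget` here is just standard tooling, not pipeline-specific):

```bash
# Download the AWS input JSON template for DNAnexus
$ wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_dx.json
```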

You cannot use these input JSON files directly. Go to the destination directory on DNAnexus and click on the converted workflow `atac`. You will see input file boxes in the left-hand side of the task graph. Expand it and define FASTQs (`fastq_repX_R1`) and `genome_tsv` as in the downloaded input JSON file. Click on the `common` task box and define other non-file pipeline parameters.

You cannot use these input JSON files directly. Go to the destination directory on DNAnexus and click on the converted workflow `atac`. You will see input file boxes on the left-hand side of the task graph. Expand it and define FASTQs (`fastq_repX_R1`, and also `fastq_repX_R2` if the data are paired-end) and `genome_tsv` as in the downloaded input JSON file. Then click on the `common` task box and define other non-file pipeline parameters, e.g. `auto_detect_adapters` and `paired_end`.

## Running a pipeline on DNAnexus (using our pre-built workflows)

See [this](docs/tutorial_dx_web.md) for details.
We have a separate project on DNAnexus that provides example FASTQs and `genome_tsv` files for `hg38` and `mm10`. We recommend making copies of these directories in your own project (see the `dx cp` sketch after the lists below).

`genome_tsv`
- AWS: https://platform.dnanexus.com/projects/BKpvFg00VBPV975PgJ6Q03v6/data/pipeline-genome-data/genome_tsv/v3
- Azure: https://platform.dnanexus.com/projects/F6K911Q9xyfgJ36JFzv03Z5J/data/pipeline-genome-data/genome_tsv/v3

## Input JSON file specification
Example FASTQs
- AWS: https://platform.dnanexus.com/projects/BKpvFg00VBPV975PgJ6Q03v6/data/pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ/fastq_subsampled
- Azure: https://platform.dnanexus.com/projects/F6K911Q9xyfgJ36JFzv03Z5J/data/pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ/fastq_subsampled
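If you prefer the dx-toolkit CLI over the web UI, copies can be made with `dx select`/`dx cp`; the project IDs below come from the URLs above, and folder-copy behavior should be confirmed with `dx cp --help` before relying on this sketch:

```bash
# Copy the AWS genome_tsv folder into your own project (sketch)
$ dx select [YOUR_PROJECT_ID]
$ dx cp project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-genome-data/genome_tsv/v3 /
```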

> **IMPORTANT**: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE. ESPECIALLY FOR ADAPTERS.

An input JSON file specifies all the input parameters and files that are necessary for successfully running this pipeline. This includes a specification of the path to the genome reference files and the raw data fastq file. Please make sure to specify absolute paths rather than relative paths in your input JSON files.
## Running on DNAnexus (using our pre-built workflows)

1) [Input JSON file specification (short)](docs/input_short.md)
2) [Input JSON file specification (long)](docs/input.md)
See [this](docs/tutorial_dx_web.md) for details.


## How to organize outputs
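Outputs are organized with Croo (the Cromwell Output Organizer referenced by `croo_out_def` in `atac.wdl`). A sketch of its usage, assuming Croo's documented CLI (`metadata.json` is the file Cromwell/Caper writes for a finished workflow):

```bash
# Install Croo and organize a finished workflow's outputs
$ pip install croo
$ croo [METADATA_JSON_FILE] --out-dir organized_output
```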
22 changes: 11 additions & 11 deletions atac.wdl
@@ -7,10 +7,10 @@ struct RuntimeEnvironment {
}

workflow atac {
String pipeline_ver = 'v2.0.1'
String pipeline_ver = 'v2.0.2'

meta {
version: 'v2.0.1'
version: 'v2.0.2'

author: 'Jin wook Lee'
email: '[email protected]'
@@ -19,8 +19,8 @@ workflow atac {

specification_document: 'https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit?usp=sharing'

default_docker: 'encodedcc/atac-seq-pipeline:v2.0.1'
default_singularity: 'library://leepc12/default/atac-seq-pipeline:v2.0.1'
default_docker: 'encodedcc/atac-seq-pipeline:v2.0.2'
default_singularity: 'library://leepc12/default/atac-seq-pipeline:v2.0.2'
default_conda: 'encode-atac-seq-pipeline'
croo_out_def: 'https://storage.googleapis.com/encode-pipeline-output-definition/atac.croo.v5.json'

@@ -72,8 +72,8 @@ workflow atac {
}
input {
# group: runtime_environment
String docker = 'encodedcc/atac-seq-pipeline:v2.0.1'
String singularity = 'library://leepc12/default/atac-seq-pipeline:v2.0.1'
String docker = 'encodedcc/atac-seq-pipeline:v2.0.2'
String singularity = 'library://leepc12/default/atac-seq-pipeline:v2.0.2'
String conda = 'encode-atac-seq-pipeline'
String conda_macs2 = 'encode-atac-seq-pipeline-macs2'
String conda_spp = 'encode-atac-seq-pipeline-spp'
@@ -1108,7 +1108,7 @@ workflow atac {
else select_first([paired_end])

Boolean has_input_of_align = i<length(fastqs_R1) && length(fastqs_R1[i])>0
Boolean has_output_of_align = i<length(bams) && defined(bams[i])
Boolean has_output_of_align = i<length(bams)
if ( has_input_of_align && !has_output_of_align ) {
call align { input :
fastqs_R1 = fastqs_R1[i],
@@ -1172,7 +1172,7 @@
}
Boolean has_input_of_filter = has_output_of_align || defined(align.bam)
Boolean has_output_of_filter = i<length(nodup_bams) && defined(nodup_bams[i])
Boolean has_output_of_filter = i<length(nodup_bams)
# skip if we already have output of this step
if ( has_input_of_filter && !has_output_of_filter ) {
call filter { input :
@@ -1197,7 +1197,7 @@
File? nodup_bam_ = if has_output_of_filter then nodup_bams[i] else filter.nodup_bam
Boolean has_input_of_bam2ta = has_output_of_filter || defined(filter.nodup_bam)
Boolean has_output_of_bam2ta = i<length(tas) && defined(tas[i])
Boolean has_output_of_bam2ta = i<length(tas)
if ( has_input_of_bam2ta && !has_output_of_bam2ta ) {
call bam2ta { input :
bam = nodup_bam_,
@@ -1392,10 +1392,10 @@
}
# tasks factored out from ATAqC
Boolean has_input_of_tss_enrich = defined(nodup_bam_) && defined(tss_) && (
defined(align.read_len) || i<length(read_len) && defined(read_len[i]) )
defined(align.read_len) || i<length(read_len) )
if ( enable_tss_enrich && has_input_of_tss_enrich ) {
call tss_enrich { input :
read_len = if i<length(read_len) && defined(read_len[i]) then read_len[i]
read_len = if i<length(read_len) then read_len[i]
else align.read_len,
nodup_bam = nodup_bam_,
tss = tss_,
