Improve Scramble accuracy for BWA and Dragen 3.7.8 #722
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,7 +40,6 @@ A structural variation discovery pipeline for Illumina short-read whole-genome s | |
* A workflow execution system supporting the [Workflow Description Language](https://openwdl.org/) (WDL), either: | ||
* [Cromwell](https://github.com/broadinstitute/cromwell) (v36 or higher). A dedicated server is highly recommended. | ||
* or [Terra](https://terra.bio/) (note preconfigured GATK-SV workflows are not yet available for this platform) | ||
* Recommended: [MELT](https://melt.igs.umaryland.edu/). Due to licensing restrictions, we cannot provide a public docker image or reference panel VCFs for this algorithm. | ||
* Recommended: [cromshell](https://github.com/broadinstitute/cromshell) for interacting with a dedicated Cromwell server. | ||
* Recommended: [WOMtool](https://cromwell.readthedocs.io/en/stable/WOMtool/) for validating WDL/json files. | ||
|
||
|
@@ -122,16 +121,18 @@ There are two scripts for running the full pipeline: | |
|
||
#### Building inputs | ||
Example workflow inputs can be found in `/inputs`. Build using `scripts/inputs/build_default_inputs.sh`, which | ||
generates input jsons in `/inputs/build`. Except the MELT docker image, all required resources are available in public | ||
generates input jsons in `/inputs/build`. All required resources are available in public | ||
Google buckets. | ||
|
||
#### MELT | ||
**Important**: The example input files contain MELT inputs that are NOT public (see [Requirements](#requirements)). These include: | ||
**Important**: MELT has been replaced with [Scramble](https://github.com/GeneDx/scramble) for mobile element calling. While it is still possible to run GATK-SV with MELT, we no longer support it as a caller. It will be fully deprecated in the future. | ||
|
||
Due to licensing restrictions, we cannot redistribute MELT binaries or input files, including the docker image. Some default input files contain MELT inputs that are NOT public (see [Requirements](#requirements)) including: | ||
|
||
* `GATKSVPipelineSingleSample.melt_docker` and `GATKSVPipelineBatch.melt_docker` - MELT docker URI (see [Docker readme](https://github.com/talkowski-lab/gatk-sv-v1/blob/master/dockerfiles/README.md)) | ||
* `GATKSVPipelineSingleSample.ref_std_melt_vcfs` - Standardized MELT VCFs ([GatherBatchEvidence](#gather-batch-evidence)) | ||
|
||
The input values are provided only as an example and are not publicly accessible. In order to include MELT, these values must be provided by the user. MELT can be disabled by deleting these inputs and setting `GATKSVPipelineBatch.use_melt` to `false`. | ||
The input values are provided only as placeholders. In some workflows, MELT must be enabled with appropriate settings, by providing optional MELT inputs and/or with an explicit option e.g. `GATKSVPipelineBatch.use_melt` to `true`. We do not recommend running both Scramble and MELT together. | ||
|
||
#### Execution | ||
We recommend running the pipeline on a dedicated [Cromwell](https://github.com/broadinstitute/cromwell) server with a [cromshell](https://github.com/broadinstitute/cromshell) client. A batch run can be started with the following commands: | ||
|
@@ -151,7 +152,7 @@ where `cromwell_config.json` is a Cromwell [workflow options file](https://cromw | |
|
||
## <a name="overview">Pipeline Overview</a> | ||
The pipeline consists of a series of modules that perform the following: | ||
* [GatherSampleEvidence](#gather-sample-evidence): SV evidence collection, including calls from a configurable set of algorithms (Manta, MELT, and Wham), read depth (RD), split read positions (SR), and discordant pair positions (PE). | ||
* [GatherSampleEvidence](#gather-sample-evidence): SV evidence collection, including calls from a configurable set of algorithms (Manta, Scramble, and Wham), read depth (RD), split read positions (SR), and discordant pair positions (PE). | ||
* [EvidenceQC](#evidence-qc): Dosage bias scoring and ploidy estimation | ||
* [GatherBatchEvidence](#gather-batch-evidence): Copy number variant calling using cn.MOPS and GATK gCNV; B-allele frequency (BAF) generation; call and evidence aggregation | ||
* [ClusterBatch](#cluster-batch): Variant clustering | ||
|
@@ -249,18 +250,21 @@ The following sections briefly describe each module and highlights inter-depende | |
## <a name="gather-sample-evidence">GatherSampleEvidence</a> | ||
*Formerly Module00a* | ||
|
||
Runs raw evidence collection on each sample with the following SV callers: [Manta](https://github.com/Illumina/manta), [Wham](https://github.com/zeeev/wham), and/or [MELT](https://melt.igs.umaryland.edu/). For guidance on pre-filtering prior to `GatherSampleEvidence`, refer to the [Sample Exclusion](#sample-exclusion) section. | ||
Runs raw evidence collection on each sample with the following SV callers: [Manta](https://github.com/Illumina/manta), [Wham](https://github.com/zeeev/wham), [Scramble](https://github.com/GeneDx/scramble), and/or [MELT](https://melt.igs.umaryland.edu/). For guidance on pre-filtering prior to `GatherSampleEvidence`, refer to the [Sample Exclusion](#sample-exclusion) section. | ||
|
||
The `scramble_clusters` and `scramble_table` are generated as outputs for troubleshooting purposes but not consumed by any downstream workflows. | ||
|
||
Note: a list of sample IDs must be provided. Refer to the [sample ID requirements](#sampleids) for specifications of allowable sample IDs. IDs that do not meet these requirements may cause errors. | ||
|
||
#### Inputs: | ||
* Per-sample BAM or CRAM files aligned to hg38. Index files (`.bai`) must be provided if using BAMs. | ||
|
||
#### Outputs: | ||
* Caller VCFs (Manta, MELT, and/or Wham) | ||
* Caller VCFs (Manta, Scramble, MELT, and/or Wham) | ||
Review comment: Again the
Reply: Same here
||
* Binned read counts file | ||
* Split reads (SR) file | ||
* Discordant read pairs (PE) file | ||
* Scramble intermediate clusters file and table (not needed downstream) | ||
Review comment: "not needed downstream but useful for examining candidate sites when high sensitivity is required"? Or something similar describing the main use of having the file as an output?
Reply: And here
||
|
||
## <a name="evidence-qc">EvidenceQC</a> | ||
*Formerly Module00b* | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,7 @@ ARG UBUNTU_RELEASE="22.04" | |
ARG HTSLIB_VERSION="1.15.1" | ||
ARG BEDTOOLS_VERSION="2.31.0" | ||
ARG VCFTOOLS_VERSION="0.1.16" | ||
ARG BWA_COMMIT="139f68fc4c3747813783a488aef2adc86626b01b" | ||
|
||
ARG APT_REQUIRED_PACKAGES="/opt/apt-required-packages.list" | ||
|
||
|
@@ -14,7 +15,7 @@ ARG DEBIAN_FRONTEND=noninteractive | |
RUN apt-get -qqy update --fix-missing && \ | ||
apt-get -qqy dist-upgrade && \ | ||
apt-get -qqy install --no-install-recommends \ | ||
ca-certificates autoconf automake bzip2 g++ make wget pkgconf python2 \ | ||
ca-certificates autoconf automake bzip2 g++ git make wget pkgconf python2 \ | ||
libssl-dev libbz2-dev libcurl4-openssl-dev liblzma-dev libncurses-dev zlib1g-dev libdeflate-dev | ||
|
||
# install samtools | ||
|
@@ -51,6 +52,19 @@ RUN wget -q https://github.com/arq5x/bedtools2/releases/download/v$BEDTOOLS_VERS | |
mv bedtools.static /opt/bedtools/bin/bedtools && \ | ||
chmod a+x /opt/bedtools/bin/bedtools | ||
|
||
# install bwa | ||
# must do from source because of compiler error in latest release (see https://github.com/lh3/bwa/issues/387) | ||
Review comment: I looked up this issue and it looks like it might have been fixed in a new release 0.7.18 in the last few weeks (https://github.com/lh3/bwa/releases/tag/v0.7.18). Might be worth switching this back to installing a release rather than building from source, for the sake of simplicity and build time?
Reply: It doesn't look like a build got included with that release, unfortunately.
||
ARG BWA_COMMIT | ||
RUN cd /opt && \ | ||
git clone https://github.com/lh3/bwa.git && \ | ||
cd bwa && \ | ||
git checkout $BWA_COMMIT && \ | ||
make -s && \ | ||
cd .. && \ | ||
mkdir -p /opt/bin && \ | ||
mv /opt/bwa/bwa /opt/bin/ && \ | ||
rm -r bwa | ||
ENV PATH=/opt/bin:$PATH | ||
|
||
############### stage 1: copy tools and install needed non-dev libraries | ||
FROM ubuntu:$UBUNTU_RELEASE | ||
|
@@ -100,3 +114,4 @@ RUN tabix --version | |
RUN bcftools --version | ||
RUN bedtools --version | ||
RUN vcftools --version | ||
RUN which bwa |
Review comment: I know you say not to above, but this list makes it sound like Scramble + MELT would be OK.
Maybe rephrase to,
Reply: I noticed this too and am fixing it in a separate PR transferring this over to the website
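As an illustration of the MELT input handling described in the README diff, here is a minimal Python sketch that removes the non-public MELT placeholder inputs from an inputs mapping and turns off `GATKSVPipelineBatch.use_melt`. It is not part of this PR. The three MELT keys and the `use_melt` flag are the ones named in the README text; the function name `disable_melt` and the extra `GATKSVPipelineBatch.name` key in the example are hypothetical, and other workflows may carry MELT inputs that this sketch does not cover.

```python
import json

# MELT input keys named in the README diff. Other MELT-related keys may exist
# in some workflows; they are not handled by this sketch.
MELT_KEYS = (
    "GATKSVPipelineSingleSample.melt_docker",
    "GATKSVPipelineBatch.melt_docker",
    "GATKSVPipelineSingleSample.ref_std_melt_vcfs",
)
USE_MELT_KEY = "GATKSVPipelineBatch.use_melt"


def disable_melt(inputs: dict) -> dict:
    """Return a copy of a workflow inputs mapping with MELT placeholders removed.

    Hypothetical helper, not part of GATK-SV.
    """
    cleaned = {k: v for k, v in inputs.items() if k not in MELT_KEYS}
    # Only set the flag when the inputs belong to the batch workflow,
    # since the README names the flag for GATKSVPipelineBatch only.
    if any(k.startswith("GATKSVPipelineBatch.") for k in inputs):
        cleaned[USE_MELT_KEY] = False
    return cleaned


if __name__ == "__main__":
    example = {
        "GATKSVPipelineBatch.name": "my_batch",  # hypothetical key for illustration
        "GATKSVPipelineBatch.melt_docker": "placeholder-not-public",
        "GATKSVPipelineBatch.use_melt": True,
    }
    print(json.dumps(disable_melt(example), indent=2))
```

In practice the mapping would be loaded with `json.load` from one of the files generated under `/inputs/build` and written back with `json.dump`. Validating the result with WOMtool, as the README recommends, remains a sensible follow-up.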