Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added and updated documentation #4003

Merged
merged 1 commit into from
Jan 9, 2018
Merged

Added and updated documentation #4003

merged 1 commit into from
Jan 9, 2018

Conversation

asmirnov239
Copy link
Collaborator

@asmirnov239 asmirnov239 commented Dec 20, 2017

The following tools are updated in this PR:
CountBases
CountBasesSpark
CountReads
CountReadsSpark
CheckPileup
EstimateLibraryComplexityGATK
FlagStat
FlagStatSpark
GetSampleName
SplitReads
AnalyzeCovariates

Could you please review @sooheelee? Thank you!

@asmirnov239 asmirnov239 force-pushed the as_doc_updates branch 2 times, most recently from 5be4409 to 7b9f7e0 Compare December 20, 2017 22:10
@codecov-io
Copy link

codecov-io commented Dec 20, 2017

Codecov Report

Merging #4003 into master will increase coverage by 0.345%.
The diff coverage is 73.333%.

@@               Coverage Diff               @@
##              master     #4003       +/-   ##
===============================================
+ Coverage     79.016%   79.361%   +0.345%     
- Complexity     16535     18635     +2100     
===============================================
  Files           1056      1184      +128     
  Lines          59150     67328     +8178     
  Branches        9615     10681     +1066     
===============================================
+ Hits           46738     53432     +6694     
- Misses          8673      9814     +1141     
- Partials        3739      4082      +343
Impacted Files Coverage Δ Complexity Δ
...rg/broadinstitute/hellbender/tools/CountReads.java 100% <ø> (ø) 3 <0> (ø) ⬇️
...rg/broadinstitute/hellbender/tools/CountBases.java 100% <ø> (ø) 3 <0> (ø) ⬇️
...lbender/tools/spark/pipelines/CountReadsSpark.java 90% <ø> (ø) 4 <0> (ø) ⬇️
...lbender/tools/spark/pipelines/CountBasesSpark.java 90% <ø> (ø) 5 <0> (ø) ⬇️
...ellbender/tools/spark/pipelines/FlagStatSpark.java 90% <ø> (ø) 4 <0> (ø) ⬇️
.../org/broadinstitute/hellbender/tools/FlagStat.java 76.404% <ø> (ø) 3 <0> (ø) ⬇️
.../markduplicates/EstimateLibraryComplexityGATK.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...broadinstitute/hellbender/tools/GetSampleName.java 62.5% <100%> (ø) 6 <1> (ø) ⬇️
...rg/broadinstitute/hellbender/tools/SplitReads.java 90% <100%> (ø) 20 <0> (ø) ⬇️
...itute/hellbender/tools/walkers/qc/CheckPileup.java 64.151% <100%> (ø) 11 <0> (ø) ⬇️
... and 378 more

@sooheelee
Copy link
Contributor

I am just getting to this now. Thanks for your patience.

Copy link
Contributor

@sooheelee sooheelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some additional general comments:

  1. Unify 'Example', 'Usage Example' and 'Example Usage' --> 'Usage example' or 'Usage examples' depending on the number of example commands.
  2. Periods at the end of sentences within the javaDoc portion. You are correct to omit the period for the oneLineSummary.
  3. Tools need assignment to new program groups as follows:

CountBases --> CoverageAnalysisProgramGroup.java
CountBasesSpark --> CoverageAnalysisProgramGroup.java
CountReads --> CoverageAnalysisProgramGroup.java
CountReadsSpark --> CoverageAnalysisProgramGroup.java
CheckPileup --> DiagnosticsAndQCProgramGroup.java
EstimateLibraryComplexityGATK --> DiagnosticsAndQCProgramGroup.java
FlagStat --> DiagnosticsAndQCProgramGroup.java
FlagStatSpark --> DiagnosticsAndQCProgramGroup.java
GetSampleName --> DiagnosticsAndQCProgramGroup.java
SplitReads --> ReadDataProgramGroup.java
AnalyzeCovariates --> DiagnosticsAndQCProgramGroup.java

After these changes and after addressing the specific comments, please go ahead and merge. Happy holidays @asmirnov239.

@@ -28,7 +28,7 @@
* for more details on the format.
* </p>
*
* <h3>Input</h3>
* <h3>Inputs</h3>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add that the mpileup needs an index. Otherwise the tool errors:

An mpileup file generated by Samtools covering the interval of interest. It should be accompanied by an index, e.g. that generated with IndexFeatureFile.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"be grouped together for duplicate detection. In effect total_reads / 4^max_id_bases reads will " +
"be compared at a time, so lower numbers will produce more accurate results but consume " +
"exponentially more memory and CPU.")
"exponentially more memory and CPU."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @asmirnov239, just checking that you are kebabbing in a separated PR. For example, --MIN_IDENTICAL_BASES would become --min-identical-bases. I ask because I'd like to refer to parameters in the javaDoc portion and I want to make sure we refer to them correctly.

* Estimate library complexity from the sequence of read pairs
*
* <p>
* The estimation is done by sorting all reads by the first N bases (5 by default) of each read and then comparing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> by the first N bases (defined by --min-identical-bases with default of 5)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* The estimation is done by sorting all reads by the first N bases (5 by default) of each read and then comparing
* reads with the first N bases identical to each other for duplicates. Reads are considered to be duplicates if
* they match each other with no gaps and an overall mismatch rate less than or equal to MAX_DIFF_RATE (0.03 by default).
* </p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to point out this library complexity estimation uses a different approach from MarkDuplicates. So add:

--> The approach differs from that taken by Picard MarkDuplicates to estimate library complexity in that here alignment is not a factor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@DocumentedFeature
/**
* Spark tool to accumulate flag statistics given a BAM file, e.g. total number of reads with QC failure flag set, number of
* duplicates, percentage mapped etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add:
This tool can be run without explicitly specifying Spark options. That is to say, the given example command without Spark options will run locally. See <a href ="https://software.broadinstitute.org/gatk/documentation/article?id=10060">Tutorial#10060</a> for an example of how to set up and run a Spark tool on a cloud Spark cluster.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Should I also mention this for every other Spark tool?

Copy link
Contributor

@sooheelee sooheelee Jan 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the statement applies, then yes.

* (EXPERIMENTAL) Emit a single sample name from the bam header into an output file.
*
* <p>
* Note: If the bam has zero or more than one sample names in the header, this tool will error, by design.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

* -split-sample true /
* -split-read-group true /
* -split-library-name true
* </pre>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can omit the 3xtrues. Also, although the single dash is valid for these params, it looks jarring to see long names with the single dash. Let's use the two dashes (unless of course these no longer work?).

gatk SplitReads /
    -I input.bam /
    -O outputDirectory /
    --split-sample /
    --split-read-group /
    --split-library-name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. They work with just 2 dashes!

* -R reference.fasta \
* -bqsr recal1.table \
* -plots AnalyzeCovariates.pdf
* gatk AnalyzeCovariates \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these example commands can omit the -R option.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* -plots AnalyzeCovariates.pdf
* gatk AnalyzeCovariates \
* -R reference.fasta \
* -ignoreLMT \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove -ignoreLMT. The warning doesn't hurt as the run will continue anyways.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* -knownSites my-trusted-snps.vcf \
* -knownSites my-trusted-indels.vcf \
* -O recal1.table
* gatk BaseRecalibrator \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also remove the entirety of the Full BQSR quality assessment pipeline section. Such an overview belongs in a best practice tutorial and not in the tool doc. And actually, the tools have changed such that now some of these actually will error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sooheelee
Copy link
Contributor

Back to you @asmirnov239.

@asmirnov239
Copy link
Collaborator Author

@sooheelee I made requested changes, could you please look them over?
@lbergelson Could you review the kebabbing and test classes changes?
Thank you!

@asmirnov239 asmirnov239 removed their assignment Jan 2, 2018
Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asmirnov239 A few very minor comments. Feel free to merge after addressing those. Looks good.

@@ -167,43 +129,74 @@
static final String PDF_ARG_SHORT_NAME = "plots";
static final String BEFORE_ARG_SHORT_NAME = "before";
static final String AFTER_ARG_SHORT_NAME = "after";
public static final String IGNORE_LMT_SHORT_NAME = "ignoreLMT";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just have no short name for this option if the short name is going to be something weird cased and long.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Argument(fullName = "pileup", shortName = "pileup", doc="Pileup generated by Samtools")
@Argument(
fullName = "pileup",
shortName = "pileup",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the short name and long name are the same you don't have to specify the short name

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Argument(fullName = "ignore_overlaps", shortName = "ignore_overlaps", doc = "Disable read-pair overlap detection", optional = true)
@Argument(
fullName = "ignore-overlaps",
shortName = "ignore-overlaps",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sooheelee
Copy link
Contributor

Doc updates look good to me!

@droazen droazen self-assigned this Jan 6, 2018
@droazen
Copy link
Contributor

droazen commented Jan 6, 2018

Do not merge -- I'm taking this & the other argument-renaming branches over to modify them for consistency and get them merged before Monday.

@droazen droazen force-pushed the as_doc_updates branch 3 times, most recently from 8a2730b to 4e3d105 Compare January 9, 2018 00:36
… case for various tools

Addressed PR comments
@droazen
Copy link
Contributor

droazen commented Jan 9, 2018

I've rebased this branch, and added a few minor updates. This can be merged after test pass.

@droazen droazen merged commit 1cd37ce into master Jan 9, 2018
@droazen droazen deleted the as_doc_updates branch January 9, 2018 02:05
lucidtronix pushed a commit that referenced this pull request Jan 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants