-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added and updated documentation #4003
Conversation
5be4409
to
7b9f7e0
Compare
Codecov Report
@@ Coverage Diff @@
## master #4003 +/- ##
===============================================
+ Coverage 79.016% 79.361% +0.345%
- Complexity 16535 18635 +2100
===============================================
Files 1056 1184 +128
Lines 59150 67328 +8178
Branches 9615 10681 +1066
===============================================
+ Hits 46738 53432 +6694
- Misses 8673 9814 +1141
- Partials 3739 4082 +343
|
I am just getting to this now. Thanks for your patience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some additional general comments:
- Unify 'Example', 'Usage Example' and 'Example Usage' --> 'Usage example' or 'Usage examples' depending on the number of example commands.
- Periods at the end of sentences within the javaDoc portion. You are correct to omit the period for the oneLineSummary.
- Tools need assignment to new program groups as follows:
CountBases --> CoverageAnalysisProgramGroup.java
CountBasesSpark --> CoverageAnalysisProgramGroup.java
CountReads --> CoverageAnalysisProgramGroup.java
CountReadsSpark --> CoverageAnalysisProgramGroup.java
CheckPileup --> DiagnosticsAndQCProgramGroup.java
EstimateLibraryComplexityGATK --> DiagnosticsAndQCProgramGroup.java
FlagStat --> DiagnosticsAndQCProgramGroup.java
FlagStatSpark --> DiagnosticsAndQCProgramGroup.java
GetSampleName --> DiagnosticsAndQCProgramGroup.java
SplitReads --> ReadDataProgramGroup.java
AnalyzeCovariates --> DiagnosticsAndQCProgramGroup.java
After these changes and after addressing the specific comments, please go ahead and merge. Happy holidays @asmirnov239.
@@ -28,7 +28,7 @@ | |||
* for more details on the format. | |||
* </p> | |||
* | |||
* <h3>Input</h3> | |||
* <h3>Inputs</h3> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add that the mpileup needs an index. Otherwise the tool errors:
An mpileup file generated by Samtools covering the interval of interest. It should be accompanied by an index, e.g. that generated with IndexFeatureFile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
"be grouped together for duplicate detection. In effect total_reads / 4^max_id_bases reads will " + | ||
"be compared at a time, so lower numbers will produce more accurate results but consume " + | ||
"exponentially more memory and CPU.") | ||
"exponentially more memory and CPU." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @asmirnov239, just checking that you are kebabbing in a separated PR. For example, --MIN_IDENTICAL_BASES
would become --min-identical-bases
. I ask because I'd like to refer to parameters in the javaDoc portion and I want to make sure we refer to them correctly.
* Estimate library complexity from the sequence of read pairs | ||
* | ||
* <p> | ||
* The estimation is done by sorting all reads by the first N bases (5 by default) of each read and then comparing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--> by the first N bases (defined by --min-identical-bases with default of 5)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* The estimation is done by sorting all reads by the first N bases (5 by default) of each read and then comparing | ||
* reads with the first N bases identical to each other for duplicates. Reads are considered to be duplicates if | ||
* they match each other with no gaps and an overall mismatch rate less than or equal to MAX_DIFF_RATE (0.03 by default). | ||
* </p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to point out this library complexity estimation uses a different approach from MarkDuplicates. So add:
--> The approach differs from that taken by Picard MarkDuplicates to estimate library complexity in that here alignment is not a factor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@DocumentedFeature | ||
/** | ||
* Spark tool to accumulate flag statistics given a BAM file, e.g. total number of reads with QC failure flag set, number of | ||
* duplicates, percentage mapped etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add:
This tool can be run without explicitly specifying Spark options. That is to say, the given example command without Spark options will run locally. See <a href ="https://software.broadinstitute.org/gatk/documentation/article?id=10060">Tutorial#10060</a> for an example of how to set up and run a Spark tool on a cloud Spark cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Should I also mention this for every other Spark tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the statement applies, then yes.
* (EXPERIMENTAL) Emit a single sample name from the bam header into an output file. | ||
* | ||
* <p> | ||
* Note: If the bam has zero or more than one sample names in the header, this tool will error, by design. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
* -split-sample true / | ||
* -split-read-group true / | ||
* -split-library-name true | ||
* </pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can omit the 3xtrue
s. Also, although the single dash is valid for these params, it looks jarring to see long names with the single dash. Let's use the two dashes (unless of course these no longer work?).
gatk SplitReads /
-I input.bam /
-O outputDirectory /
--split-sample /
--split-read-group /
--split-library-name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. They work with just 2 dashes!
* -R reference.fasta \ | ||
* -bqsr recal1.table \ | ||
* -plots AnalyzeCovariates.pdf | ||
* gatk AnalyzeCovariates \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these example commands can omit the -R
option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* -plots AnalyzeCovariates.pdf | ||
* gatk AnalyzeCovariates \ | ||
* -R reference.fasta \ | ||
* -ignoreLMT \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove -ignoreLMT
. The warning doesn't hurt as the run will continue anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* -knownSites my-trusted-snps.vcf \ | ||
* -knownSites my-trusted-indels.vcf \ | ||
* -O recal1.table | ||
* gatk BaseRecalibrator \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also remove the entirety of the Full BQSR quality assessment pipeline
section. Such an overview belongs in a best practice tutorial and not in the tool doc. And actually, the tools have changed such that now some of these actually will error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Back to you @asmirnov239. |
@sooheelee I made requested changes, could you please look them over? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asmirnov239 A few very minor comments. Feel free to merge after addressing those. Looks good.
@@ -167,43 +129,74 @@ | |||
static final String PDF_ARG_SHORT_NAME = "plots"; | |||
static final String BEFORE_ARG_SHORT_NAME = "before"; | |||
static final String AFTER_ARG_SHORT_NAME = "after"; | |||
public static final String IGNORE_LMT_SHORT_NAME = "ignoreLMT"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably just have no short name for this option if the short name is going to be something weird cased and long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@Argument(fullName = "pileup", shortName = "pileup", doc="Pileup generated by Samtools") | ||
@Argument( | ||
fullName = "pileup", | ||
shortName = "pileup", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the short name and long name are the same you don't have to specify the short name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@Argument(fullName = "ignore_overlaps", shortName = "ignore_overlaps", doc = "Disable read-pair overlap detection", optional = true) | ||
@Argument( | ||
fullName = "ignore-overlaps", | ||
shortName = "ignore-overlaps", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Doc updates look good to me! |
Do not merge -- I'm taking this & the other argument-renaming branches over to modify them for consistency and get them merged before Monday. |
8a2730b
to
4e3d105
Compare
… case for various tools Addressed PR comments
4e3d105
to
4c53f71
Compare
I've rebased this branch, and added a few minor updates. This can be merged after test pass. |
… case for various tools (#4003)
The following tools are updated in this PR:
CountBases
CountBasesSpark
CountReads
CountReadsSpark
CheckPileup
EstimateLibraryComplexityGATK
FlagStat
FlagStatSpark
GetSampleName
SplitReads
AnalyzeCovariates
Could you please review @sooheelee? Thank you!