Update documentation for various tools (MW) #4026
Conversation
Just seeing this. I've only been looking at PRs where I am explicitly tagged within the comments (e.g. with @sooheelee). So I will take a quick look.
The content for ConvertHeaderlessHadoopBamShardToBam is particularly cryptic. I have general comments otherwise and changes to program groups.
import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.barclay.argparser.BetaFeature;
import org.broadinstitute.barclay.argparser.CommandLineProgramProperties;
import org.broadinstitute.barclay.help.DocumentedFeature;
import org.broadinstitute.hellbender.cmdline.CommandLineProgram;
import org.broadinstitute.hellbender.cmdline.programgroups.TestSparkProgramGroup;
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;
import org.broadinstitute.hellbender.cmdline.programgroups.ReadDataProgramGroup;
Actually, there is a consensus that this and a few other odd tools should now be in the OtherProgramGroup category. There is a new spreadsheet tab, 1217Changes, if you want to see the additional tweaks. Sorry for the late-breaking change.
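For illustration only, a minimal sketch of what moving one of these tools into that category could look like, assuming an OtherProgramGroup class lives alongside the other program groups in org.broadinstitute.hellbender.cmdline.programgroups (the exact package and the tool name SomeOddTool are placeholders):

import org.broadinstitute.barclay.argparser.CommandLineProgramProperties;
import org.broadinstitute.hellbender.cmdline.CommandLineProgram;
import org.broadinstitute.hellbender.cmdline.programgroups.OtherProgramGroup; // assumed location

@CommandLineProgramProperties(
        summary = "Tool summary goes here",
        oneLineSummary = "One-line summary goes here",
        programGroup = OtherProgramGroup.class // was a Spark/test program group
)
public final class SomeOddTool extends CommandLineProgram {
    @Override
    protected Object doWork() {
        return null; // placeholder body; only the program group annotation changes
    }
}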
Done
@@ -17,37 +10,55 @@
import org.broadinstitute.barclay.help.DocumentedFeature;
import org.broadinstitute.hellbender.cmdline.CommandLineProgram;
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;
import org.broadinstitute.hellbender.cmdline.programgroups.SparkProgramGroup;
import org.broadinstitute.hellbender.cmdline.programgroups.ReadDataProgramGroup;
This one too goes into the OtherProgramGroup.
Done
@@ -13,7 +13,7 @@
import org.broadinstitute.barclay.argparser.BetaFeature;
import org.broadinstitute.barclay.argparser.CommandLineProgramProperties;
import org.broadinstitute.barclay.help.DocumentedFeature;
import org.broadinstitute.hellbender.cmdline.programgroups.TestSparkProgramGroup;
import org.broadinstitute.hellbender.cmdline.programgroups.ReadDataProgramGroup;
This tool is now in the DiagnosticsAndQCProgramGroup.
Done
* Spark troubleshooting utility that converts a headerless Hadoop bam shard (eg., a part0000, part0001, etc. file
* produced by {@link ReadsSparkSink}) into a readable bam file by adding a header and a BGZF terminator.
* Convert a headerless Hadoop BAM shard into a readable BAM. This is a Spark troubleshooting utility that converts a
* headerless Hadoop BAM shard (eg., a part-r-00000, part-r-00001, etc. file
I'm afraid our users will not know what a shard is, and this needs an explanation, along with an explanation of the 'Hadoop' label.
Actually, many things about this tool are unclear, e.g. how a user may have created a headerless BAM shard to begin with. I'm not aware of a tool called ReadsSparkSink, so I assume it is some utility called by our Spark tools. Do we have an example tool that we can list specifically to help orient users? This example should be sufficient to clarify the use-case.
For example:
...file produced by a Spark tool, e.g. XXX, using ReadsSparkSink...
ReadsSparkSink is an implementation detail, so I think we can exclude it from the doc. Instead, I've added an explanation that a sharded BAM is produced by any Spark tool invoked with --sharded-output set to true. Also removed the reference to Hadoop since it is redundant - as far as I know, all shards are Hadoop in GATK.
As a side note, I noticed that this flag is still in camelCase (--shardedOutput). I'm assuming it will get converted to kebab-case.
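For context, a rough sketch of how such a shard could arise, following the explanation above (PrintReadsSpark and the output path are just placeholders, and the flag name assumes the kebab-case rename lands):

gatk PrintReadsSpark \
    -I input.bam \
    -O hdfs://namenode/out/reads \
    --sharded-output true

The output directory would then contain shard files such as part-r-00000, which is what ConvertHeaderlessHadoopBamShardToBam turns back into a readable BAM.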
* already readable using samtools. Currently {@link ReadsSparkSink} saves the "shards" with a header for the
* {@link ReadsWriteFormat#SHARDED} case, and without a header for the {@link ReadsWriteFormat#SINGLE} case.
* {@link ReadsWriteFormat#SHARDED} case, and without a header for the {@link ReadsWriteFormat#SINGLE} case.</p>
Again, a rather cryptic description here. I leave it to your judgment.
Removed this because it is also an implementation detail.
* gatk ConvertHeaderlessHadoopBamShardToBam \
* --bam-shard shard.bam \
* --bam-with-header input.bam \
* --O output.bam
This example seems to imply you can use a file that is already in BAM format (has a header and terminator). I tested this command as shown; it runs without error but produces an invalid output (which samtools refuses to display). Also, --O should be -O.
Should this example access files that represent a Hadoop structure as mentioned above, e.g. with something like:
gatk ConvertHeaderlessHadoopBamShardToBam \
--bam-shard part-r-00001 \
--bam-with-header input.bam \
-O part-r-00001.bam
Finally, could the bam-with-header instead be a reference dictionary? I think yes, as I tested this on a similarly truncated file that my setup produces. So add another example command to the Usage examples:
Use a reference dictionary file as the header:
gatk ConvertHeaderlessHadoopBamShardToBam \
--bam-shard part-r-00001 \
--bam-with-header reference.dict \
-O part-r-00001.bam
Hopefully, applied to a real shard part, this produces a non-truncated, full BAM.
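As a quick sanity check on whether a converted shard is actually readable (a sketch; assumes samtools is on the PATH and uses the hypothetical output name from the example above):

samtools quickcheck part-r-00001.bam && echo "BAM looks intact"
samtools view -H part-r-00001.bam | head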
Fixed the -O argument. I've changed the shard name to part-r-00000.bam to be consistent with the current shard naming convention:
gatk ConvertHeaderlessHadoopBamShardToBam \
--bam-shard part-r-00000.bam \
--bam-with-header input.bam \
-O output.bam
I'm hesitant to add a dictionary example to the doc because the code uses a SAM reader from htsjdk that expects a SAM/BAM file. It happens to work with a dictionary at the moment, but this is semantically different from the intended use of the reader.
/**
* Create a Hadoop BAM splitting index and optionally a BAM index from a BAM file.
*
I'd like to know more about the significance of the 4096 splitting-index-granularity default value. For another time perhaps.
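For reference, a sketch of how the granularity could be set explicitly (the -I/-O argument names follow the usual GATK conventions and the output file name is only illustrative):

gatk CreateHadoopBamSplittingIndex \
    -I input.bam \
    -O input.bam.sbi \
    --splitting-index-granularity 4096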
* gatk CompareDuplicatesSpark \
* -I input_reads_1.bam \
* -I2 input_reads_2.bam
* </pre>
Given that I can run this command without specifying any Spark args, add the blurb:
This tool can be run without explicitly specifying Spark options. That is to say, the given example command without Spark options will run locally. See <a href ="https://software.broadinstitute.org/gatk/documentation/article?id=10060">Tutorial#10060</a> for an example of how to set up and run a Spark tool on a cloud Spark cluster.
If possible, a Spark-enabled command would be nice to add. But the above blurb should be sufficient for now.
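For example, a Spark-enabled invocation might look roughly like this (a sketch only; the cluster master URL is a placeholder, and the Spark arguments are passed after the -- separator as is usual for GATK Spark tools):

gatk CompareDuplicatesSpark \
    -I input_reads_1.bam \
    -I2 input_reads_2.bam \
    -- \
    --spark-runner SPARK \
    --spark-master spark://my-cluster-master:7077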
Added the blurb.
Back to you @mwalker174
Do not merge -- I'm taking this & the other argument-renaming branches over to modify them for consistency and get them merged before Monday.
I don't believe @mwalker174 has incorporated feedback as of yet.
I'll address the comments myself when I rebase the branch later tonight.
Again, just reviewing Java correctness. I see no problems with the code logic, and the changes look good to me.
Force-pushed from e5bb6ad to 3e63d86.
@sooheelee Thank you for your review. I've addressed your suggestions. @droazen The branch is now also rebased on the latest master.
Codecov Report

Coverage Diff (master vs. #4026):

             master     #4026      +/-
Coverage     79.016%    79.127%    +0.111%
Complexity   16535      16785      +250
Files        1056       1059       +3
Lines        59150      59733      +583
Branches     9615       9758       +143
Hits         46738      47265      +527
Misses       8673       8716       +43
Partials     3739       3752       +13
👍🏽
Rebased one last time and made some minor edits. Good to merge once tests pass 👍
Documentation update for the following tools:
BwaMemIndexImageCreator
ClipReads
CompareDuplicatesSpark
ConvertHeaderlessHadoopBamShardToBam
CreateHadoopBamSplittingIndex