Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docpatch #5047

Merged
merged 4 commits into from
Aug 28, 2018
Merged

Docpatch #5047

merged 4 commits into from
Aug 28, 2018

Conversation

jakevc
Copy link
Contributor

@jakevc jakevc commented Jul 23, 2018

Suggesting --TMP-DIR in documentation of GenomicsDBImport and GenotypeGVCFs

@codecov-io
Copy link

codecov-io commented Jul 23, 2018

Codecov Report

Merging #5047 into master will increase coverage by 0.254%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #5047       +/-   ##
===============================================
+ Coverage     86.379%   86.633%   +0.254%     
- Complexity     28681     29218      +537     
===============================================
  Files           1785      1810       +25     
  Lines         132742    135251     +2509     
  Branches       14773     15000      +227     
===============================================
+ Hits          114661    117172     +2511     
+ Misses         12747     12680       -67     
- Partials        5334      5399       +65
Impacted Files Coverage Δ Complexity Δ
.../hellbender/tools/genomicsdb/GenomicsDBImport.java 79.098% <ø> (-0.985%) 52 <0> (ø)
...titute/hellbender/tools/walkers/GenotypeGVCFs.java 89.916% <ø> (ø) 49 <0> (ø) ⬇️
...ender/tools/spark/sv/utils/GATKSVVCFConstants.java 0% <0%> (-75%) 0% <0%> (-1%)
...s/spark/ParallelCopyGCSDirectoryIntoHDFSSpark.java 0% <0%> (-74.257%) 0% <0%> (-17%)
...nder/tools/spark/pipelines/PrintVariantsSpark.java 0% <0%> (-66.667%) 0% <0%> (-2%)
...der/tools/spark/sv/discovery/SvDiscoveryUtils.java 2.564% <0%> (-34.749%) 1% <0%> (-7%)
...utils/smithwaterman/SmithWatermanIntelAligner.java 50% <0%> (-30%) 1% <0%> (-2%)
...oadinstitute/hellbender/utils/gcs/BucketUtils.java 56.774% <0%> (-23.226%) 31% <0%> (-9%)
...nder/tools/spark/BaseRecalibratorSparkSharded.java 0% <0%> (-22.807%) 0% <0%> (-2%)
...bender/tools/walkers/annotator/ReferenceBases.java 83.333% <0%> (-16.667%) 6% <0%> (+1%)
... and 263 more

@lbergelson lbergelson self-requested a review August 13, 2018 15:31
@lbergelson lbergelson self-assigned this Aug 13, 2018
@droazen droazen assigned cmnbroad and unassigned lbergelson Aug 27, 2018
@droazen droazen requested review from cmnbroad and removed request for lbergelson August 27, 2018 19:15
Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor changes requested.

@@ -86,6 +86,7 @@
* -V data/gvcfs/father.g.vcf.gz \
* -V data/gvcfs/son.g.vcf.gz \
* --genomicsdb-workspace-path my_database \
* --TMP-DIR=path/to/other/tmp \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We recently changed the name of this argument totmp-dir, so the doc here and below should be changed to reflect the new name. Also, I'd suggest using an example path that includes the word large, to reinforce the reason for specifying this:

--tmp-dir=path/to/large/tmp \

@@ -117,6 +119,7 @@
* <li>At least one interval must be provided</li>
* <li>Input GVCFs cannot contain multiple entries for a single genomic position</li>
* <li>The --genomicsdb-workspace-path must point to a non-existent or empty directory.</li>
* <li>GenomicsDBImport makes greedy use of /tmp by default, if your /tmp space is limited this will cause errors once /tmp has filled up. This happens quickly if the tool is running many intervals. It is therefore recommended to specify a different `--TMP-DIR`.</li>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested wording for this:

GenomicsDBImport uses temporary disk storage during import. The amount of temporary storage required can exceed the space available, especially when using a large number of intervals. The command line argument --temp-dir can be used to specify an alternate temporary storage location with sufficient space

@@ -70,6 +71,7 @@
* programs produce files that they call GVCFs but those lack some important information (accurate genotype likelihoods
* for every position) that GenotypeGVCFs requires for its operation.</li>
* <li>Cannot take multiple GVCF files in one command.</li>
* <li>Reading from a GenomicsDB workspace can fill up /tmp by default, causing confusing errors when scattering across many intervals. It is recommended to specify a `--TMP-DIR` if running this tool in combination with GenomicsDBImport.</li>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest similar wording here as above.

@@ -61,7 +61,8 @@
* gatk --java-options "-Xmx4g" GenotypeGVCFs \
* -R Homo_sapiens_assembly38.fasta \
* -V gendb://my_database \
* -O output.vcf.gz
* -O output.vcf.gz \
* --TMP-DIR=path/to/other/tmp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakevc Thanks for the contribution!

@cmnbroad cmnbroad merged commit 23f6f08 into broadinstitute:master Aug 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants