Tws cnv allow rd #8015

tedsharpe · 2022-09-07T19:36:55Z

Tools that will allow us to run gCNV using DepthEvidence.

codecov · 2022-09-07T20:54:58Z

Codecov Report

Merging #8015 (27ee495) into master (fee7b94) will increase coverage by 36.682%.
The diff coverage is 76.101%.

❗ Current head 27ee495 differs from pull request most recent head 76c7648. Consider uploading reports for the commit 76c7648 to get more accurate results

Additional details and impacted files

@@               Coverage Diff                @@
##              master     #8015        +/-   ##
================================================
+ Coverage     49.977%   86.659%   +36.682%     
- Complexity     28038     38881     +10843     
================================================
  Files           2331      2333         +2     
  Lines         182057    182340       +283     
  Branches       19984     20022        +38     
================================================
+ Hits           90987    158014     +67027     
+ Misses         85024     17300     -67724     
- Partials        6046      7026       +980

Impacted Files	Coverage Δ
...institute/hellbender/tools/sv/PrintReadCounts.java	`66.667% <66.667%> (ø)`
...hellbender/tools/walkers/sv/CollectSVEvidence.java	`76.644% <67.647%> (-3.234%)`	⬇️
...er/tools/walkers/sv/CollectSVEvidenceUnitTest.java	`97.315% <96.875%> (+96.489%)`	⬆️
...adinstitute/hellbender/tools/sv/DepthEvidence.java	`71.795% <100.000%> (+1.525%)`	⬆️
...nder/utils/codecs/copynumber/SimpleCountCodec.java	`77.778% <100.000%> (+2.778%)`	⬆️
...ender/tools/sv/PrintReadCountsIntegrationTest.java	`100.000% <100.000%> (ø)`
...s/walkers/sv/CollectSVEvidenceIntegrationTest.java	`100.000% <100.000%> (ø)`
...roadinstitute/hellbender/tools/LocalAssembler.java	`67.425% <0.000%> (+0.073%)`	⬆️
...rg/broadinstitute/hellbender/utils/io/IOUtils.java	`74.728% <0.000%> (+0.272%)`	⬆️
... and 731 more

mwalker174

Thank you, @tedsharpe. This will really help us streamline CNV calling in gatk-sv! The RD stats file can also supplant our median coverage subworkflow. In the future, we may want to expand the list of statistics for QC purposes. My comments are mostly cosmetic.

P.S. Be sure to use an RD minimum mapping quality of 0 in the corresponding gatk-sv workflows.

mwalker174 · 2022-09-14T15:57:45Z

src/main/java/org/broadinstitute/hellbender/tools/sv/PrintCNVCounts.java

+        oneLineSummary = "Prints count files for CNV determination.",
+        programGroup = StructuralVariantDiscoveryProgramGroup.class
+)
+@ExperimentalFeature


Suggested change

- @ExperimentalFeature
+ @ExperimentalFeature
+ @DocumentedFeature

mwalker174 · 2022-09-14T16:49:27Z

src/main/java/org/broadinstitute/hellbender/tools/sv/PrintCNVCounts.java

+        programGroup = StructuralVariantDiscoveryProgramGroup.class
+)
+@ExperimentalFeature
+public class PrintCNVCounts extends FeatureWalker<Feature> {


Can I suggest naming the tool PrintReadCounts? Just to be consistent with the CollectReadCounts tool.

mwalker174 · 2022-09-14T16:53:32Z

src/main/java/org/broadinstitute/hellbender/tools/sv/PrintCNVCounts.java

+    private GATKPath inputPath;
+
+    @Argument(
+            doc = "Output file(s) prefix",


Suggested change

- doc = "Output file(s) prefix",
+ doc = "Output file path prefix. Paths have the form \"{output-prefix}{sample-name}.counts.tsv\". Default is the current working directory.",
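
For readers unfamiliar with the naming scheme described in the suggested doc string, here is a minimal sketch of how a per-sample output path could be assembled. The class name `CountsPathSketch` and method name `countsPath` are hypothetical and are not taken from the PR; the sketch assumes plain string concatenation of the prefix, the sample name, and the `.counts.tsv` suffix.

```java
// Hypothetical helper illustrating the "{output-prefix}{sample-name}.counts.tsv" form.
// This is an illustration only, not the tool's actual implementation.
final class CountsPathSketch {
    private static final String SUFFIX = ".counts.tsv";

    // Concatenates prefix, sample name, and suffix without inserting any separator,
    // so a prefix that names a directory must end with '/' itself.
    static String countsPath(final String outputPrefix, final String sampleName) {
        return outputPrefix + sampleName + SUFFIX;
    }
}
```

Under this assumption, an empty prefix yields a bare relative file name, which resolves against the current working directory, matching the stated default.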

mwalker174 · 2022-09-14T16:54:00Z

src/main/java/org/broadinstitute/hellbender/tools/sv/PrintCNVCounts.java

+public class PrintCNVCounts extends FeatureWalker<Feature> {
+    public static final String INPUT_ARGNAME = "input-counts";
+    public static final String OUTPUT_PREFIX_ARGNAME = "output-prefix";
+    public static final String OUTPUT_FILES_ARGNAME = "output-files";


Suggested change

- public static final String OUTPUT_FILES_ARGNAME = "output-files";
+ public static final String OUTPUT_FILES_ARGNAME = "output-file-list";

mwalker174 · 2022-09-14T16:54:57Z

src/main/java/org/broadinstitute/hellbender/tools/sv/PrintCNVCounts.java

+    private String outputPrefix = "";
+
+    @Argument(
+            doc = "Output file containing a list of output files",


Suggested change

- doc = "Output file containing a list of output files",
+ doc = "Generates a list of the output file paths",

Can you actually make this a tab-delimited table where the first column is the sample ID and the second column is the file path? This may make life easier down the road for pipelining.
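
As a rough illustration of the requested layout, the sketch below formats a two-column, tab-delimited table with the sample ID first and the file path second. The class `OutputManifestSketch`, the method `formatManifest`, and the absence of a header row are all assumptions made for illustration; the PR may structure this differently.

```java
import java.util.Map;

// Hypothetical sketch of the requested sample-ID-to-path table; not code from the PR.
final class OutputManifestSketch {
    // Returns one line per sample: "<sample ID>\t<file path>\n".
    // Rows follow the map's iteration order, so pass an ordered map
    // (e.g. LinkedHashMap or TreeMap) if a stable row order matters.
    static String formatManifest(final Map<String, String> pathsBySample) {
        final StringBuilder sb = new StringBuilder();
        for (final Map.Entry<String, String> entry : pathsBySample.entrySet()) {
            sb.append(entry.getKey())
              .append('\t')
              .append(entry.getValue())
              .append('\n');
        }
        return sb.toString();
    }
}
```

A table in this form can be read directly by downstream workflow steps that expect a sample-to-file mapping, which is presumably the pipelining convenience the comment has in mind.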

mwalker174 · 2022-09-14T18:11:34Z