Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tws cnv allow rd #8015

Merged
merged 1 commit into from
Sep 16, 2022
Merged

Tws cnv allow rd #8015

merged 1 commit into from
Sep 16, 2022

Conversation

tedsharpe
Copy link
Contributor

Tools that will allow us to run gCNV using DepthEvidence.

@codecov
Copy link

codecov bot commented Sep 7, 2022

Codecov Report

Merging #8015 (27ee495) into master (fee7b94) will increase coverage by 36.682%.
The diff coverage is 76.101%.

❗ Current head 27ee495 differs from pull request most recent head 76c7648. Consider uploading reports for the commit 76c7648 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##              master     #8015        +/-   ##
================================================
+ Coverage     49.977%   86.659%   +36.682%     
- Complexity     28038     38881     +10843     
================================================
  Files           2331      2333         +2     
  Lines         182057    182340       +283     
  Branches       19984     20022        +38     
================================================
+ Hits           90987    158014     +67027     
+ Misses         85024     17300     -67724     
- Partials        6046      7026       +980     
Impacted Files Coverage Δ
...institute/hellbender/tools/sv/PrintReadCounts.java 66.667% <66.667%> (ø)
...hellbender/tools/walkers/sv/CollectSVEvidence.java 76.644% <67.647%> (-3.234%) ⬇️
...er/tools/walkers/sv/CollectSVEvidenceUnitTest.java 97.315% <96.875%> (+96.489%) ⬆️
...adinstitute/hellbender/tools/sv/DepthEvidence.java 71.795% <100.000%> (+1.525%) ⬆️
...nder/utils/codecs/copynumber/SimpleCountCodec.java 77.778% <100.000%> (+2.778%) ⬆️
...ender/tools/sv/PrintReadCountsIntegrationTest.java 100.000% <100.000%> (ø)
...s/walkers/sv/CollectSVEvidenceIntegrationTest.java 100.000% <100.000%> (ø)
...roadinstitute/hellbender/tools/LocalAssembler.java 67.425% <0.000%> (+0.073%) ⬆️
...rg/broadinstitute/hellbender/utils/io/IOUtils.java 74.728% <0.000%> (+0.272%) ⬆️
... and 731 more

Copy link
Contributor

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tedsharpe this will really help us streamline CNV calling in gatk-sv! The RD stats file can supplant our median coverage subworkflow as well. In the future we may want to expand the list of statistics for qc purposes. My comments are mostly cosmetic.

PS be sure to use an rd min mapping quality of 0 in the corresponding gatk-sv workflows.

oneLineSummary = "Prints count files for CNV determination.",
programGroup = StructuralVariantDiscoveryProgramGroup.class
)
@ExperimentalFeature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@ExperimentalFeature
@ExperimentalFeature
@DocumentedFeature

programGroup = StructuralVariantDiscoveryProgramGroup.class
)
@ExperimentalFeature
public class PrintCNVCounts extends FeatureWalker<Feature> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I suggest naming the tool PrintReadCounts? Just to be consistent with the CollectReadCounts tool.

private GATKPath inputPath;

@Argument(
doc = "Output file(s) prefix",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
doc = "Output file(s) prefix",
doc = "Output file path prefix. Paths have the form \"{output-prefix}{sample-name}.counts.tsv\". Default is the current working directory."

public class PrintCNVCounts extends FeatureWalker<Feature> {
public static final String INPUT_ARGNAME = "input-counts";
public static final String OUTPUT_PREFIX_ARGNAME = "output-prefix";
public static final String OUTPUT_FILES_ARGNAME = "output-files";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static final String OUTPUT_FILES_ARGNAME = "output-files";
public static final String OUTPUT_FILES_ARGNAME = "output-file-list";

private String outputPrefix = "";

@Argument(
doc = "Output file containing a list of output files",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
doc = "Output file containing a list of output files",
doc = "Generates a list of the output file paths",

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you actually make this a tab-delimited table where the first column is sample ID and second column is the file path? This may make life easier down the road for pipelining.

optional = true)
public GATKPath depthEvidenceInputFilename;

@Argument(fullName = "depth-evidence-min-mapq",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define constant for this (and others below)

final static class CountCounter {
private final int[] lowCounts;
private final SortedMap<Integer, Integer> highCounts;
private int nCounts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private int nCounts;
private long nCounts;

To be safe

try ( final BufferedWriter writer
= new BufferedWriter(new OutputStreamWriter(summaryPath.getOutputStream())) ) {
final int[] quartiles = countCounter.getQuartiles();
writer.write("rd_q25_" + sampleName + "\t" + quartiles[0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
writer.write("rd_q25_" + sampleName + "\t" + quartiles[0]);
writer.write("rd_q25\t" + sampleName + "\t" + quartiles[0]);

This would make aggregating metrics across samples a little easier

Comment on lines 877 to 889
public int getMeanCount() {
return (int)Math.round((double)totalCounts/nCounts);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public int getMeanCount() {
return (int)Math.round((double)totalCounts/nCounts);
}
public double getMeanCount() {
return Math.round((double)totalCounts/nCounts);
}

I think most users would expect an exact mean, rounding to 2 decimal places would be fine though

* <li>ending position</li>
* <li>read count</li>
* </ul>
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*
*
* Note when only collecting RD evidence, users should provide the same interval list with -L as --depth-evidence-intervals in order to avoid processing unused reads outside the intervals.

@tedsharpe
Copy link
Contributor Author

All review comments addressed as suggested. Thanks for the suggestions. Ready for re-review.

Copy link
Contributor

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good from here.

@tedsharpe tedsharpe merged commit ef50c08 into master Sep 16, 2022
@tedsharpe tedsharpe deleted the tws_CNV_allow_RD branch September 16, 2022 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants