Tws cnv allow rd #8015
Codecov Report
Additional details and impacted files
@@              Coverage Diff               @@
##             master     #8015        +/-  ##
================================================
+ Coverage     49.977%   86.659%   +36.682%
- Complexity     28038     38881     +10843
================================================
  Files           2331      2333         +2
  Lines         182057    182340       +283
  Branches       19984     20022        +38
================================================
+ Hits           90987    158014     +67027
+ Misses         85024     17300     -67724
- Partials        6046      7026       +980
Thank you @tedsharpe, this will really help us streamline CNV calling in gatk-sv! The RD stats file can supplant our median coverage subworkflow as well. In the future we may want to expand the list of statistics for QC purposes. My comments are mostly cosmetic.
PS: be sure to use an RD min mapping quality of 0 in the corresponding gatk-sv workflows.
oneLineSummary = "Prints count files for CNV determination.",
programGroup = StructuralVariantDiscoveryProgramGroup.class
)
@ExperimentalFeature
- @ExperimentalFeature
+ @ExperimentalFeature
+ @DocumentedFeature
programGroup = StructuralVariantDiscoveryProgramGroup.class
)
@ExperimentalFeature
public class PrintCNVCounts extends FeatureWalker<Feature> {
Can I suggest naming the tool PrintReadCounts? Just to be consistent with the CollectReadCounts tool.
private GATKPath inputPath;

@Argument(
    doc = "Output file(s) prefix",
- doc = "Output file(s) prefix",
+ doc = "Output file path prefix. Paths have the form \"{output-prefix}{sample-name}.counts.tsv\". Default is the current working directory."
public class PrintCNVCounts extends FeatureWalker<Feature> {
    public static final String INPUT_ARGNAME = "input-counts";
    public static final String OUTPUT_PREFIX_ARGNAME = "output-prefix";
    public static final String OUTPUT_FILES_ARGNAME = "output-files";
- public static final String OUTPUT_FILES_ARGNAME = "output-files";
+ public static final String OUTPUT_FILES_ARGNAME = "output-file-list";
private String outputPrefix = "";

@Argument(
    doc = "Output file containing a list of output files",
- doc = "Output file containing a list of output files",
+ doc = "Generates a list of the output file paths",
Can you actually make this a tab-delimited table where the first column is sample ID and second column is the file path? This may make life easier down the road for pipelining.
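A minimal sketch of the requested two-column list, assuming the per-sample path follows the "{output-prefix}{sample-name}.counts.tsv" form quoted in the doc string above. The class and method names are hypothetical and are not taken from the PR.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

// Hypothetical helper, not code from the PR.
final class OutputFileListSketch {
    // Builds the per-sample counts path from the prefix and the sample name.
    static String countsPath(final String outputPrefix, final String sampleName) {
        return outputPrefix + sampleName + ".counts.tsv";
    }

    // Writes one "sampleId<TAB>path" row per sample, in the order given.
    static void writeFileList(final String outputPrefix,
                              final List<String> sampleNames,
                              final Writer out) throws IOException {
        final BufferedWriter writer = new BufferedWriter(out);
        for (final String sampleName : sampleNames) {
            writer.write(sampleName + "\t" + countsPath(outputPrefix, sampleName));
            writer.write('\n');
        }
        writer.flush();
    }

    // Convenience wrapper that returns the table as a string.
    static String render(final String outputPrefix, final List<String> sampleNames) throws IOException {
        final StringWriter buffer = new StringWriter();
        writeFileList(outputPrefix, sampleNames, buffer);
        return buffer.toString();
    }
}
```

A downstream pipeline step can then split each row on the tab to recover the sample ID without parsing it back out of the file name.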
    optional = true)
public GATKPath depthEvidenceInputFilename;

@Argument(fullName = "depth-evidence-min-mapq",
Define constant for this (and others below)
final static class CountCounter {
    private final int[] lowCounts;
    private final SortedMap<Integer, Integer> highCounts;
    private int nCounts;
- private int nCounts;
+ private long nCounts;
To be safe
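For context on the wider type: a Java int counter silently wraps to a negative value once it passes 2,147,483,647, while a long keeps counting. The sketch below is a hypothetical illustration and not code from the PR.

```java
// Hypothetical illustration, not code from the PR.
final class CounterWidthSketch {
    // An int counter wraps around once it passes Integer.MAX_VALUE.
    static int bumpInt(final int nCounts) {
        return nCounts + 1;
    }

    // A long counter continues past the int range without wrapping.
    static long bumpLong(final long nCounts) {
        return nCounts + 1;
    }
}
```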
try ( final BufferedWriter writer
        = new BufferedWriter(new OutputStreamWriter(summaryPath.getOutputStream())) ) {
    final int[] quartiles = countCounter.getQuartiles();
    writer.write("rd_q25_" + sampleName + "\t" + quartiles[0]);
- writer.write("rd_q25_" + sampleName + "\t" + quartiles[0]);
+ writer.write("rd_q25\t" + sampleName + "\t" + quartiles[0]);
This would make aggregating metrics across samples a little easier
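As a rough illustration of why the three-column layout helps, the sketch below groups rows of the form "metric<TAB>sample<TAB>value" by metric name. With the sample name embedded in the first column, every row would carry a distinct key and this grouping would need extra string parsing. The class name and the sample values in any usage are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregation helper, not code from the PR or from gatk-sv.
final class MetricAggregationSketch {
    // Groups "metric<TAB>sample<TAB>value" rows as metric -> (sample -> value).
    static Map<String, Map<String, Integer>> aggregate(final List<String> lines) {
        final Map<String, Map<String, Integer>> byMetric = new TreeMap<>();
        for (final String line : lines) {
            final String[] fields = line.split("\t");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected 3 tab-separated fields: " + line);
            }
            byMetric.computeIfAbsent(fields[0], key -> new TreeMap<>())
                    .put(fields[1], Integer.parseInt(fields[2]));
        }
        return byMetric;
    }
}
```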
public int getMeanCount() {
    return (int)Math.round((double)totalCounts/nCounts);
}
- public int getMeanCount() {
-     return (int)Math.round((double)totalCounts/nCounts);
- }
+ public double getMeanCount() {
+     return Math.round((double)totalCounts/nCounts);
+ }
I think most users would expect an exact mean; rounding to 2 decimal places would be fine, though.
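One detail worth noting: Math.round on a double returns a long, so a body that calls it directly on the quotient still produces a whole number even when the return type is double. A minimal sketch of rounding to two decimal places instead is shown below; the class name, the static signature, and the zero-count behavior are assumptions for illustration, not the PR's final code.

```java
// Hypothetical sketch, not the PR's final implementation.
final class MeanCountSketch {
    // Returns the mean rounded to two decimal places.
    // Assumption: an empty counter reports a mean of 0.0.
    static double meanCount(final long totalCounts, final long nCounts) {
        if (nCounts == 0) {
            return 0.0;
        }
        final double mean = (double) totalCounts / nCounts;
        // Scale up, round to the nearest whole number, then scale back down.
        return Math.round(mean * 100.0) / 100.0;
    }
}
```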
 * <li>ending position</li>
 * <li>read count</li>
 * </ul>
 *
-  *
+  *
+  * Note when only collecting RD evidence, users should provide the same interval list with -L as --depth-evidence-intervals in order to avoid processing unused reads outside the intervals.
All review comments addressed as suggested. Thanks for the suggestions. Ready for re-review.
Thanks! Looks good from here.
27ee495 to 76c7648
Tools that will allow us to run gCNV using DepthEvidence.