fix 4649 #4677

SHuang-Broad · 2018-04-19T10:07:57Z

The cause of the exception is a new edge case that was not imagined when the reference region segmenting logic was initially written.
It is now covered, with updated tests.

codecov-io · 2018-04-20T13:16:14Z

Codecov Report

Merging #4677 into master will increase coverage by 0.03%.
The diff coverage is 90.476%.

@@              Coverage Diff               @@
##              master     #4677      +/-   ##
==============================================
+ Coverage     79.894%   79.924%   +0.03%     
+ Complexity     17388     17348      -40     
==============================================
  Files           1080      1074       -6     
  Lines          63091     62942     -149     
  Branches       10180     10186       +6     
==============================================
- Hits           50406     50306     -100     
+ Misses          8701      8652      -49     
  Partials        3984      3984

Impacted Files	Coverage Δ	Complexity Δ
...ry/inference/CpxVariantInducingAssemblyContig.java	`84.141% <100%> (+0.963%)`	`27 <3> (+3)`	⬆️
...y/inference/CpxVariantCanonicalRepresentation.java	`78.992% <71.429%> (+0.022%)`	`52 <0> (+2)`	⬆️
...transforms/markduplicates/MarkDuplicatesSpark.java	`90.909% <0%> (-4.213%)`	`9% <0%> (-6%)`
...hellbender/tools/walkers/mutect/Mutect2Engine.java	`87.654% <0%> (-3.39%)`	`50% <0%> (+1%)`
...tools/spark/validation/CompareDuplicatesSpark.java	`82.927% <0%> (-1.518%)`	`24% <0%> (ø)`
...forms/markduplicates/MarkDuplicatesSparkUtils.java	`89.5% <0%> (-1.083%)`	`58% <0%> (-9%)`
...ellbender/tools/walkers/vqsr/CNNScoreVariants.java	`74.057% <0%> (-0.829%)`	`40% <0%> (-1%)`
...r/engine/filters/ReadGroupBlackListReadFilter.java	`83.333% <0%> (-0.303%)`	`17% <0%> (ø)`
...lkers/ReferenceConfidenceVariantContextMerger.java	`94.979% <0%> (-0.278%)`	`69% <0%> (-2%)`
...stitute/hellbender/tools/walkers/CombineGVCFs.java	`93.75% <0%> (-0.25%)`	`60% <0%> (-2%)`
... and 35 more

TedBrookings · 2018-04-25T17:50:15Z

...stitute/hellbender/tools/spark/sv/discovery/inference/CpxVariantCanonicalRepresentation.java

+                final SimpleInterval twoBaseSegment = new SimpleInterval(eventPrimaryChromosome, leftBoundary.getEnd(), rightBoundary.getStart());
+                if ( twoBaseBoundaries.contains(twoBaseSegment) ) {
+                    segments.add(twoBaseSegment);
+                }


Assuming I'm correct about the above being a typo, you might decrease the scope for this kind of error by refactoring thusly:

    final int start = rightBoundary.getStart();
    final int end = leftBoundary.getEnd();
    final int segmentLength = end - start;
    if (segmentLength >= 1) {
        final SimpleInterval newSegment = new SimpleInterval(eventPrimaryChromosome, start, end);
        if (segmentLength > 1 || twoBaseBoundaries.contains(newSegment)) {
            segments.add(newSegment);
        }
    }

I see what you mean, but like I replied in the other place, it's a "point" location so won't matter in this case.
And the suggestion you made is good, except that end is always >= start because it is from the "left" (I know, so many mind-numbing details...)

TedBrookings · 2018-04-25T17:58:43Z

...stitute/hellbender/tools/spark/sv/discovery/inference/CpxVariantCanonicalRepresentation.java

@@ -168,8 +171,14 @@ static SimpleInterval getAffectedReferenceRegion(final List<SimpleInterval> even
            // there shouldn't be a segment constructed if two segmenting locations are adjacent to each other on the reference
            // this could happen when (in the simplest case), two alignments are separated by a mapped insertion (hence 3 total alignments),
            // and the two alignments' ref span are connected
+            // more generally: only segment when the two segmenting locations are boundaries of alignments that overlap each other (ref. span)
            if (rightBoundary.getStart() - leftBoundary.getEnd() > 1) {
                segments.add(new SimpleInterval(eventPrimaryChromosome, leftBoundary.getStart(), rightBoundary.getStart()));


Should the new SimpleInterval use leftBoundary.getEnd() ?

It actually doesn't matter, because the boundaries are 1bp "point" locations.
I'll make it clear in the docs.

also changed to getEnd()
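
To make the "point location" argument concrete, here is a minimal sketch. It assumes a hypothetical `Interval` class standing in for `SimpleInterval` (1-based, closed coordinates) and a hypothetical `SegmentSketch.segment` helper that only mirrors the branch structure quoted in this review; it is not the actual GATK method. Because every boundary is a 1-bp interval, `getStart()` and `getEnd()` return the same position, so the constructed segment is identical whichever accessor is used on the left boundary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for SimpleInterval: 1-based, closed [start, end].
final class Interval {
    final String contig;
    final int start;
    final int end;

    Interval(final String contig, final int start, final int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        this.contig = contig;
        this.start = start;
        this.end = end;
    }

    @Override
    public boolean equals(final Object o) {
        if (!(o instanceof Interval)) {
            return false;
        }
        final Interval that = (Interval) o;
        return contig.equals(that.contig) && start == that.start && end == that.end;
    }

    @Override
    public int hashCode() {
        return Objects.hash(contig, start, end);
    }
}

final class SegmentSketch {
    // Illustrative only: assumes `boundaries` are sorted 1-bp point locations on one chromosome.
    static List<Interval> segment(final String chr,
                                  final List<Interval> boundaries,
                                  final Set<Interval> twoBaseBoundaries) {
        final List<Interval> segments = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            final Interval left = boundaries.get(i);
            final Interval right = boundaries.get(i + 1);
            // For a point boundary left.start == left.end, so either accessor yields this candidate.
            final Interval candidate = new Interval(chr, left.end, right.start);
            if (right.start - left.end > 1) {
                segments.add(candidate);            // boundaries are not adjacent: always a segment
            } else if (twoBaseBoundaries.contains(candidate)) {
                segments.add(candidate);            // adjacent boundaries: only if explicitly allowed
            }
        }
        return segments;
    }
}
```

With boundaries at 100, 101, and 200, the sketch yields only the segment 101-200 unless the 2-bp interval 100-101 is present in `twoBaseBoundaries`.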

TedBrookings · 2018-04-25T19:06:58Z

...nstitute/hellbender/tools/spark/sv/discovery/inference/CpxVariantInducingAssemblyContig.java

+
+        final String chr = basicInfo.eventPrimaryChromosome;
+
+        final Set<SimpleInterval> result = new HashSet<>(contigWithFineTunedAlignments.getAlignments().size());


Should be 2 * contigWithFineTunedAlignments.getAlignments().size(), since you get up to two SimpleIntervals per alignment.

yep and done as suggested
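
As a side note on the sizing, `HashSet(int)` takes an initial capacity rather than an expected element count, and with the default load factor of 0.75 the table is rehashed once the element count passes three quarters of the capacity. The sketch below shows one way to account for that; `SetSizing`, `capacityFor`, and `boundarySet` are hypothetical names for illustration and are not part of this PR.

```java
import java.util.HashSet;
import java.util.Set;

final class SetSizing {
    // Hypothetical helper: an initial capacity large enough to hold
    // `expectedElements` entries without rehashing at the default load factor 0.75.
    static int capacityFor(final int expectedElements) {
        return (int) (expectedElements / 0.75f) + 1;
    }

    // Each alignment can contribute up to two boundaries (its start and its end).
    static Set<String> boundarySet(final int alignmentCount) {
        return new HashSet<>(capacityFor(2 * alignmentCount));
    }
}
```

Whether the extra headroom matters depends on how many alignments a contig typically has; for small sets the difference is likely negligible.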

TedBrookings

I found one typo that must be fixed, however it is quite simple to fix so I'll just approve now.

The test code is nigh impenetrable (to me). Partly that's because I'm just looking at this part of the pipeline for the first time, partly it's because you're addressing an inherently complex problem, and partly it's all those "manually curated values". I have no idea if it's feasible or totally crazy, but it might be good to have an adversarial function that generates problems from known solutions rather than doing it by hand.

SHuang-Broad · 2018-04-26T20:21:01Z

It's always difficult to come up with test data for these complex events.

What I usually do is to create test data on what I know should work, including imaginable edge cases, and accumulate more edge cases as we run into them.

Can you expand a little on what you mean by

to have an adversarial function that generates problems from known solutions rather than doing it by hand

TedBrookings · 2018-04-26T20:53:09Z

Like I said, I don't know enough to say if this approach is feasible, but sometimes it's possible to test an algorithm with a problem-generating function. This works well when it's easy to generate self-consistent solutions and test data consistent with a given solution. I was able to do this for some of the interval overlap functions on the Python side of my code. Other methods were too complicated (for me to figure out anyway).

Let's say you have an algorithm function f_algo that you want to test. You design a function test_data_factory that takes as input either a known solution, or a set of parameters from which it can easily generate a known solution (by a route that is logically distinct from the way f_algo works). Then test_data_factory generates and returns test_data. The flow looks like

// get either random test solutions or some distinct set that tests particular edge cases
test_params = get_test_params();
test_solution = easy_solution_from_params(test_params);

test_data = test_data_factory(test_params, test_solution);
assert f_algo(test_data) == test_solution;

In some sense you've already done that, but you've personally stepped in as test_data_factory(), and I'm wondering if it would be possible to automate that to get rid of all those numbers in the test code.
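
As a rough illustration of that flow in Java, here is a sketch for a much simpler problem than complex-variant segmenting: the overlap length of two closed intervals. All names (`OverlapFactorySketch`, `overlapLength`, `intervalsWithOverlap`, `countMismatches`) are hypothetical and chosen for this example. The factory builds the two intervals from the desired overlap, so the expected answer is known by construction rather than computed the way the function under test computes it.

```java
import java.util.Random;

final class OverlapFactorySketch {
    // f_algo: the function under test. Overlap length of closed intervals [s1, e1] and [s2, e2].
    static int overlapLength(final int s1, final int e1, final int s2, final int e2) {
        return Math.max(0, Math.min(e1, e2) - Math.max(s1, s2) + 1);
    }

    // test_data_factory: returns {s1, e1, s2, e2} sharing exactly `overlap` bases.
    // Requires leftOnly >= 1, rightOnly >= 1, overlap >= 0 so both intervals are non-empty.
    static int[] intervalsWithOverlap(final int start, final int leftOnly,
                                      final int overlap, final int rightOnly) {
        final int s1 = start;
        final int e1 = start + leftOnly + overlap - 1;   // first interval: leftOnly + overlap bases
        final int s2 = start + leftOnly;                 // second interval begins where the overlap begins
        final int e2 = s2 + overlap + rightOnly - 1;     // second interval: overlap + rightOnly bases
        return new int[]{s1, e1, s2, e2};
    }

    // Draws random parameters, builds data from the known solution, and counts disagreements.
    static int countMismatches(final long seed, final int cases) {
        final Random random = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < cases; i++) {
            final int expected = random.nextInt(50);     // the known solution
            final int[] d = intervalsWithOverlap(
                    1 + random.nextInt(1000), 1 + random.nextInt(20), expected, 1 + random.nextInt(20));
            if (overlapLength(d[0], d[1], d[2], d[3]) != expected) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

A fixed seed keeps failures reproducible, and specific edge cases (for example an overlap of 0) can be passed to the factory directly alongside the random ones.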

SHuang-Broad · 2018-04-26T21:04:24Z

Hmm, I see what you mean.
It’s an interesting idea, and I think Valentin does that for some CIGAR related tests; but I think it is difficult to apply in our case without having a “simulator”. I’ll keep this in mind.

SHuang-Broad added the SV label Apr 19, 2018

SHuang-Broad force-pushed the sh-sv-issues-4649 branch from 92a2262 to ae67721 Compare April 19, 2018 22:30

SHuang-Broad force-pushed the sh-sv-issues-4649 branch from ae67721 to 792b81b Compare April 22, 2018 19:17

TedBrookings self-assigned this Apr 25, 2018

TedBrookings reviewed Apr 25, 2018

TedBrookings approved these changes Apr 25, 2018

(SV) bug-fixing commit:
* better exception message and fix bug in existing cpx variant test data

(SV) new cpx variant scenario commit:
* optionally add reference segments when the would-be segment is 2-bp long (previously all such segments were skipped)

015ed5a

SHuang-Broad force-pushed the sh-sv-issues-4649 branch from 792b81b to 015ed5a Compare April 26, 2018 20:04

SHuang-Broad merged commit 9f6838b into master Apr 26, 2018

SHuang-Broad deleted the sh-sv-issues-4649 branch April 26, 2018 22:34
