-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix 4649 #4677
fix 4649 #4677
Conversation
92a2262
to
ae67721
Compare
Codecov Report
@@ Coverage Diff @@
## master #4677 +/- ##
==============================================
+ Coverage 79.894% 79.924% +0.03%
+ Complexity 17388 17348 -40
==============================================
Files 1080 1074 -6
Lines 63091 62942 -149
Branches 10180 10186 +6
==============================================
- Hits 50406 50306 -100
+ Misses 8701 8652 -49
Partials 3984 3984
|
ae67721
to
792b81b
Compare
final SimpleInterval twoBaseSegment = new SimpleInterval(eventPrimaryChromosome, leftBoundary.getEnd(), rightBoundary.getStart()); | ||
if ( twoBaseBoundaries.contains(twoBaseSegment) ) { | ||
segments.add(twoBaseSegment); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming I'm correct about the above being a typo, you might decrease scope for this kind of error be refactoring thusly:
final int start = rightBoundary.getStart();
final int end = leftBoundary.getEnd();
final int segmentLength = end - start;
if( segmentLength >= 1) {
final SimpleInterval newSegment = new SimpleInterval(eventPrimaryChromosome, start, end);
if(segmentLength > 1 || twoBaseBoundaries.contains(newSegment)) {
segments.add(newSegment);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, but like I replied in the other place, it's a "point" location so won't matter in this case.
And the suggestion you made is good, except that end is always >= start because it is from the "left" (I know, so many mind-numbing details...)
@@ -168,8 +171,14 @@ static SimpleInterval getAffectedReferenceRegion(final List<SimpleInterval> even | |||
// there shouldn't be a segment constructed if two segmenting locations are adjacent to each other on the reference | |||
// this could happen when (in the simplest case), two alignments are separated by a mapped insertion (hence 3 total alignments), | |||
// and the two alignments' ref span are connected | |||
// more generally: only segment when the two segmenting locations are boundaries of alignments that overlap each other (ref. span) | |||
if (rightBoundary.getStart() - leftBoundary.getEnd() > 1) { | |||
segments.add(new SimpleInterval(eventPrimaryChromosome, leftBoundary.getStart(), rightBoundary.getStart())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the new SimpleInterval use leftBoundary.getEnd() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually doesn't matter, because the boundaries are 1bp "point" locations.
I'll make it clear in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also changed to getEnd()
|
||
final String chr = basicInfo.eventPrimaryChromosome; | ||
|
||
final Set<SimpleInterval> result = new HashSet<>(contigWithFineTunedAlignments.getAlignments().size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be 2 * contigWithFineTunedAlignments.getAlignments().size(), since you get up to two SimpleIntervals per alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep and done as suggested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found one typo that must be fixed, however it is quite simple to fix so I'll just approve now.
The test code is nigh impenetrable (to me). Partly that's because I'm just looking at this part of the pipeline for the first time, partly it's because you're addressing an inherently complex problem, and partly it's all those "manually curated values". I have no idea if it's feasible or totally crazy, but it might be good to have an adversarial function that generates problems from known solutions rather than doing it by hand.
* better exception message and fix bug in existing cpx variant test data (SV) new cpx variant scenario commit: * optionally add reference segments when the would be segment is 2-bp long (before all such segments are skipped)
792b81b
to
015ed5a
Compare
It's always difficult to come up with test data for these complex events. What I usually do is to create test data on what I know should work, including imaginable edge cases, and accumulate more edge cases as we run into them. Can you expand a little on what you mean by
|
Like I said, I don't know enough to say if this approach is feasible, but sometimes it's possible to test an algorithm with a problem-generating function. This works well when it's easy to generate self-consistent solutions and test data consistent with a given solution. I was able to do this for some of the interval overlap functions on the python side of my code. Other methods were too complicated (for me to figure out anyway). Let's say you have an algorithm function f_algo that you want to test. You design a function test_data_factory that takes as input either a known solution, or a set of parameters from which it can easily generate a known solution (by a route that is logically distinct from the way f_algo works). Then f_tester generates and returns test_data. The flow looks like
In some sense you've already done that, but you've personally stepped in as test_data_factory(), and I'm wondering if it would be possible to automate that to get rid of all those numbers in the test code. |
Hmm, I see what you mean. |
* better exception message and fix bug in existing cpx variant test data (SV) new cpx variant scenario commit: * optionally add reference segments when the would be segment is 2-bp long (before all such segments are skipped)
fix #4649
The cause of the exception is a new edge case that was not imagined when the reference region segmenting logic was initially written.
It is now covered, with updated tests.