Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify the behavior of getNextReferenceVertex for non-ref paths #4889

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -225,8 +225,8 @@ public final V getNextReferenceVertex( final V v, final boolean allowNonRefPaths
final Set<E> blacklistedEdgeSet = blacklistedEdge.isPresent() ? Collections.singleton(blacklistedEdge.get()) : Collections.emptySet();

// if we got here, then we aren't on a reference path
final Optional<E> edge = outgoingEdges.stream().filter(e -> !blacklistedEdgeSet.contains(e)).findAny();
return edge.isPresent() ? getEdgeTarget(edge.get()) : null;
final List<E> edges = outgoingEdges.stream().filter(e -> !blacklistedEdgeSet.contains(e)).limit(2).collect(Collectors.toList());
return edges.size() == 1 ? getEdgeTarget(edges.get(0)) : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the first edge is still essentially random in that it depends on the order in which reads arrived. Why not be principled and traverse the edge with the greatest weight, which seems like a good heuristic if you're trying to arrive back at the reference?

final Optional<E> edge = outgoingEdges.stream().max(Comparator.comparingInt(BaseEdge::getMultiplicity).reversed())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misread the code. But I still think my suggestion to do a greedy search before giving up on finding the reference is reasonable and more robust to things like high depth where you might have lots of low-weight non-reference edges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the current behavior is to pick a random edge. The PR reverts to the 3.x behavior, which is to return a non-reference edge iff there 1 non-reference edge.

It would be great to try a more exhaustive search. However, my intuition is that the we are unlikely to make it back to the reference when the paths start forking and the computational cost can be substantial. We can use this PR to revert back to the deterministic logic and save implementation (and evaluation) of a greedy search for a future issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidbenjamin thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DonFreed I'm fine with the 3.x behavior. Anything fancier can wait for a later thorough revision of the dangling end merging code.

}

/**
Expand Down