-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix UMAP and simplicial set functions metric #5490
Fix UMAP and simplicial set functions metric #5490
Conversation
b8c3d6b
to
0bd5c77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some suggestions, but my biggest concern is whether our current tests are sufficiently capturing the motivating bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@viclafargue While this fix changes the behavior of the estimator class, I would consider the previous one broken and we are now moving towards the intended behavior and thus would not consider this a breaking change. What do you think? |
cpp/src/umap/knn_graph/algo.cuh
Outdated
@@ -62,6 +62,7 @@ inline void launcher(const raft::handle_t& handle, | |||
ptrs[0] = inputsA.X; | |||
sizes[0] = inputsA.n; | |||
|
|||
std::vector<int64_t>* translations = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to introduce a temporary for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function template won't be instantiated while providing a nullptr
directly unless a cast is used it seems. I just switched it for a cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
"correlation": DistanceType.CorrelationExpanded, | ||
"hellinger": DistanceType.HellingerExpanded, | ||
"hamming": DistanceType.HammingUnexpanded, | ||
"jaccard": DistanceType.JaccardExpanded, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jaccard is supported in the sparse distances- is there any reason we're not separating the sparse from dense supported metrics? I can't see why we'd want to remove jaccard from being executed on sparse metrics just because it's not yet provided for dense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjnolet This should be addressed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, didn't saw it was used in the sparse case. Thanks for fixing this @csadorf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant to request changes for the comment above. We should avoid removing features.
And add note about jaccard only supported for sparse inputs.
"jaccard", | ||
"hamming", | ||
"canberra", | ||
("l2", True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should only have to do the mappings once- if you use strings in SPARSE_SUPPORTED_METRICS
and DENSE_SUPPORTED_METRICS
then you can literally just use the union of the two here here instead of having to list them out at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulling in those lists into the test code would be counter-productive IMO since it correlates implementation and test expectation which means that it becomes harder to detect breaking changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it could be cleaned up a bit, but it's not an urgent issue so long as the rules aren't hardcoded and we aren't losing the jaccard functionality.
/merge |
Answers #5422