Fix UMAP and simplicial set functions metric #5490

viclafargue · 2023-07-04T11:57:29Z

Answers #5422

csadorf

I have some suggestions, but my biggest concern is whether our current tests are sufficiently capturing the motivating bug.

python/cuml/manifold/simpl_set.pyx

python/cuml/tests/test_umap.py

python/cuml/manifold/simpl_set.pyx

csadorf

LGTM!

csadorf · 2023-07-27T19:18:12Z

@viclafargue While this fix changes the behavior of the estimator class, I would consider the previous behavior broken. Since we are now moving towards the intended behavior, I would not consider this a breaking change. What do you think?

wphicks · 2023-07-28T22:30:26Z

cpp/src/umap/knn_graph/algo.cuh

@@ -62,6 +62,7 @@ inline void launcher(const raft::handle_t& handle,
  ptrs[0]  = inputsA.X;
  sizes[0] = inputsA.n;

+  std::vector<int64_t>* translations = nullptr;


Any reason to introduce a temporary for this?

It seems the function template can't be instantiated when nullptr is passed directly unless a cast is used, so I switched to a cast.

wphicks

LGTM!

cjnolet · 2023-08-02T01:53:51Z

python/cuml/manifold/umap.pyx

-            "correlation": DistanceType.CorrelationExpanded,
-            "hellinger": DistanceType.HellingerExpanded,
-            "hamming": DistanceType.HammingUnexpanded,
-            "jaccard": DistanceType.JaccardExpanded,


Jaccard is supported in the sparse distances. Is there any reason we're not separating the sparse-supported metrics from the dense-supported ones? I can't see why we'd want to stop jaccard from being executed on sparse inputs just because it's not yet provided for dense.

@cjnolet This should be addressed now.

My bad, I didn't see that it was used in the sparse case. Thanks for fixing this @csadorf.

cjnolet

Meant to request changes for the comment above. We should avoid removing features.

And add note about jaccard only supported for sparse inputs.
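The separation requested above can be sketched roughly as follows. This is a minimal illustration, not the code in umap.pyx: the set names echo the ones mentioned later in this review, but their members and the `validate_metric` helper are hypothetical. The only fact taken from the thread is that jaccard is available for sparse inputs and not for dense ones.

```python
# Hypothetical metric tables. The members are illustrative only; the one
# detail taken from the review thread is that "jaccard" is sparse-only.
DENSE_SUPPORTED_METRICS = {"l2", "euclidean", "cosine", "manhattan"}
SPARSE_SUPPORTED_METRICS = DENSE_SUPPORTED_METRICS | {"jaccard"}


def validate_metric(metric: str, is_sparse: bool) -> str:
    """Return the lower-cased metric name, or raise if the metric is not
    supported for the given input type (hypothetical helper)."""
    name = metric.lower()
    supported = SPARSE_SUPPORTED_METRICS if is_sparse else DENSE_SUPPORTED_METRICS
    if name not in supported:
        kind = "sparse" if is_sparse else "dense"
        raise ValueError(f"Metric '{metric}' is not supported for {kind} inputs.")
    return name
```

With two tables, adding a metric for one input type does not require touching the other, and an unsupported combination fails with an explicit error instead of silently falling back to a different metric.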

cjnolet · 2023-08-02T22:09:49Z

python/cuml/tests/test_umap.py

-        "jaccard",
-        "hamming",
-        "canberra",
+        ("l2", True),


We should only have to do the mappings once. If you use strings in SPARSE_SUPPORTED_METRICS and DENSE_SUPPORTED_METRICS, then you can literally just use the union of the two here instead of having to list them out at all.

Pulling those lists into the test code would be counter-productive IMO, since it couples implementation and test expectation, which makes it harder to detect breaking changes.

cjnolet

I still think it could be cleaned up a bit, but it's not an urgent issue so long as the rules aren't hardcoded and we aren't losing the jaccard functionality.

csadorf · 2023-08-02T22:17:15Z

/merge

Fix UMAP metric

0bd5c77

viclafargue requested a review from a team as a code owner July 4, 2023 11:57

github-actions bot added the CUDA/C++ label Jul 4, 2023

viclafargue added the "3 - Ready for Review" (Ready for review by team) label Jul 11, 2023

viclafargue force-pushed the fix-umap-metric branch from b8c3d6b to 0bd5c77 Compare July 12, 2023 14:24

dantegd and others added 2 commits July 12, 2023 23:11

Merge branch 'branch-23.08' into fix-umap-metric

1641209

fix metrics for simpl_set functions

77e207c

viclafargue requested a review from a team as a code owner July 13, 2023 13:22

github-actions bot added the "Cython / Python" (Cython or Python issue) label Jul 13, 2023

viclafargue changed the title ~~Fix UMAP metric~~ Fix UMAP and simplicial set functions metric Jul 13, 2023

viclafargue requested a review from cjnolet July 13, 2023 15:05

removing jaccard metric from tests

f8bfcdd

csadorf requested changes Jul 13, 2023

viclafargue added 3 commits July 14, 2023 11:43

addressing review

9f37c8f

small fix

90d9151

fix issue

8f2d6ff

csadorf added the "improvement" (Improvement / enhancement to an existing function) label Jul 14, 2023

csadorf requested changes Jul 21, 2023

viclafargue added 2 commits July 24, 2023 16:12

adressing review

84490c4

Merge branch 'branch-23.08' into fix-umap-metric

71dd03b

csadorf approved these changes Jul 27, 2023

viclafargue added the "non-breaking" (Non-breaking change) and "bug" (Something isn't working) labels and removed the "improvement" (Improvement / enhancement to an existing function) label Jul 28, 2023

wphicks reviewed Jul 28, 2023

viclafargue added 2 commits July 31, 2023 11:55

use cast

a96bb61

Merge branch 'branch-23.08' into fix-umap-metric

a7d5614

wphicks approved these changes Jul 31, 2023

Merge branch 'branch-23.08' into fix-umap-metric

caeeb87

cjnolet reviewed Aug 2, 2023

cjnolet requested changes Aug 2, 2023

csadorf added 2 commits August 2, 2023 14:51

Distinguish between metrics supported for dense and sparse inputs.

b23af4b

Revise doc-strings to include jaccard.

fc42da9

And add note about jaccard only supported for sparse inputs.

csadorf requested a review from cjnolet August 2, 2023 22:00

cjnolet reviewed Aug 2, 2023

cjnolet approved these changes Aug 2, 2023

rapids-bot bot merged commit 6bf61ca into rapidsai:branch-23.08 Aug 3, 2023
49 checks passed
