[REVIEW] Add slow high-precision mode to KNN #3304

wphicks · 2020-12-14T22:47:54Z

Provide mode to perform a second high-precision pass over results returned from brute-force KNN searches which make use of L2-derived metrics. This provides a workaround for issues with numerical instability in L2 distance calculations in FAISS when a query vector is quite close to multiple retrieved samples relative to the typical inter-sample distance.
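To illustrate the kind of instability described above, here is a minimal pure-Python sketch. It assumes the first pass uses the expanded form ||x||^2 + ||y||^2 - 2<x, y> in single precision, which is emulated here with `struct`. The helper names (`f32`, `dot_f32`, `expanded_l2_f32`, `direct_l2`, `refine`) are hypothetical, and this is not the code in this PR or in FAISS.

```python
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_f32(x, y):
    """Dot product with every intermediate result rounded to float32."""
    acc = 0.0
    for a, b in zip(x, y):
        acc = f32(acc + f32(a * b))
    return acc


def expanded_l2_f32(x, y):
    """Squared L2 via ||x||^2 + ||y||^2 - 2<x, y>, emulating float32."""
    norms = f32(dot_f32(x, x) + dot_f32(y, y))
    return f32(norms - f32(2.0 * dot_f32(x, y)))


def direct_l2(x, y):
    """Squared L2 via sum((x_i - y_i)^2), computed in float64."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def refine(query, data, candidates, k):
    """Second pass: re-rank first-pass candidate indices by direct_l2."""
    rescored = sorted((direct_l2(query, data[i]), i) for i in candidates)
    return [i for _, i in rescored[:k]]


# Inputs are pre-rounded so both functions see identical float32 values.
query = [f32(1000.0), f32(1000.0)]
data = [
    [f32(1000.001), f32(1000.0)],  # very close to the query
    [f32(1000.002), f32(1000.0)],  # slightly farther
    [f32(0.0), f32(0.0)],          # far away
]

# Compare the two formulas on the rows that sit close to the query.
for row in data[:2]:
    print(expanded_l2_f32(query, row), direct_l2(query, row))

# The candidate list stands in for a (possibly misordered) first-pass result.
print(refine(query, data, candidates=[1, 0, 2], k=2))  # -> [0, 1]
```

In this emulation the squared norms are near 4e6, where adjacent float32 values are 0.25 apart, so the expanded form can only return multiples of 0.25 and cannot resolve true squared distances on the order of 1e-6. Recomputing the distances for the small candidate set restores the ordering at the cost of an extra pass.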

Resolve #3195.

wphicks · 2020-12-15T17:34:18Z

Additional detail on the data that first demonstrated the necessity of this new flag is available here: #3195 (comment).

mdemoret-nv

Minor suggestions on comments and array output_type handling. One thing I would like to see is some before/after testing added. For example, running once with two_pass_precision=False, then again with two_pass_precision=True and comparing that the output changed. This will help prove that the fix is working as intended.

python/cuml/neighbors/nearest_neighbors.pyx

python/cuml/test/test_nearest_neighbors.py

wphicks · 2020-12-17T02:08:08Z

> One thing I would like to see is some before/after testing added. For example, running once with two_pass_precision=False, then again with two_pass_precision=True and comparing that the output changed. This will help prove that the fix is working as intended.

Sadly, this is not possible to write in an environment-neutral way. Because of how the errors propagate (or not) in the distance approximations, we've seen environments where this issue never comes up and environments where it occurs every time. Poor @cjnolet slogged away at this one for a while but was unlucky (lucky?) enough to be on a system where it never came up. Even more illustratively, the PR that I used to test for this issue in CI passed with no problem before the fix was in, even though I was consistently seeing local failures.

codecov-io · 2020-12-17T05:19:39Z

Codecov Report

Merging #3304 (fc479b7) into branch-0.18 (550121b) will increase coverage by 0.17%.
The diff coverage is 85.11%.

@@               Coverage Diff               @@
##           branch-0.18    #3304      +/-   ##
===============================================
+ Coverage        71.48%   71.66%   +0.17%     
===============================================
  Files              207      210       +3     
  Lines            16748    16945     +197     
===============================================
+ Hits             11973    12144     +171     
- Misses            4775     4801      +26

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cuml/decomposition/incremental_pca.py | `94.70% <ø> (ø)` | |
| python/cuml/dask/ensemble/base.py | `19.69% <30.43%> (+0.36%)` | ⬆️ |
| python/cuml/dask/cluster/kmeans.py | `54.00% <33.33%> (ø)` | |
| python/cuml/ensemble/randomforestregressor.pyx | `70.83% <44.44%> (ø)` | |
| python/cuml/dask/decomposition/base.py | `39.53% <50.00%> (ø)` | |
| ...ython/cuml/dask/ensemble/randomforestclassifier.py | `30.00% <50.00%> (+0.51%)` | ⬆️ |
| python/cuml/dask/ensemble/randomforestregressor.py | `35.08% <50.00%> (+0.54%)` | ⬆️ |
| python/cuml/dask/linear_model/linear_regression.py | `59.09% <50.00%> (ø)` | |
| python/cuml/dask/linear_model/ridge.py | `50.00% <50.00%> (ø)` | |
| ...ython/cuml/dask/neighbors/kneighbors_classifier.py | `22.33% <50.00%> (ø)` | |
| ... and 26 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

wphicks · 2021-01-07T20:41:29Z

To provide a little more clarity on my last comment, the unit tests introduced here consistently failed in my local environment before the included fix was provided and consistently passed after the fix was introduced. On CI and in other environments, those same unit tests would consistently pass even before the fix was introduced.
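A test in the spirit of this comment could compare search results against a slow float64 reference rather than comparing the two modes to each other, since the low-precision result may or may not be wrong in a given environment. The following is a hypothetical sketch: `reference_knn`, `check_against_reference`, and the `search` callable are illustrative names, not the tests added in this PR.

```python
import math


def reference_knn(data, query, k):
    """Brute-force k nearest neighbors using float64 Euclidean distance."""
    order = sorted(range(len(data)), key=lambda i: math.dist(query, data[i]))
    return order[:k]


def check_against_reference(search, data, queries, k):
    """Return the queries for which `search` disagrees with the reference.

    `search(data, query, k)` is any callable returning neighbor indices,
    for example a thin wrapper around the estimator under test.
    """
    return [
        q for q in queries
        if list(search(data, q, k)) != reference_knn(data, q, k)
    ]


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
queries = [[0.1, 0.0], [4.0, 4.0]]

# The reference trivially agrees with itself.
assert check_against_reference(reference_knn, data, queries, k=2) == []


def bad_search(data, query, k):
    """Deliberately wrong stand-in: ignores the query entirely."""
    return list(range(k))


# Only the second query exposes the stand-in's error.
print(check_against_reference(bad_search, data, queries, k=2))  # -> [[4.0, 4.0]]
```

A check like this passes or fails on correctness alone, so it does not depend on whether the low-precision path happens to misorder neighbors on a given machine. It assumes the test data has no exact distance ties, which would make the expected ordering ambiguous.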

JohnZed

Looks good - just one question/suggestion

python/cuml/neighbors/nearest_neighbors.pyx

wphicks · 2021-01-12T15:07:12Z

rerun tests

wphicks · 2021-01-14T16:19:25Z

Merging in latest mainline to see if that will fix seemingly unrelated CI errors

wphicks · 2021-01-19T16:27:33Z

rerun tests

wphicks · 2021-01-19T22:05:55Z

rerun tests

wphicks · 2021-01-20T15:58:31Z

Seems to have been an unrelated error in FAISS. Rerunning tests and will check on specifics.

wphicks · 2021-01-20T15:58:42Z

rerun tests

ajschmidt8 · 2021-01-20T19:32:14Z

@JohnZed, I'll be removing the auto-merge labels from all repos shortly. Please make sure to use the new merge comment, `@gpucibot merge`, when you're ready to merge this PR.

JohnZed · 2021-01-21T17:16:16Z

rerun tests

wphicks · 2021-01-22T20:41:37Z

Just updated copyright headers

wphicks · 2021-01-23T00:53:20Z

Merged in branch-0.18 to deal with FAISS error

mdemoret-nv

Everything LGTM

wphicks added feature request New feature or request 2 - In Progress Currenty a work in progress non-breaking Non-breaking change labels Dec 14, 2020

wphicks added 2 commits December 15, 2020 11:49

Correct axes selection and variable name

fc9f34d

Fix style

f6285e2

wphicks added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currenty a work in progress labels Dec 15, 2020

wphicks marked this pull request as ready for review December 15, 2020 16:57

wphicks requested a review from a team as a code owner December 15, 2020 16:57

wphicks changed the title ~~[WIP] Add slow high-precision mode to KNN~~ [REVIEW] Add slow high-precision mode to KNN Dec 15, 2020

wphicks requested a review from cjnolet December 15, 2020 16:57

wphicks mentioned this pull request Dec 15, 2020

Anomalous behavior in NearestNeighbors #3195

Closed

mdemoret-nv requested changes Dec 15, 2020

Clarify documentation and use input_to_cupy_array

5ff8630

wphicks added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Dec 17, 2020

Fix style

54a469c

JohnZed requested changes Jan 11, 2021

python/cuml/neighbors/nearest_neighbors.pyx (outdated, resolved)

Use expanded L2 directly in high-precision mode

1e9f9f0

Merge branch 'branch-0.18' into fea-high_precision_knn

ce701ae

github-actions bot added the Cython / Python Cython or Python issue label Jan 14, 2021

JohnZed approved these changes Jan 14, 2021

JohnZed removed the 4 - Waiting on Reviewer Waiting for reviewer to review or respond label Jan 14, 2021

JohnZed added the 6 - Okay to Auto-Merge label Jan 14, 2021

ajschmidt8 removed the 6 - Okay to Auto-Merge label Jan 20, 2021

Update copyright headers

d6247b5

Merge branch 'branch-0.18' into fea-high_precision_knn

287c989

Merge branch 'branch-0.18' into fea-high_precision_knn

fc479b7

wphicks mentioned this pull request Jan 25, 2021

[BUG] Memory access error in IVFPQ unit test #3318

Closed

wphicks added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Jan 26, 2021

mdemoret-nv approved these changes Jan 26, 2021

dantegd merged commit 546abad into rapidsai:branch-0.18 Jan 26, 2021
