Fix merge conflicts [skip ci] #3892

ajschmidt8 · 2021-05-24T16:16:34Z

This PR fixes the merge conflicts in #3887.

@dantegd

This should resolve the confusion caused in the issue rapidsai/raft#228. Tagging @dantegd for review.

Authors:
- Thejaswi. N. S (https://github.com/teju85)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #3875

Use floating-point rounding to make UMAP optimization deterministic. This is a breaking change, as the batch size parameter is removed.

* Add a procedure for rounding the gradient updates.
* Add a buffer for gradient updates.
* Add an internal parameter `deterministic`, which should be set to `true` when `random_state` is set.

The test file is removed due to #3849.

Authors:
- Jiaming Yuan (https://github.com/trivialfis)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #3848
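To illustrate why rounding the updates can make accumulation deterministic, here is a minimal sketch of one common pre-rounding scheme. It is an assumption about the general idea, not the PR's CUDA code: the names `rounding_quantum`, `pre_round`, and `accumulate` are hypothetical, and the bound `max_abs` is assumed to be known in advance. Each value is truncated to a multiple of a power of two chosen so that every partial sum stays exactly representable, which makes the result independent of summation order.

```python
import math
import random


def rounding_quantum(max_abs, n, mantissa_bits=53):
    """Return a power of two q such that any sum of n multiples of q,
    each bounded by max_abs in magnitude, is exactly representable."""
    bound = max_abs * n
    if bound == 0.0:
        return 1.0
    # frexp gives bound = m * 2**e with 0.5 <= m < 1, hence 2**e > bound.
    _, e = math.frexp(bound)
    return math.ldexp(1.0, e - mantissa_bits)


def pre_round(x, q):
    """Truncate x toward zero to a multiple of q (q is a power of two)."""
    return math.trunc(x / q) * q


def accumulate(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


random.seed(0)
# Stand-in for gradient updates (hypothetical data, bounded by 4.0).
updates = [random.uniform(-4.0, 4.0) for _ in range(10_000)]
q = rounding_quantum(max_abs=4.0, n=len(updates))
rounded = [pre_round(u, q) for u in updates]

shuffled = rounded[:]
random.shuffle(shuffled)

# Every partial sum is an exact multiple of q, so order no longer matters.
print("order-independent:", accumulate(rounded) == accumulate(shuffled))
```

The trade-off is a small loss of precision (at most `q` per value) in exchange for bitwise-reproducible sums, which is what matters when many threads apply updates in an unpredictable order.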

This PR updates the `0.20` references in `CHANGELOG.md` to be `21.06`.

Authors:
- AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
- Dillon Cullinan (https://github.com/dillon-cullinan)

URL: #3883

I made the mistake of not calling `__init__` in a graph callback and got a segmentation fault. A `ValueError` would be nicer.

Authors:
- Jiaming Yuan (https://github.com/trivialfis)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #3881
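The commit title for #3881 ("Make sure __init__ is called in graph callback") suggests a guard of roughly the following shape. This is a minimal sketch under stated assumptions: `GraphCallbackBase`, `check_callback`, and `_native_handle` are hypothetical names for illustration and are not taken from cuML's code.

```python
class GraphCallbackBase:
    """Hypothetical stand-in for a callback base class backed by native state."""

    def __init__(self):
        # In a real binding this would allocate the native callback object.
        self._native_handle = object()


def check_callback(callback):
    """Raise ValueError if the base-class __init__ was never run."""
    if not isinstance(callback, GraphCallbackBase):
        raise ValueError("callback must derive from GraphCallbackBase")
    if getattr(callback, "_native_handle", None) is None:
        raise ValueError(
            "callback was not initialized; call super().__init__() in __init__"
        )
    return callback


class GoodCallback(GraphCallbackBase):
    def __init__(self):
        super().__init__()
        self.calls = 0


class BadCallback(GraphCallbackBase):
    def __init__(self):
        self.calls = 0  # forgot super().__init__()


check_callback(GoodCallback())
try:
    check_callback(BadCallback())
except ValueError as err:
    print("rejected:", err)
```

Validating on the Python side before any native pointer is dereferenced turns a crash into an ordinary exception that the user can act on.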

…istances from self-loops (#3824)

Closes #3801
Closes #3802

Corresponding RAFT PR: rapidsai/raft#217

Authors:
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #3824

This PR rewrites the mean squared error objective. Mean squared error is much easier to compute when factored mathematically into a slightly different form. This should bring regression performance in line with classification.

I've also removed the MAE objective, as it's not correct. This can be seen from the fact that leaf predictions with MAE use the mean, whereas the correct minimiser is the median. Also see scikit-learn's implementation, where streaming median calculations are required: https://github.com/scikit-learn/scikit-learn/blob/de1262c35e2aa4ee062d050281ee576ce9e35c94/sklearn/tree/_criterion.pyx#L976. Implementing this correctly for GPU would be very challenging.

Performance before:

![rf_regression_perf](https://user-images.githubusercontent.com/7307640/117608125-8c884280-b1b1-11eb-8cb4-e92f39dad0f3.png)

After:

![rf_regression_perf_fix](https://user-images.githubusercontent.com/7307640/117608145-94e07d80-b1b1-11eb-939f-b96cafbd3e35.png)

Script:

```python
from cuml import RandomForestRegressor as cuRF
from sklearn.ensemble import RandomForestRegressor as sklRF
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import time

matplotlib.use("Agg")
sns.set()

X, y = make_regression(n_samples=100000, random_state=0)
X = X.astype(np.float32)
y = y.astype(np.float32)
rs = np.random.RandomState(92)
# DataFrame.append was removed in pandas 2.0, so collect rows in a list
# and build the frame once after the loop.
rows = []
d = 10
n_repeats = 5
bootstrap = False
max_samples = 1.0
max_features = 0.5
n_estimators = 10
n_bins = min(X.shape[0], 128)

for _ in range(n_repeats):
    # scikit-learn baseline
    clf = sklRF(
        n_estimators=n_estimators,
        max_depth=d,
        random_state=rs,
        max_features=max_features,
        bootstrap=bootstrap,
        max_samples=max_samples if max_samples < 1.0 else None,
    )
    start = time.perf_counter()
    clf.fit(X, y)
    skl_time = time.perf_counter() - start
    pred = clf.predict(X)

    # cuML random forest with the experimental backend
    cu_clf = cuRF(
        n_estimators=n_estimators,
        max_depth=d,
        random_state=rs.randint(0, 1 << 32),
        n_bins=n_bins,
        max_features=max_features,
        bootstrap=bootstrap,
        max_samples=max_samples,
        use_experimental_backend=True,
    )
    start = time.perf_counter()
    cu_clf.fit(X, y)
    cu_time = time.perf_counter() - start
    cu_pred = cu_clf.predict(X, predict_model="CPU")

    rows.append(
        {
            "algorithm": "cuml",
            "Time(s)": cu_time,
            "MSE": mean_squared_error(y, cu_pred),
        }
    )
    rows.append(
        {
            "algorithm": "sklearn",
            "Time(s)": skl_time,
            "MSE": mean_squared_error(y, pred),
        }
    )

df = pd.DataFrame(rows, columns=["algorithm", "Time(s)", "MSE"])
print(df)

# Plot training time and training-set MSE side by side.
fig, ax = plt.subplots(1, 2)
sns.barplot(data=df, x="algorithm", y="Time(s)", ax=ax[0])
sns.barplot(data=df, x="algorithm", y="MSE", ax=ax[1])
plt.savefig("rf_regression_perf_fix.png")
```

Authors:
- Rory Mitchell (https://github.com/RAMitchell)

Approvers:
- Philip Hyunsu Cho (https://github.com/hcho3)
- Thejaswi. N. S (https://github.com/teju85)
- John Zedlewski (https://github.com/JohnZed)

URL: #3845
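The two claims in #3845 can be checked numerically. The sketch below is illustrative only and is not the PR's kernel code; the function names are hypothetical. It relies on the standard identity that the sum of squared errors about the mean equals Σy² − (Σy)²/n, so the gain of a split reduces to S_L²/n_L + S_R²/n_R − S²/n, which needs only label sums and counts per child. It also shows that, for absolute error, the median gives a lower loss than the mean on a skewed leaf.

```python
import math
import statistics


def sse(ys):
    """Sum of squared errors around the mean, computed directly."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def mse_gain_direct(left, right):
    """Reduction in SSE from splitting the parent into left and right."""
    return sse(left + right) - sse(left) - sse(right)


def mse_gain_factored(left, right):
    """Same quantity using only per-child label sums and counts."""
    s_l, n_l = sum(left), len(left)
    s_r, n_r = sum(right), len(right)
    return s_l**2 / n_l + s_r**2 / n_r - (s_l + s_r) ** 2 / (n_l + n_r)


def abs_error(ys, prediction):
    """Total absolute error of a constant leaf prediction."""
    return sum(abs(y - prediction) for y in ys)


left, right = [1.0, 2.0, 3.0], [10.0, 12.0]
print("gain (direct):  ", mse_gain_direct(left, right))
print("gain (factored):", mse_gain_factored(left, right))

leaf = [1.0, 2.0, 3.0, 4.0, 100.0]
print("MAE loss with mean:  ", abs_error(leaf, statistics.mean(leaf)))
print("MAE loss with median:", abs_error(leaf, statistics.median(leaf)))
```

Because the factored form depends only on running sums and counts, it can be evaluated from histogram-style accumulators, whereas a correct MAE leaf requires a median, which does not decompose this way.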

See #3820

Authors:
- Micka (https://github.com/lowener)

Approvers:
- Victor Lafargue (https://github.com/viclafargue)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #3831

See #3820

Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
- Victor Lafargue (https://github.com/viclafargue)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #3830

…8-merge-21.06

ajschmidt8 · 2021-05-24T17:10:15Z

branch-21.08 doesn't contain any new commits (besides changelog updates), so I'll admin merge this since it's all already been tested on branch-21.06.

…1.06 Fix merge conflicts [skip ci]

teju85 and others added 9 commits May 19, 2021 22:12

Update CHANGELOG.md links for calver (#3883)

0b33f9d


Make sure __init__ is called in graph callback. (#3881)

b7a634a


Fix for MNMG test_rf_classification_dask_fil_predict_proba (#3831)

ea662e8


Fix MNMG test test_rf_regression_dask_fil (#3830)

3e89f04


Merge remote-tracking branch 'upstream/branch-21.06' into branch-21.0…

b52db0b

…8-merge-21.06

ajschmidt8 added the `improvement` (Improvement / enhancement to an existing function) and `non-breaking` (Non-breaking change) labels May 24, 2021

ajschmidt8 requested a review from a team as a code owner May 24, 2021 16:16

github-actions bot added the `conda` (conda issue) label May 24, 2021

ajschmidt8 changed the base branch from branch-21.06 to branch-21.08 May 24, 2021 16:17

ajschmidt8 requested review from a team as code owners May 24, 2021 16:17

dillon-cullinan approved these changes May 24, 2021


ajschmidt8 merged commit 67661e7 into rapidsai:branch-21.08 May 24, 2021

ajschmidt8 deleted the branch-21.08-merge-21.06 branch May 24, 2021 17:10

vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023

Merge pull request rapidsai#3892 from ajschmidt8/branch-21.08-merge-2…

3fe9cbb

…1.06 Fix merge conflicts [skip ci]
