Fix merge conflicts [skip ci] #3892

Merged: 9 commits merged on May 24, 2021

ajschmidt8 (Member):

This PR fixes the merge conflicts in #3887.

teju85 and others added 9 commits May 19, 2021 22:12
This should resolve the confusion reported in rapidsai/raft#228. Tagging @dantegd for review.

Authors:
  - Thejaswi. N. S (https://github.com/teju85)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #3875
Use floating-point rounding to make UMAP optimization deterministic. This is a breaking change because the batch size parameter is removed.

* Add procedure for rounding the gradient updates.
* Add buffer for gradient updates.
* Add an internal parameter `deterministic`, which should be set to `true` when `random_state` is set.
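The commit does not describe the rounding procedure in detail. The sketch below shows one well-known way to make floating-point accumulation order-independent; it is an assumption, not code taken from cuML's UMAP kernels. Pick a power of two `F` that bounds every partial sum, snap each update to a fixed grid by computing `(F + x) - F`, and every later addition becomes exact, so the total no longer depends on the order in which (for example) atomic adds land. NumPy `float32` on the CPU stands in for GPU arithmetic, and `rounding_factor`, `round_updates` and `sequential_sum` are hypothetical helper names.

```python
import math

import numpy as np


def rounding_factor(values):
    """Hypothetical helper: a power of two F with F >= 2 * n * max|x|."""
    bound = 2.0 * len(values) * float(np.max(np.abs(values)))
    if bound == 0.0:
        return np.float32(1.0)
    return np.float32(2.0 ** math.ceil(math.log2(bound)))


def round_updates(values, factor):
    # Adding and then subtracting F discards the low-order bits of each value,
    # leaving multiples of a fixed quantum (about F * 2**-24 in float32).
    values = np.asarray(values, dtype=np.float32)
    return (factor + values) - factor


def sequential_sum(values):
    # Accumulate one element at a time in float32, like a chain of atomic adds.
    acc = np.float32(0.0)
    for v in values:
        acc = np.float32(acc + v)
    return acc


rng = np.random.default_rng(0)
updates = rng.standard_normal(10_000).astype(np.float32)
perm = rng.permutation(len(updates))

F = rounding_factor(updates)
rounded = round_updates(updates, F)

# Raw float32 sums can differ between orders; the rounded sums are identical.
print(sequential_sum(updates) == sequential_sum(updates[perm]))
print(sequential_sum(rounded) == sequential_sum(rounded[perm]))  # True
```

The price of this approach is precision: each update loses the bits below the quantum, which is why a bound on the update magnitudes has to be known (or computed) before accumulating.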

The test file is removed because of #3849.

Authors:
  - Jiaming Yuan (https://github.com/trivialfis)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #3848
This PR updates the `0.20` references in `CHANGELOG.md` to be `21.06`.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Dillon Cullinan (https://github.com/dillon-cullinan)

URL: #3883
I made this mistake myself and got a segmentation fault. A `ValueError` would be friendlier.
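The commit message does not say which input triggered the crash, so the following is only a generic, hypothetical sketch of the fix pattern: validate arguments on the Python side and raise `ValueError` before they reach compiled C++/CUDA code, where an out-of-range value can index out of bounds and take down the whole process. `check_fit_input` and its parameter `k` are illustrative names, not cuML API.

```python
import numpy as np


def check_fit_input(X, k):
    """Hypothetical guard run in Python before handing data to native code.

    `k` stands in for any count-like parameter (for example a neighbour
    count) that must be a positive integer no larger than the number of rows.
    """
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D, got an array with {X.ndim} dimension(s)")
    n_rows = X.shape[0]
    if not isinstance(k, (int, np.integer)) or not 1 <= k <= n_rows:
        raise ValueError(f"k must be an integer in [1, {n_rows}], got {k!r}")
    return X
```

A clear Python exception with the offending value in the message is much easier to act on than a crash inside a kernel.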

Authors:
  - Jiaming Yuan (https://github.com/trivialfis)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #3881
…istances from self-loops (#3824)

Closes #3801 
Closes #3802 

Corresponding RAFT PR: rapidsai/raft#217

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #3824
This PR rewrites the mean squared error objective. Mean squared error is much cheaper to compute when factored algebraically into a slightly different form. This should bring regression performance in line with classification.
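The PR does not spell out the factoring. One standard form, assumed here rather than taken from the cuML source, writes a node's sum of squared errors using only the sample count $n$, the sum $S = \sum_i y_i$ and the sum of squares: $\sum_i (y_i - \bar{y})^2 = \sum_i y_i^2 - S^2/n$. Because $\sum_i y_i^2$ is the same for a parent and the union of its two children, the reduction in squared error from splitting into left and right children depends only on counts and sums:

$$\Delta = \frac{S_L^2}{n_L} + \frac{S_R^2}{n_R} - \frac{S^2}{n}$$

The sketch below (all function names are illustrative) checks this identity against a brute-force computation and uses it to scan split thresholds with running `(count, sum)` pairs.

```python
import numpy as np


def sse(y):
    """Brute force: sum of squared deviations from the mean."""
    return float(np.sum((y - y.mean()) ** 2))


def split_gain_from_sums(n_left, s_left, n_right, s_right):
    """Reduction in sum of squared errors, using only counts and sums."""
    n = n_left + n_right
    s = s_left + s_right
    return s_left**2 / n_left + s_right**2 / n_right - s**2 / n


def best_split(x, y):
    """Scan thresholds on one feature, keeping only a running count and sum."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    total_n, total_s = len(ys), float(ys.sum())
    left_n, left_s = 0, 0.0
    best_gain, best_threshold = -np.inf, None
    for i in range(len(ys) - 1):
        left_n += 1
        left_s += float(ys[i])
        if xs[i] == xs[i + 1]:
            continue  # cannot separate equal feature values
        gain = split_gain_from_sums(
            left_n, left_s, total_n - left_n, total_s - left_s
        )
        if gain > best_gain:
            best_gain, best_threshold = gain, (xs[i] + xs[i + 1]) / 2
    return best_gain, best_threshold
```

No per-candidate pass over the samples is needed: each threshold costs O(1) once the running sums are updated, which maps naturally onto histogram-based GPU tree building.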

I've also removed the MAE objective because it's not correct: leaf predictions with MAE use the mean, whereas the correct minimiser is the median. See also sklearn's implementation, which requires streaming median calculations: https://github.com/scikit-learn/scikit-learn/blob/de1262c35e2aa4ee062d050281ee576ce9e35c94/sklearn/tree/_criterion.pyx#L976.

Implementing this correctly for GPU would be very challenging.
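The claim about the minimiser is easy to check numerically: for a constant prediction $c$, $\sum_i |y_i - c|$ is minimised by a median of the $y_i$, while $\sum_i (y_i - c)^2$ is minimised by the mean. With skewed data the two differ, so a leaf that predicts the mean does not minimise absolute error. A small sketch with made-up illustrative data:

```python
import numpy as np

# Illustrative data with one large outlier, so mean and median differ.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])


def mae(c):
    """Mean absolute error of predicting the constant c for every sample."""
    return float(np.mean(np.abs(y - c)))


def mse(c):
    """Mean squared error of predicting the constant c for every sample."""
    return float(np.mean((y - c) ** 2))


# Brute-force search over candidate constant predictions.
grid = np.linspace(0.0, 100.0, 10001)
best_for_mae = float(grid[np.argmin([mae(c) for c in grid])])
best_for_mse = float(grid[np.argmin([mse(c) for c in grid])])

print("mean:", y.mean(), "median:", np.median(y))
print("argmin MAE:", best_for_mae, "argmin MSE:", best_for_mse)
print("MAE at mean:", mae(y.mean()), "MAE at median:", mae(np.median(y)))
```

Tracking a median as samples move between candidate children is what makes a correct MAE criterion expensive, in contrast to the running sums that suffice for MSE.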

Performance before:
![rf_regression_perf](https://user-images.githubusercontent.com/7307640/117608125-8c884280-b1b1-11eb-8cb4-e92f39dad0f3.png)
After:
![rf_regression_perf_fix](https://user-images.githubusercontent.com/7307640/117608145-94e07d80-b1b1-11eb-939f-b96cafbd3e35.png)

Script:
```python
from cuml import RandomForestRegressor as cuRF
from sklearn.ensemble import RandomForestRegressor as sklRF
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # headless backend; select it before importing pyplot
import matplotlib.pyplot as plt
import seaborn as sns
import time

sns.set_theme()

X, y = make_regression(n_samples=100000, random_state=0)
X = X.astype(np.float32)
y = y.astype(np.float32)
rs = np.random.RandomState(92)
# Collect one result dict per run; DataFrame.append was removed in pandas 2.0.
rows = []
d = 10
n_repeats = 5
bootstrap = False
max_samples = 1.0
max_features = 0.5
n_estimators = 10
n_bins = min(X.shape[0], 128)
for _ in range(n_repeats):
    clf = sklRF(
        n_estimators=n_estimators,
        max_depth=d,
        random_state=rs,
        max_features=max_features,
        bootstrap=bootstrap,
        # pass None to sklearn when all samples are used
        max_samples=max_samples if max_samples < 1.0 else None,
    )

    start = time.perf_counter()
    clf.fit(X, y)
    skl_time = time.perf_counter() - start
    pred = clf.predict(X)
    cu_clf = cuRF(
        n_estimators=n_estimators,
        max_depth=d,
        random_state=rs.randint(0, 1 << 32),
        n_bins=n_bins,
        max_features=max_features,
        bootstrap=bootstrap,
        max_samples=max_samples,
        use_experimental_backend=True,
    )

    start = time.perf_counter()
    cu_clf.fit(X, y)
    cu_time = time.perf_counter() - start
    cu_pred = cu_clf.predict(X, predict_model="CPU")
    # MSE is measured on the training set: it reflects fit quality, not generalization.
    rows.append(
        {"algorithm": "cuml", "Time(s)": cu_time, "MSE": mean_squared_error(y, cu_pred)}
    )
    rows.append(
        {"algorithm": "sklearn", "Time(s)": skl_time, "MSE": mean_squared_error(y, pred)}
    )
df = pd.DataFrame(rows, columns=["algorithm", "Time(s)", "MSE"])
print(df)
fig, ax = plt.subplots(1, 2)
sns.barplot(data=df, x="algorithm", y="Time(s)", ax=ax[0])
sns.barplot(data=df, x="algorithm", y="MSE", ax=ax[1])
plt.savefig("rf_regression_perf_fix.png")
```

Authors:
  - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - Thejaswi. N. S (https://github.com/teju85)
  - John Zedlewski (https://github.com/JohnZed)

URL: #3845
ajschmidt8 added the `improvement` (Improvement / enhancement to an existing function) and `non-breaking` (Non-breaking change) labels May 24, 2021
ajschmidt8 requested a review from a team as a code owner May 24, 2021 16:16
github-actions bot added the `conda` (conda issue) label May 24, 2021
ajschmidt8 changed the base branch from branch-21.06 to branch-21.08 May 24, 2021 16:17
ajschmidt8 requested reviews from teams as code owners May 24, 2021 16:17
ajschmidt8 (Member, Author):

branch-21.08 doesn't contain any new commits (besides changelog updates), so I'll admin merge this since it's all already been tested on branch-21.06.

ajschmidt8 merged commit 67661e7 into rapidsai:branch-21.08 May 24, 2021
ajschmidt8 deleted the branch-21.08-merge-21.06 branch May 24, 2021 17:10
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
8 participants