Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC] GPU Enabled Random Forest Regressor documentation is wrong. #4623

Closed
LaneMatthewJ opened this issue Mar 8, 2022 · 5 comments · Fixed by #4666
Closed

[DOC] GPU Enabled Random Forest Regressor documentation is wrong. #4623

LaneMatthewJ opened this issue Mar 8, 2022 · 5 comments · Fixed by #4666
Labels
doc Documentation

Comments

@LaneMatthewJ
Copy link

LaneMatthewJ commented Mar 8, 2022

Report incorrect documentation

Hello!

In working with cuML Random Forest Regressor (both single and multi-gpu) I found some errors in the documentation (for the current stable build*) and have noted them below!

Location of incorrect documentation

Problem 1: Run Time Error with max_depth = -1:

In the Dask Random Forest Regressor documentation there exists a subsection for max_detph suggesting that the regressor can have an unlimited max depth with -1 as a parameter.

Problem 2: Sytax Error with assignment in dictionary:

Under fit of both Random Forest Classifier and Random Forest Regressor there exists a syntax error:

This is equivalent to calling persist with the data and workers):

X_dask_cudf, y_dask_cudf = dask_client.persist([X_dask_cudf,
                                                y_dask_cudf],
                                               workers={
                                               X_dask_cudf=workers,
                                               y_dask_cudf=workers
                                               })

Describe the problems or issues found in the documentation

Problem 1

When attempting to use -1 as a parameter for max depth

rf = dask_cuRF(client=client, 
               n_streams = 1, 
               split_criterion=2, 
               max_features='sqrt', 
               min_samples_split=6, 
               n_estimators=800, 
               max_depth=-1, 
               n_bins=75,
               ignore_empty_partitions=True)

rf_trained = rf.fit(X_cudf_train_dask, y_cudf_train_dask)

The following error is returned:

RuntimeError: 2 of 2 worker jobs failed: Must specify max_depth >0 , Must specify max_depth >0 

Problem 2:

Syntax errors won't run.

Steps taken to verify documentation is incorrect

Problem 1:

Run Time Error occurs.

Problem 2:

Syntax Error will not run

*Suggested fix for documentation

Problem 1:

Update documentation to denote that there exists no unlimited depth option

Problem 2:

Update documentation to have no syntax errors.


Report needed documentation

Report needed documentation
I think the documentation ought to match the implementation.

Describe the documentation you'd like
Updated documentation that matches the implementation

Steps taken to search for needed documentation
N/A

@LaneMatthewJ LaneMatthewJ added ? - Needs Triage Need team to review and classify doc Documentation labels Mar 8, 2022
@divyegala
Copy link
Member

@vinaydes @venkywonka could I ask one of you to check this out?

@venkywonka
Copy link
Contributor

Thank you @LaneMatthewJ for the issue! Apologies for the delay and the improper documentation, have fixed it in the above linked PR 😄

@divyegala divyegala removed the ? - Needs Triage Need team to review and classify label Mar 29, 2022
@LaneMatthewJ
Copy link
Author

Awesome! Thank you so much!

rapids-bot bot pushed a commit that referenced this issue Mar 31, 2022
This small PR :
* Rectifies/clarifies the wrong parameter description for `max_depth` parameter in `RandomForestRegressor` and `RandomForestClassifier` in RF and dask-RF. 
* Corrects a syntax error in dask-rf docs
* Corrects the `max_depth` check in RF-python layer

resolves #4623

Authors:
  - Venkat (https://github.com/venkywonka)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4666
@LaneMatthewJ
Copy link
Author

Hey @venkywonka! Out of curiosity does the documentation on the site only get built during the pushing of a new version? (also - thank you again!)

https://docs.rapids.ai/api/cuml/stable/api.html?highlight=random%20forest%20regressor#cuml.dask.ensemble.RandomForestRegressor

@venkywonka
Copy link
Contributor

Hi, so sorry for the delay in response! If I understand correctly, every merge to the main branch is updated in the nightly docs but the URL you posted above is that of the stable docs that usually corresponds to one version previous to the default branch (ie, 22.06 as current default branch is 22.08).

vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023
This small PR :
* Rectifies/clarifies the wrong parameter description for `max_depth` parameter in `RandomForestRegressor` and `RandomForestClassifier` in RF and dask-RF. 
* Corrects a syntax error in dask-rf docs
* Corrects the `max_depth` check in RF-python layer

resolves rapidsai#4623

Authors:
  - Venkat (https://github.com/venkywonka)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4666
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants