Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add random forest option for partial dependence #13

Merged
merged 2 commits into from
Jul 3, 2023

Conversation

katiedagon
Copy link
Collaborator

@katiedagon katiedagon commented Jun 22, 2023

@djgagne @jsschreck Following our discussion on the MILES slack about the behavior of XGBoost sometimes generating negative values in partial dependence plots, I've attempted to update the code to add a random forest option. I set it up to use RF as the default, but that could be changed. I tested this with an existing echo run, and the PD plots are comparable to those produced with XGB, but still generating negative values which I'm not sure makes sense for these metrics. Here are some example comparisons using a set of metrics from @mariajmolina - noting the vcorr_cust metric is likely out of date, but the valid_mae and vmse_extreme_outp should be relevant (and I believe should not have negative values).

I welcome comments/suggestions on any of the above, plus how best to let users choose between the two options, in addition to general comments about how the RF model is configured (e.g., hyperparameters).

Copy link
Collaborator

@djgagne djgagne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katiedagon Thank you for adding the random forest functionality. I added a couple of small suggestions that should address the remaining issues with negative output scores.

echo/src/partial_dependence.py Outdated Show resolved Hide resolved
@@ -27,6 +28,7 @@ def partial_dep(fn, input_cols, output_col, verbose=0):
df[param] = le.fit_transform(df[param])

objective = "reg:squarederror"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard coding the model hyperparameters, I would put them in a dictionary called hparams and make that a keyword argument for partial_dep.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - where should I define hparams?

Copy link
Collaborator

@djgagne djgagne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. Merging.

@djgagne djgagne merged commit 383278e into NCAR:main Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants