Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

660/truncate feature values #661

Merged
merged 13 commits into from
Oct 6, 2023
Merged

Conversation

damien2012eng
Copy link
Contributor

I verified the ASAP dataset using RSMTool toolset. The column mechanics has outlier values. Here is an example result (Response 181) for truncating and w/o truncating:
Truncating outlier: 3.213 (Original data) and 3.248 (preprocessed data)
W/O Truncating outlier: 3.213 (for both original data and preprocessed data)
I also verified the outliers range for the Mechanics: [3.248, 5.648]. Because the value 3.213 is lower than the lower bound, it is truncated to be the value of the lower bound.

Closes #660

@codecov
Copy link

codecov bot commented Oct 4, 2023

Codecov Report

All modified lines are covered by tests ✅

Files Coverage Δ
rsmtool/preprocessor.py 96.44% <100.00%> (+0.02%) ⬆️
rsmtool/rsmexplain.py 92.19% <100.00%> (+0.03%) ⬆️
rsmtool/utils/constants.py 100.00% <ø> (ø)

📢 Thoughts on this report? Let us know!.

doc/config_rsmeval.rst.inc Outdated Show resolved Hide resolved
rsmtool/preprocessor.py Outdated Show resolved Hide resolved
rsmtool/preprocessor.py Outdated Show resolved Hide resolved
rsmtool/preprocessor.py Outdated Show resolved Hide resolved
@tamarl08
Copy link
Contributor

tamarl08 commented Oct 4, 2023

Looks good @damien2012eng ! just added a couple of docstring nitpicks.

Copy link
Member

@desilinguist desilinguist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damien2012eng looks pretty good but it's still missing a critical piece. Do you remember how we check that the value of standardize_features specified for rsmexplain and in the original rsmtool experiment match and, if not, we raise a warning? We need to do this same for truncate_outliers as well.

Instead of duplicating code, may be factor out that code as a function in rsmexplain.py and use it for both fields?

doc/config_rsmeval.rst.inc Outdated Show resolved Hide resolved
doc/config_rsmexplain.rst.inc Outdated Show resolved Hide resolved
doc/config_rsmpredict.rst.inc Outdated Show resolved Hide resolved
doc/config_rsmtool.rst.inc Outdated Show resolved Hide resolved
rsmtool/preprocessor.py Outdated Show resolved Hide resolved
rsmtool/preprocessor.py Outdated Show resolved Hide resolved
rsmtool/preprocessor.py Outdated Show resolved Hide resolved
rsmtool/rsmexplain.py Outdated Show resolved Hide resolved
rsmtool/rsmexplain.py Outdated Show resolved Hide resolved
Use a simple `for` loop to verify and overwrite the values of
`standardize_features` and `truncate_outliers` for rsmexplain.
Add new tests to check that `truncate_outliers` values are correctly
overwritten when necessary.
@desilinguist
Copy link
Member

@damien2012eng don't forget to fix the file @tamarl08 pointed out. Otherwise this is ready to merge.

@damien2012eng damien2012eng merged commit e27474d into main Oct 6, 2023
10 checks passed
@delete-merged-branch delete-merged-branch bot deleted the 660/truncate_feature_values branch October 6, 2023 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature values get truncated when using rsmExplain on SKLL models
3 participants