660/truncate feature values #661
Codecov Report: all modified lines are covered by tests ✅
Looks good, @damien2012eng! I just added a couple of docstring nitpicks.
@damien2012eng, this looks pretty good, but it's still missing a critical piece. Do you remember how we check whether the value of `standardize_features` specified for rsmexplain matches the value used in the original rsmtool experiment and, if not, raise a warning? We need to do the same for `truncate_outliers`. Instead of duplicating code, maybe factor that check out as a function in `rsmexplain.py` and use it for both fields?
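A minimal sketch of the shared helper suggested above, assuming both configurations can be read and written like plain dicts. The name `sync_with_rsmtool_config` is hypothetical (not RSMTool's actual API), and emitting a `UserWarning` through `warnings.warn` is an assumption; the project may log the warning instead.

```python
import warnings
from typing import Any, MutableMapping


def sync_with_rsmtool_config(
    explain_config: MutableMapping[str, Any],
    rsmtool_config: MutableMapping[str, Any],
    field: str,
) -> None:
    """Hypothetical helper: make explain_config[field] match rsmtool's value.

    If the rsmexplain value differs from the value used in the original
    rsmtool experiment, warn and overwrite it with the rsmtool value.
    """
    explain_value = explain_config.get(field)
    rsmtool_value = rsmtool_config.get(field)
    if explain_value != rsmtool_value:
        # Assumption: a UserWarning is the desired way to surface the mismatch.
        warnings.warn(
            f"{field} is {explain_value!r} for rsmexplain but "
            f"{rsmtool_value!r} in the original rsmtool experiment; "
            "using the rsmtool value."
        )
        explain_config[field] = rsmtool_value


# Usage: apply the same check to both fields instead of duplicating code.
explain_config = {"standardize_features": True, "truncate_outliers": False}
rsmtool_config = {"standardize_features": True, "truncate_outliers": True}
for name in ("standardize_features", "truncate_outliers"):
    sync_with_rsmtool_config(explain_config, rsmtool_config, name)
```

Keeping the comparison in one function means any future field that must agree with the original experiment only needs to be added to the loop.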
Use a simple `for` loop to verify and overwrite the values of `standardize_features` and `truncate_outliers` for rsmexplain.
Add new tests to check that `truncate_outliers` values are correctly overwritten when necessary.
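The loop-plus-tests approach described in these commit messages could look roughly like the sketch below. `match_rsmtool_settings` and `TestTruncateOutliersOverwrite` are hypothetical stand-ins written against plain dicts; they are not the code or tests actually added in this PR.

```python
import unittest
import warnings

# Fields whose rsmexplain values must agree with the original rsmtool run.
FIELDS_TO_MATCH = ("standardize_features", "truncate_outliers")


def match_rsmtool_settings(explain_config, rsmtool_config):
    """Hypothetical loop: overwrite mismatched fields with rsmtool's values.

    Warns once per mismatch and returns the names of overwritten fields.
    """
    overwritten = []
    for field in FIELDS_TO_MATCH:
        if explain_config.get(field) != rsmtool_config.get(field):
            warnings.warn(f"overwriting {field} with the rsmtool value")
            explain_config[field] = rsmtool_config.get(field)
            overwritten.append(field)
    return overwritten


class TestTruncateOutliersOverwrite(unittest.TestCase):
    def test_mismatch_is_overwritten(self):
        explain = {"standardize_features": True, "truncate_outliers": False}
        rsmtool = {"standardize_features": True, "truncate_outliers": True}
        # assertWarns fails the test if no UserWarning is emitted.
        with self.assertWarns(UserWarning):
            changed = match_rsmtool_settings(explain, rsmtool)
        self.assertEqual(changed, ["truncate_outliers"])
        self.assertTrue(explain["truncate_outliers"])

    def test_match_is_left_alone(self):
        explain = {"standardize_features": False, "truncate_outliers": True}
        rsmtool = dict(explain)
        self.assertEqual(match_rsmtool_settings(explain, rsmtool), [])
        self.assertTrue(explain["truncate_outliers"])


if __name__ == "__main__":
    unittest.main()
```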
@damien2012eng, don't forget to fix the file @tamarl08 pointed out. Otherwise, this is ready to merge.
I verified this behavior on the ASAP dataset using RSMTool. The mechanics column contains outlier values. Here is an example result (Response 181) with and without truncation:

- With outlier truncation: 3.213 (original data) and 3.248 (preprocessed data)
- Without outlier truncation: 3.213 (both original and preprocessed data)

I also verified the outlier range for the mechanics feature: [3.248, 5.648]. Because 3.213 is below the lower bound, it is truncated to the lower bound.
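The truncation arithmetic above amounts to clipping each value into the outlier range. The sketch below uses the [3.248, 5.648] bounds reported for the mechanics feature. The `outlier_bounds` helper assumes the bounds are the training mean ± 4 standard deviations (our understanding of how RSMTool documents `truncate_outliers`), and its use of the sample SD is an assumption. The value 6.1 is made up to show clipping at the upper bound.

```python
from statistics import mean, stdev
from typing import Iterable, List, Tuple


def outlier_bounds(
    train_values: Iterable[float], sd_multiplier: float = 4.0
) -> Tuple[float, float]:
    """Return (mean - k*SD, mean + k*SD) of the training values.

    Assumption: k = 4 and the sample standard deviation (statistics.stdev).
    """
    vals = list(train_values)
    m, sd = mean(vals), stdev(vals)
    return m - sd_multiplier * sd, m + sd_multiplier * sd


def truncate(values: Iterable[float], lower: float, upper: float) -> List[float]:
    """Clip each value into [lower, upper]; in-range values pass through."""
    return [min(max(v, lower), upper) for v in values]


# Bounds reported above for the mechanics feature; 6.1 is illustrative only.
lower, upper = 3.248, 5.648
print(truncate([3.213, 4.0, 6.1], lower, upper))  # → [3.248, 4.0, 5.648]
```

With truncation turned off, the preprocessing step would simply skip `truncate`, which is why Response 181 keeps its original value of 3.213.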
Closes #660