Cleanup of simple_imputer #346
Merged
7 commits
61b1a29 cleanup of simple_imputer (eddiebergman)
bbabad8 Fixed doc and typo (eddiebergman)
c92d039 Fixed docs (eddiebergman)
60b9194 Made changes, added test (eddiebergman)
7a3e792 Fixed init statement (eddiebergman)
5490604 Fixed docs (eddiebergman)
e790e71 Flake'd (eddiebergman)
It would be better if we could use `Enum`. We do not use it much only because I was not part of the refactoring phase.
At the very least, I would like to reduce the dependence on hard-coded strings as much as possible.
This file also uses many hard-coded strings, which an enum would let us avoid.
I can do that. However, you will need to update these whenever sklearn updates them, since that is where they are forwarded to.
I can see why you want to do that, but it makes the code that uses it a little less pretty. For example, it's hard to see where the error in the following code is:
It should be `default_value=NumericalImputerChoice.mean.value`. The problem here is that Enum members are namespaced, i.e. `str(NumericalImputerChoice.mean) == "NumericalImputerChoice.mean"`, whereas we would like `NumericalImputerChoice.mean == "mean"`.
This is more readily achievable with a `NamedTuple`, meaning no classes or anything are required. From my Java days, this was similar: in general, enum values such as their string value should never be used directly, and enums serve more as flags.
I will implement the Enum version and you can change it if you like or leave it as is.
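The namespacing point can be illustrated with a minimal sketch. The class body here is hypothetical (the actual definition from the PR is not shown in this thread); only the name `NumericalImputerChoice` and the `mean` member come from the comment above.

```python
from enum import Enum


class NumericalImputerChoice(Enum):
    # Hypothetical members, chosen to mirror sklearn's strategy strings.
    mean = "mean"
    median = "median"


# A plain Enum member is not equal to its underlying string...
member_equals_string = NumericalImputerChoice.mean == "mean"

# ...and its string form carries the class name as a namespace.
member_as_text = str(NumericalImputerChoice.mean)

# Only .value gives back the raw string that would be forwarded to sklearn.
raw_value = NumericalImputerChoice.mean.value
```

Forgetting `.value` therefore passes an enum member where a string is expected, which is the hard-to-spot error described above.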
Another clean-ish solution is to just ditch the enum part.
The type is still a string, meaning it's easy to use: people don't need to know this class exists, because they can still pass a string. It also allows the internal code, where we do know about the class, to use it.
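A minimal sketch of what "ditching the enum part" could look like, assuming a plain class that only holds string constants. The member names and the `make_imputer_config` helper are hypothetical and exist purely for illustration.

```python
class NumericalImputerChoice:
    # Plain class attributes: each choice is an ordinary str.
    mean = "mean"
    median = "median"
    most_frequent = "most_frequent"


def make_imputer_config(strategy: str = NumericalImputerChoice.mean) -> dict:
    # Hypothetical helper: the parameter is typed as str, so callers may
    # pass either a raw string or one of the class constants.
    return {"strategy": strategy}


from_constant = make_imputer_config(NumericalImputerChoice.median)
from_string = make_imputer_config("median")
```

Both calls produce the same configuration, so external users are unaffected while internal code avoids repeating string literals.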
As another note, it also makes parameter lines extremely long: the line is 101 characters, which is over the checker's line-length limit.
I've not changed it for now; there are too many decisions to make, and I think they are better addressed based on how you would like it done yourself. I've changed it back to strings.
Hey, thanks for the insights. Yeah, I think such changes should be part of a separate PR, as this PR is meant to clean up the messy statements in this file. Also, it would be better if the hyperparameter strings in all the components were consistent, which would require changing a lot of files; I think that is beyond the scope of this PR.
We have the same thing: there is `Literal` at the moment, but the type checking for it doesn't work as expected.
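For reference, a hedged sketch of how `Literal` can constrain string choices on Python 3.8+. The alias `NumericalStrategy` and the function `check_strategy` are hypothetical names; static checkers read the `Literal` annotation, while the runtime check is an extra safeguard because annotations are not enforced at runtime.

```python
from typing import Literal, get_args

# Hypothetical alias listing the allowed strategy strings.
NumericalStrategy = Literal["mean", "median", "most_frequent"]


def check_strategy(strategy: str) -> str:
    # get_args returns the tuple of literal values declared in the alias.
    allowed = get_args(NumericalStrategy)
    if strategy not in allowed:
        raise ValueError(f"Unknown strategy {strategy!r}, expected one of {allowed}")
    return strategy
```

Callers still pass plain strings, and the allowed values live in one place rather than being scattered across the file.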
All the problems come from the fact that we still need to support Python 3.7.
Once we switch to Python 3.8, we can use `Literal`.
Yeah, I agree with you. I am just uncomfortable using software that assumes users will google the string choices (typically sklearn).
And I have found it really useful to be able to use an `enum` or `class`, such as in numpy and facebook ax.
But I got your point, let's leave it as is for now.
I did not know some of the points you mentioned, thanks for raising them, @eddiebergman.
(Just a question, but) the better solution will be something like this:
and use `str` for all the typings. Btw, you do not need to put `.value` (FYI).
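One pattern consistent with these remarks (string typings, no `.value`) is an `Enum` that also subclasses `str`. This is an assumption about what was meant, not the snippet from the review, and the member names are hypothetical.

```python
from enum import Enum


class NumericalImputerChoice(str, Enum):
    # Mixing in str makes every member a real string instance.
    mean = "mean"
    median = "median"


def normalise(strategy: str) -> NumericalImputerChoice:
    # Accepts a raw string or a member; invalid strings raise ValueError.
    return NumericalImputerChoice(strategy)


# Members compare equal to their raw strings without using .value.
equal_without_value = NumericalImputerChoice.mean == "mean"
is_a_string = isinstance(NumericalImputerChoice.mean, str)
```

With this mixin, functions can keep `str` annotations while internal code refers to named members.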
Also, you can add `python` after the opening backticks (```) of a code block to get Python highlighting ;)