OHE: allow encoding of specific, user desired categories #303

Open
solegalli opened this issue Aug 30, 2021 · 7 comments · May be fixed by #667
Comments

@solegalli
Collaborator

As per this thread, users may want to encode specific categories that are not necessarily the most frequent ones.

@Morgan-Sell
Collaborator

Hi @solegalli,

This looks interesting!

Are you envisioning that OHE allows the following functionality?

  • __init__ has a param called variables_w_new_category that represents which variables may contain the new category(ies).
  • new_categories - a list of categories that should be encoded in the variables included in the variables_w_new_category
  • the new_categories values would be added as values to the respective keys listed in variables_w_new_category in self.encoder_dict_

@Morgan-Sell
Collaborator

@solegalli, should this task be closed? It seems like task #403 resolved this issue.

@solegalli
Collaborator Author

#403 is in essence asking for the same functionality. That's probably why I closed it. I have now flagged it as a duplicate. This issue is still open.

@Morgan-Sell
Collaborator

@solegalli, resurrecting this issue ;)

When someone selects this functionality, do we want to limit the user to one variable?

I imagine that the user will select values that are specific to one variable. It seems odd for multiple categorical variables to have the same values.

@solegalli
Collaborator Author

I think the most straightforward approach would be to add a new parameter or, perhaps even better, to extend top_categories to take a dictionary with the variable as key and the categories to encode as values. Then, for each variable, the transformer would create dummies only for the categories indicated by the user.
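To make the dictionary idea concrete, here is a minimal sketch of the behaviour in plain pandas. It is not the library's implementation: the function name encode_selected_categories, the `variable_category` column naming, and the choice to drop the original column are all assumptions made for illustration.

```python
import pandas as pd


def encode_selected_categories(df, categories_per_variable):
    """Create one binary column per requested category (hypothetical helper).

    categories_per_variable maps a variable name to the list of categories
    that should get a dummy column. All other categories are left unencoded,
    so their rows end up with 0 in every dummy of that variable.
    """
    out = df.copy()
    for variable, categories in categories_per_variable.items():
        for category in categories:
            # Assumed naming scheme: "<variable>_<category>".
            out[f"{variable}_{category}"] = (out[variable] == category).astype(int)
        # Assumption: the original column is dropped once it is encoded.
        out = out.drop(columns=variable)
    return out


df = pd.DataFrame(
    {
        "colour": ["red", "blue", "green", "red"],
        "size": ["S", "M", "L", "S"],
    }
)

# Only "red" and "green" are encoded; "blue" gets no dummy column.
encoded = encode_selected_categories(df, {"colour": ["red", "green"]})
```

Because the dictionary is keyed by variable, each variable carries its own list of categories, so there is no need to limit the feature to a single variable.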

Will you pick this one up?

@Morgan-Sell
Collaborator

@solegalli,

I like the idea of using a dictionary. However, I'm unsure if the dictionary should be accepted by top_categories.

Would it be cleaner to have a separate param called custom_categories? We would check that top_categories and custom_categories are not both set at the same time.
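A rough sketch of that check, assuming both parameters default to None and that custom_categories is a dictionary of variable name to list of categories. The function name check_category_params and the error messages are hypothetical, not existing library code.

```python
def check_category_params(top_categories=None, custom_categories=None):
    """Validate that at most one of the two parameters is set (hypothetical)."""
    if top_categories is not None and custom_categories is not None:
        raise ValueError(
            "Set either top_categories or custom_categories, not both."
        )

    if custom_categories is not None:
        if not isinstance(custom_categories, dict):
            raise ValueError(
                "custom_categories must be a dictionary mapping each "
                "variable to the list of categories to encode."
            )
        for variable, categories in custom_categories.items():
            # Assumption: every variable needs a non-empty list of categories.
            if not isinstance(categories, list) or len(categories) == 0:
                raise ValueError(
                    f"custom_categories[{variable!r}] must be a non-empty list."
                )


# Either parameter on its own passes the check.
check_category_params(top_categories=3)
check_category_params(custom_categories={"colour": ["red", "green"]})
```

Keeping the two parameters separate means top_categories can stay a plain integer, at the cost of one extra mutual-exclusion check at initialisation.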

@solegalli
Collaborator Author

Sounds good to me!
