API/ENH: unprotected Categorical #13506

Closed
adbull opened this issue Jun 23, 2016 · 2 comments
Labels
Categorical Categorical Data Type Closing Candidate May be closeable, needs more eyeballs Enhancement

Comments

@adbull
Contributor

adbull commented Jun 23, 2016

xref #8640 #12699 #13361 #13410

There's been discussion of a few overlapping uses of Categorical:

  1. as 'true' categorical data with a known set of values
  2. as 'lazy' categorical data which adds new categories as needed
  3. as an interned string data type, with no particular categorical interpretation

Option 1 is currently well-supported. Option 2 can be achieved explicitly with union_categoricals (see #13361, #13410), but does not happen automatically, e.g. when setting values or concatenating (see #12699). Option 3 is similar to 2 and has been discussed as a new String type (see #8640), but may have different semantics from 2.
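
To make the gap between options 1 and 2 concrete, here is a minimal sketch of the current behaviour using made-up values. It assumes a pandas version that exposes `union_categoricals` under `pandas.api.types`; the exact exception type raised on assignment has varied between releases, so both are caught.

```python
import pandas as pd
from pandas.api.types import union_categoricals

a = pd.Categorical(["x", "y"])
b = pd.Categorical(["y", "z"])

# Explicit union: the result's categories are the union of both inputs.
merged = union_categoricals([a, b])
print(list(merged.categories))  # ['x', 'y', 'z']

# Setting a value outside the category set is refused, not absorbed.
refused = False
target = a.copy()
try:
    target[0] = "z"
except (TypeError, ValueError):
    refused = True
print("assignment refused:", refused)

# Concatenating Series whose categories differ gives up the categorical dtype.
combined = pd.concat([pd.Series(a), pd.Series(b)], ignore_index=True)
print(isinstance(combined.dtype, pd.CategoricalDtype))  # False
```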

While I completely agree that option 1 should be the default, I'd like to see more support for option 2 if possible; there are cases where I really do want to work with categorical data, but I don't yet know what the categories are, e.g. when concatenating a table together from several different source files.

One option would be to mimic Matlab's 'protected' flag for categorical data. By default, a Categorical would be created protected, and throw errors when performing actions which implicitly change the category set. However, the user could choose to declare a Categorical as unprotected, in which case these operations would be performed as intended.
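
As a rough illustration of the semantics being proposed, the sketch below emulates the flag with a small wrapper. `FlaggedCategorical`, its `protected` attribute, and its `set` method are hypothetical names invented for this example; pandas has no such flag, and only `pd.Categorical` and `add_categories` are real pandas API here.

```python
import pandas as pd


class FlaggedCategorical:
    """Hypothetical wrapper; pandas itself has no `protected` flag."""

    def __init__(self, values, protected=True):
        self.cat = pd.Categorical(values)
        self.protected = protected

    def set(self, index, value):
        # Unprotected: grow the category set first so the assignment succeeds.
        if (
            not self.protected
            and not pd.isna(value)
            and value not in self.cat.categories
        ):
            self.cat = self.cat.add_categories([value])
        # Protected: pandas raises for a value outside the categories.
        self.cat[index] = value


strict = FlaggedCategorical(["a", "b"])
lazy = FlaggedCategorical(["a", "b"], protected=False)

lazy.set(0, "c")
print(list(lazy.cat))             # ['c', 'b']
print(list(lazy.cat.categories))  # ['a', 'b', 'c']

rejected = False
try:
    strict.set(0, "c")
except (TypeError, ValueError):
    rejected = True
print("protected rejected new category:", rejected)
```

A real implementation would presumably live inside Categorical itself and also cover concatenation, which this wrapper does not attempt.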

While this behaviour could be achieved with the proposed String type, it's unclear whether that type would share the Categorical API, or be efficiently convertible to Categorical. Having the behaviour as part of Categorical would allow the user to build up a Categorical iteratively, from multiple sources, and then quickly mark it protected once the category set is known.
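
For comparison, this is roughly what the iterative build-up looks like today with explicit unions. The two small DataFrames are made-up stand-ins for tables read from separate source files, and the column names are assumptions for the example.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Stand-ins for tables loaded from different source files (made-up data).
chunks = [
    pd.DataFrame({"city": ["Oslo", "Bergen"], "n": [1, 2]}),
    pd.DataFrame({"city": ["Bergen", "Tromso"], "n": [3, 4]}),
]

# Convert each chunk's column to categorical, then union the parts explicitly.
parts = [chunk["city"].astype("category") for chunk in chunks]
city = union_categoricals(parts)

# Concatenate the remaining columns and attach the combined categorical.
table = pd.concat(chunks, ignore_index=True)
table["city"] = city

print(list(city.categories))  # ['Bergen', 'Oslo', 'Tromso']
print(isinstance(table["city"].dtype, pd.CategoricalDtype))  # True
```

The result is an ordinary Categorical with a fixed category set, which corresponds to the "protected" state described above; the unprotected phase is what currently has to be spelled out by hand for every categorical column.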

@jbrockmendel
Member

Pre-duplicate of #20899?

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Dec 21, 2021
@mroeschke
Member

The requirements here seem to be a subset of #20899, so closing in favor of that issue.

4 participants