support category dtypes #24
Hi David, thanks for identifying and fixing this issue. I have two suggestions on how I'd like this implemented.

Firstly, it seems that the more elegant way to do this broad type checking is with the

Secondly, I think we could improve your actual skip-empty behaviour for categoricals:

validated = (series.astype(object).str.len() > 0) & simple_validation

Currently you're converting the column to a string and checking for empty strings, but I don't believe this is the idiomatic way to treat categoricals. In theory the user might actually have the empty string as one of their categories, and so we don't want to ignore a column in this case. On the other hand, the documentation says "All values of categorical data are either in categories or np.nan", which makes me think that, like with numericals, we should treat only
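The concern about empty-string categories can be illustrated with a small sketch. This is not the PR's code: the sample data and the variable names `by_length` and `by_null` are made up for illustration, and `notnull()` is used only as one example of a null-based check.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column: "" is a real category, NaN is a missing value.
series = pd.Series(["a", "", np.nan, "b"], dtype="category")

# String-length check, as in the snippet quoted above:
# both the "" category and the NaN entry come out as False.
by_length = series.astype(object).str.len() > 0

# Null-based check (illustrative alternative):
# only the NaN entry comes out as False, so "" is kept.
by_null = series.notnull()

print(by_length.tolist())
print(by_null.tolist())
```

The two masks differ only at the empty-string entry, which is the case the comment above is worried about.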
Cool, will implement these changes. For the dtype check, are you thinking we should use the pandas type checks for all checks? i.e.
or just for the categorical check?
Might as well do it in both cases. But make sure to bump the pandas version in the setup.py as well.
I'm not seeing a pandas version in the setup.py. Seems it's just installing the latest?
Yes, exactly. Which is why it needs a constraint.
pandas version 0.21 is required for Series.isna(), and 0.19 is required for is_categorical_dtype and is_numeric_dtype.
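For reference, a rough sketch of dtype-based checks of the kind discussed here. The helper name `classify_series` and its return labels are hypothetical and not part of this project. Because `is_categorical_dtype` is deprecated in recent pandas releases, the sketch swaps in an `isinstance` test against `pd.CategoricalDtype` for the categorical case; on the older pandas versions mentioned in this thread, `pd.api.types.is_categorical_dtype(series)` would be the equivalent call.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def classify_series(series):
    """Hypothetical helper: label a Series by its broad dtype family."""
    # Swapped-in check for categoricals; older pandas would use
    # pd.api.types.is_categorical_dtype(series) instead.
    if isinstance(series.dtype, pd.CategoricalDtype):
        return "categorical"
    if is_numeric_dtype(series):
        return "numeric"
    return "other"


numbers = pd.Series([1, 2.5, 3])
labels = pd.Series(["x", "y", "x"], dtype="category")
text = pd.Series(["x", "y"], dtype=object)

print(classify_series(numbers), classify_series(labels), classify_series(text))
```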
All right, I think I've got this all wrapped up now. Ended up having to use pandas>=0.21 for the
Hmm, unless I use that method elsewhere, you might as well change it to
OK, working with pandas 0.19!
For some reason the comments on this PR are completely out of order. I've submitted a ticket to GitHub. Very confusing...
Excellent, looks good! I tweaked some of your docstrings, but nothing major.
Yep, I also noticed that. Never seen it before...
This is now available as a GitHub release and on PyPI as 0.3.4.
Resolves #22