API: Index.append behaviour with categoricals #14586
Labels: API Design, Categorical, Enhancement, Index, Reshaping
Follow-up of #14545.
We had a long discussion on what the behaviour of `concat` should be when you have categorical data: #13767. In the end, for 0.19.0, we changed the behaviour from raising an error when the categories didn't match to returning object-dtyped data (only data with identical categories and an identical `ordered` attribute gives a categorical as result). The table below is a summary of the changes between 0.18.1 and 0.19.0:

For categorical Series:
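A minimal sketch of the 0.19.0 rule described above for categorical Series, assuming a pandas installation is available. It uses `pd.concat` rather than `Series.append`, since `Series.append` is no longer available in recent pandas releases; the variable names are illustrative only, and the exact fallback dtype may depend on the pandas version.

```python
import pandas as pd

# Two categorical Series sharing identical categories (and both unordered).
same_a = pd.Series(pd.Categorical(["a", "b"], categories=["a", "b"]))
same_b = pd.Series(pd.Categorical(["b", "a"], categories=["a", "b"]))

# A categorical Series whose categories differ from the two above.
diff_c = pd.Series(pd.Categorical(["a", "c"], categories=["a", "c"]))

# Identical categories: the result stays categorical.
same_result = pd.concat([same_a, same_b], ignore_index=True)

# Differing categories: per the text above, the result is no longer
# categorical (it falls back to a non-categorical dtype instead of raising).
diff_result = pd.concat([same_a, diff_c], ignore_index=True)

print("identical categories ->", same_result.dtype)
print("different categories ->", diff_result.dtype)
```

Printing the dtypes rather than hard-coding them makes it easy to compare the behaviour across pandas versions.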
However, we did not change the behaviour of `append` for Indexes (the append above is for Series):

For `CategoricalIndex`:

The last line, i.e. the case where the calling Index is not a CategoricalIndex, changed by accident in 0.19.0, and it is this that I corrected in PR #14545 for 0.19.1.
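The Index cases discussed here can be explored with a small sketch like the following, assuming a pandas installation is available. The helper `append_dtype` and the case labels are hypothetical names introduced for illustration; the helper reports the exception name instead of failing, because whether a given combination raises or falls back to another dtype depends on the pandas version.

```python
import pandas as pd


def append_dtype(left, right):
    """Return the dtype name of left.append(right), or the exception name if it raises."""
    try:
        return str(left.append(right).dtype)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


cat_abc = pd.CategoricalIndex(["a", "b", "c"])
cat_abc_again = pd.CategoricalIndex(["c", "a"], categories=["a", "b", "c"])
cat_xy = pd.CategoricalIndex(["x", "y"])
plain_a = pd.Index(["a"])

# (calling index, appended index) pairs mirroring the cases in the discussion.
cases = {
    "identical categories": (cat_abc, cat_abc_again),
    "different categories": (cat_abc, cat_xy),
    "categorical + plain Index": (cat_abc, plain_a),
    "plain Index + categorical": (plain_a, cat_abc),
}

results = {name: append_dtype(left, right) for name, (left, right) in cases.items()}

for name, dtype in results.items():
    print(f"{name}: {dtype}")
```

Only the first case (identical categories) is expected to give a categorical result regardless of version; the other three are the ones whose behaviour is under discussion.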
Questions:

- Do we want the same behaviour for `Index.append` as we now have for `Series.append` with categorical data? This means that the column in the table above becomes 'object' apart from the first row.
- Should appending non-categorical values that fit in the existing categories be special-cased (so that e.g. `pd.CategoricalIndex(['a', 'b', 'c']).append(pd.Index(['a']))` keeps working)?

Changing this to always return object dtype, except for categoricals with identical categories, is easy, but gives a few failures in our test suite. Namely, in some indexing tests (indexing a DataFrame with a CategoricalIndex) there are changes in behaviour, because indexing with a value that does not exist in the index was performed using `CategoricalIndex.append()`. But we can of course work around this in the indexing code.

cc @JanSchulz @sinhrks