Some things that people may want to discuss:
- Using a different name for the property (e.g. `column_names`)
- Being able to set a single column with `df.columns[0] = 'foo'` (the proposal doesn't allow it)
- The return type of the columns (the proposal returns a Python list, pandas returns an Index)
- Setting the column name of a dataframe with one column with `df.columns = 'foo'` (the proposal requires an iterable, so `df.columns = ['foo']` or equivalent is needed)
In case it's useful, this is the implementation of the examples:

    import collections.abc
    import typing


    class dataframe:
        def __init__(self, data):
            # list(data) yields the keys when data is a dict
            self._columns = list(data)

        @property
        def columns(self) -> typing.List[str]:
            return self._columns

        @columns.setter
        def columns(self, names: typing.Iterable[str]):
            # A bare str is iterable, but it is rejected explicitly so that
            # 'foo' is not silently read as ['f', 'o', 'o']
            if not isinstance(names, collections.abc.Iterable) or isinstance(names, str):
                raise TypeError(f'Columns must be an iterable, not {type(names).__name__}')

            names = list(names)

            for name in names:
                if not isinstance(name, str):
                    raise TypeError(f'Column names must be str, {type(name).__name__} found')

            if len(names) != len(self._columns):
                raise ValueError(f'Expected {len(self._columns)} column names, found {len(names)}')

            if len(set(names)) != len(names):
                # sorted() keeps the error message deterministic
                duplicates = sorted({name for name in names if names.count(name) > 1})
                raise ValueError(f'Column names cannot be duplicated. Found duplicates: {", ".join(duplicates)}')

            self._columns = names

As pointed out in the meeting, the API in the description assumes the dataframe can be mutated in place (in this case the labels). This is something that has been discussed in #10, but it's still not decided which kind of API we want in terms of mutability. Those would be the main options:
The consensus ended up being: "no mutability". So we cannot set column names, and the interchange protocol has a simple column_names property at the dataframe level.
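To make the "no mutability" outcome concrete, here is a minimal sketch of a read-only `column_names` property. The class name `ImmutableFrame` and its constructor are hypothetical and only serve the illustration; this is not the interchange protocol's actual definition.

```python
import typing


class ImmutableFrame:
    """Hypothetical container, used only to illustrate a read-only property."""

    def __init__(self, data: typing.Mapping[str, typing.Sequence]):
        # Store the names as a tuple so callers cannot mutate them in place.
        self._column_names = tuple(data)

    @property
    def column_names(self) -> typing.Tuple[str, ...]:
        # No setter is defined, so assigning to this attribute raises AttributeError.
        return self._column_names


df = ImmutableFrame({'col1': [1, 2], 'col2': [3, 4]})
print(df.column_names)

try:
    df.column_names = ['a', 'b']
except AttributeError as exc:
    print(f'rejected: {exc}')
```

Renaming columns under this design would mean building a new object rather than editing the existing one.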
Regarding column names, the next proposal, similar to what pandas currently does, uses a `columns` property to set and get column names. In #7, the preference is to restrict column names to strings and not allow duplicates.
The proposed API with an example is:
And the next cases would fail:
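As a sketch of how both situations might look, assuming the `dataframe` class from the reference implementation in this issue (condensed here so the snippet runs on its own). The column names and values are illustrative.

```python
import collections.abc
import typing


class dataframe:
    # Condensed copy of the issue's reference implementation.
    def __init__(self, data):
        self._columns = list(data)

    @property
    def columns(self) -> typing.List[str]:
        return self._columns

    @columns.setter
    def columns(self, names: typing.Iterable[str]):
        if not isinstance(names, collections.abc.Iterable) or isinstance(names, str):
            raise TypeError(f'Columns must be an iterable, not {type(names).__name__}')
        names = list(names)
        for name in names:
            if not isinstance(name, str):
                raise TypeError(f'Column names must be str, {type(name).__name__} found')
        if len(names) != len(self._columns):
            raise ValueError(f'Expected {len(self._columns)} column names, found {len(names)}')
        if len(set(names)) != len(names):
            raise ValueError('Column names cannot be duplicated')
        self._columns = names


df = dataframe({'col1': [1, 2], 'col2': [3, 4]})
print(df.columns)            # ['col1', 'col2']
df.columns = ['foo', 'bar']  # any iterable of unique str with a matching length
print(df.columns)            # ['foo', 'bar']

# Each of these assignments is rejected by the setter.
failing = [
    'foo',           # a bare str is not accepted
    1,               # not an iterable
    ['foo', 2],      # non-str name
    ['foo'],         # wrong number of names
    ['foo', 'foo'],  # duplicated names
]
for value in failing:
    try:
        df.columns = value
    except (TypeError, ValueError) as exc:
        print(f'{type(exc).__name__}: {exc}')
```

A rejected assignment leaves the existing names untouched, because `self._columns` is only replaced after every check has passed.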