Describe the bug
When attempting to use the cumulative aggregation functions on a groupby of a frame or series with nulls (e.g. df.groupby("x").cumsum()), we get a ValueError.
Steps/Code to reproduce bug
Note that we would see the same traceback if we used cumcount, cummax, or cummin:
Scan-based groupbys are massaged back into pandas (original dataframe)
order by a post-processing step. Previously, this did the wrong thing
if the grouping keys contained nulls (or NaNs). In this situation,
dropna=True will cause libcudf to produce an output table that is
smaller than the input frame. To mimic pandas, we need to expand this
output to the original frame size, inserting nulls in the missing rows
and reordering correctly.
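The expand-and-reorder step can be sketched in pure Python as below. This is a minimal illustration, not the cudf implementation: it assumes (hypothetically) that the scan result comes back in grouped (sorted-key) order together with the original row index of each entry, models nulls as None or float NaN, and uses cumsum to stand in for any of the scans. The names is_null, grouped_cumsum_compact and grouped_cumsum_dropna are made up for this sketch.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def is_null(key: object) -> bool:
    """Treat None and float NaN as null keys (an assumption for this sketch)."""
    return key is None or (isinstance(key, float) and math.isnan(key))


def grouped_cumsum_compact(
    keys: Sequence[object], values: Sequence[float]
) -> Tuple[List[float], List[int]]:
    """Mimic a dropna=True scan: rows with null keys are dropped and the
    output is in grouped (sorted-key) order, so it can be shorter than the input.
    Returns the scan values and the original row index of each value."""
    valid_rows = [i for i, k in enumerate(keys) if not is_null(k)]
    # A stable sort keeps the original order of rows within each group.
    order = sorted(valid_rows, key=lambda i: keys[i])
    running: Dict[object, float] = {}
    compact: List[float] = []
    for i in order:
        running[keys[i]] = running.get(keys[i], 0) + values[i]
        compact.append(running[keys[i]])
    return compact, order


def grouped_cumsum_dropna(
    keys: Sequence[object], values: Sequence[float]
) -> List[Optional[float]]:
    """Expand the compact scan back to the input size in original row order,
    leaving null (None) in every row whose key was null."""
    compact, order = grouped_cumsum_compact(keys, values)
    result: List[Optional[float]] = [None] * len(keys)
    for value, row in zip(compact, order):
        result[row] = value
    return result
```

For example, grouped_cumsum_dropna([1, None, 1, 2, float("nan")], [1, 2, 3, 4, 5]) returns [1, None, 4, 4, None]: the compact scan has only three entries, and the two null-key rows are filled with null when it is expanded back to five rows.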
Furthermore, the previous reordering code had an out-of-bounds memory
access when there were null keys, since we were asking it to group a
column with the same length as the result, but the grouping object expects
columns with the length of the original input (which is longer when
dropna=True and there are null keys).
To fix these issues, compute the reordering on a column of appropriate
size, and, if dropna is true and any of the key columns have nulls, go
down a more expensive reordering path that inserts nulls correctly by
reindexing the result.
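The two reordering paths in the fix can be sketched in pure Python as follows. This is a hedged illustration under stated assumptions, not cudf's code: the scan output is assumed (hypothetically) to arrive as a list of values in grouped order plus the original row index of each value, and reorder_scan_result and is_null are made-up names. The reordering is driven by the size of the scan output, and only the dropna=True-with-null-keys case pays for a reindex.

```python
import math
from typing import Dict, List, Optional, Sequence


def is_null(key: object) -> bool:
    """Treat None and float NaN as null keys (an assumption for this sketch)."""
    return key is None or (isinstance(key, float) and math.isnan(key))


def reorder_scan_result(
    keys: Sequence[object],
    compact: Sequence[float],
    order: Sequence[int],
    dropna: bool = True,
) -> List[Optional[float]]:
    """Put a grouped-order scan result back into original row order.

    compact[j] is the scan value for original row order[j]; both sequences
    have the length of the scan output, not necessarily of the input.
    """
    nrows = len(keys)
    if not (dropna and any(is_null(k) for k in keys)):
        # Cheap path: no rows were dropped, so compact has one entry per
        # input row and a single scatter restores the original order.
        if len(compact) != nrows:
            raise ValueError("scan output should have one entry per input row")
        result: List[Optional[float]] = [None] * nrows
        for row, value in zip(order, compact):
            result[row] = value
        return result
    # Expensive path: rows with null keys were dropped, so reindex the
    # result by original row number and fill the missing rows with null.
    by_row: Dict[int, float] = dict(zip(order, compact))
    return [by_row.get(row) for row in range(nrows)]
```

Checking for null keys up front keeps the common case on the cheap scatter path; the reindex is only a fallback for the case where the scan output is shorter than the input.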
- Closes rapidsai#13349
- Closes rapidsai#12055
Expected behavior
I would expect these operations to succeed and give me something roughly similar to the output of pandas:
Environment overview
Environment details
Additional context
Ran into this issue while adding null testing to dask-cudf's groupby tests in #10853.