Fix groupby head/tail for empty dataframe #13398

shwina · 2023-05-20T14:18:55Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

wence- · 2023-05-22T08:13:05Z

python/cudf/cudf/core/groupby/groupby.py

@@ -678,7 +678,7 @@ def _head_tail(self, n, *, take_head: bool, preserve_order: bool):
            # subsample the gather map from the full input ordering,
            # rather than permuting the gather map of the output.
            _, (ordering,), _ = self._groupby.groups(
-                [arange(0, self.obj._data.nrows)]
+                [arange(0, len(self.obj))]
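For readers following along, here is a minimal pure-Python sketch of what the diff above touches. It is a toy model, not cuDF's implementation. ToyFrame, nrows_from_columns and head_gather_map are hypothetical names. The sketch assumes, as the switch from self.obj._data.nrows to len(self.obj) suggests, that a row count taken from the data columns can report 0 for a frame whose index has rows but which has no data columns. The gather map follows the comment in the diff: group the full input ordering arange(0, nrows), subsample the first n rows of each group, then return them in input order. Only head is shown. Tail would take the last n rows per group instead.

```python
from typing import Dict, List


class ToyFrame:
    """Hypothetical frame: an explicit row index plus equal-length data columns."""

    def __init__(self, index: List[int], columns: Dict[str, list]) -> None:
        self.index = index
        self.columns = columns

    @property
    def nrows_from_columns(self) -> int:
        # Row count read off the first data column (assumed analogue of
        # _data.nrows). With no data columns there is nothing to read, so 0.
        if not self.columns:
            return 0
        return len(next(iter(self.columns.values())))

    def __len__(self) -> int:
        # Row count read off the index, which exists even with no columns
        # (assumed analogue of len(self.obj)).
        return len(self.index)


def head_gather_map(keys: List[int], n: int, nrows: int) -> List[int]:
    """Positions of the first n rows of each group, returned in input order."""
    # Group the full input ordering range(nrows) by key. sorted() is stable,
    # so rows within a group keep their input order.
    ordering = sorted(range(nrows), key=lambda i: keys[i])
    counts: Dict[int, int] = {}
    taken: List[int] = []
    for pos in ordering:
        seen = counts.get(keys[pos], 0)
        if seen < n:
            taken.append(pos)
        counts[keys[pos]] = seen + 1
    # preserve_order: the subsample of the input ordering, sorted by position.
    return sorted(taken)


frame = ToyFrame(index=[0, 1, 2, 3], columns={})  # four rows, no data columns
keys = [1, 1, 2, 1]  # external grouping keys, one per row

print(frame.nrows_from_columns, len(frame))                # 0 4
print(head_gather_map(keys, 2, frame.nrows_from_columns))  # []
print(head_gather_map(keys, 2, len(frame)))                # [0, 1, 2]
```

In this model, building the input ordering from the column-derived count silently yields an empty gather map. Building it from len(frame) keeps every row eligible.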


The same error (counting rows with self.obj._data.nrows rather than len(self.obj)) also exists in the scan code path, both before and after the fix for scans in #13389 (which was me again, sorry!). That path is currently unreachable, though, because mimicking pandas ordering for scans only happens when the dataframe is not empty.

Could you apply this patch too, please?

diff --git a/python/cudf/cudf/core/groupby/groupby.py b/python/cudf/cudf/core/groupby/groupby.py
index fb242a49ad..b3be6d9de0 100644
--- a/python/cudf/cudf/core/groupby/groupby.py
+++ b/python/cudf/cudf/core/groupby/groupby.py
@@ -2277,9 +2277,7 @@ class GroupBy(Serializable, Reducible, Scannable):
             # result coming back from libcudf has null_count few rows than
             # the input, so we must produce an ordering from the full
             # input range.
-            _, (ordering,), _ = self._groupby.groups(
-                [arange(0, self.obj._data.nrows)]
-            )
+            _, (ordering,), _ = self._groupby.groups([arange(0, len(self.obj))])
             if self._dropna and any(
                 c.has_nulls(include_nan=True) > 0
                 for c in self.grouping._key_columns
@@ -2287,7 +2285,7 @@ class GroupBy(Serializable, Reducible, Scannable):
                 # Scan aggregations with null/nan keys put nulls in the
                 # corresponding output rows in pandas, to do that here
                 # expand the result by reindexing.
-                ri = cudf.RangeIndex(0, self.obj._data.nrows)
+                ri = cudf.RangeIndex(0, len(self.obj))
                 result.index = cudf.Index(ordering)
                 # This reorders and expands
                 result = result.reindex(ri)
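The scan path in this patch can be sketched the same way. This is a hypothetical pure-Python model, not the cuDF or libcudf code. grouped_cumsum_pandas_order is an illustrative name, a cumulative sum stands in for any scan, and Python's None stands in for a null. Rows with a null key are dropped before scanning (dropna), so the grouped result is shorter than the input. The model labels each result with its source position and then looks up every position in range(n), where n is the full input length. That step plays the role of result.index = cudf.Index(ordering) followed by result.reindex(ri).

```python
from typing import Dict, List, Optional


def grouped_cumsum_pandas_order(
    keys: List[Optional[int]], values: List[int]
) -> List[Optional[int]]:
    """Per-group cumulative sum, returned in input row order (hypothetical model)."""
    n = len(keys)  # full input length, the analogue of len(self.obj)
    # dropna: rows whose key is null (None) belong to no group.
    kept = [i for i in range(n) if keys[i] is not None]
    # Group the kept positions of range(n) by key; sorted() is stable.
    ordering = sorted(kept, key=lambda i: keys[i])
    # Scan in grouped order. Like the libcudf result described in the patch,
    # this has one entry fewer than the input for every null-keyed row.
    grouped_result: List[int] = []
    running: Dict[int, int] = {}
    for i in ordering:
        running[keys[i]] = running.get(keys[i], 0) + values[i]
        grouped_result.append(running[keys[i]])
    # Reindex onto range(n): label results by source position, then look up
    # every position. Positions without a result (null keys) become None.
    by_position = dict(zip(ordering, grouped_result))
    return [by_position.get(i) for i in range(n)]


print(grouped_cumsum_pandas_order([2, None, 1, 2, 1], [10, 20, 30, 40, 50]))
# [10, None, 30, 50, 80]
```

The expanded output has exactly n entries only because n is taken from the full input. A smaller n would drop trailing rows from the reindexed result.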


shwina · 2023-05-22T17:52:33Z

/merge

Fix groupby head/tail for empty dataframe

8126ed3

github-actions bot added the Python label (Affects Python cuDF API.) May 20, 2023

wence- reviewed May 22, 2023


shwina added 2 commits May 22, 2023 10:55

Merge branch 'branch-23.06' of github.com:rapidsai/cudf into fix-groupby-head-tail-empty

eeb672b

Apply suggested patch

bce865d

shwina added the bug (Something isn't working) and non-breaking (Non-breaking change) labels May 22, 2023

galipremsagar approved these changes May 22, 2023


shwina marked this pull request as ready for review May 22, 2023 16:28

shwina requested a review from a team as a code owner May 22, 2023 16:28

shwina requested review from bdice and charlesbluca May 22, 2023 16:28

bdice approved these changes May 22, 2023


rapids-bot bot merged commit 9b1496d into rapidsai:branch-23.06 May 22, 2023
