Handle dask groupby warning #6391

Merged
merged 3 commits into from
Oct 1, 2024


maximlt
Member

@maximlt maximlt commented Sep 27, 2024

Pandas 2.2.0 deprecated, with a FutureWarning, passing a name that is not a tuple to grouped_by.get_group() when the grouping was done with a length-1 list-like (pandas-dev/pandas#54155).

Dask seems to call Pandas internally, doing a group-by operation followed by a get_group() call, so its users are also seeing this warning (dask/dask#10572).

The change worked locally (let's see what the CI says) but I'm not sure it'll work with all the versions of Pandas and Dask supported by HoloViews.
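
On the version question, one possible way to guard such a call is sketched below. This is not necessarily the change made in this PR: `parse_version` and `normalize_group_key` are hypothetical helpers, and the `(2, 2)` cutoff assumes that older Pandas releases expect the bare key.

```python
def parse_version(version):
    """Return the leading (major, minor) integers of a version string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def normalize_group_key(by, key, pandas_version):
    """Wrap `key` in a length-1 tuple when grouping by a length-1 list-like.

    Hypothetical helper: the wrapping is applied only for pandas >= 2.2,
    on the assumption that older releases expect the bare key.
    """
    grouped_by_length_1 = isinstance(by, (list, tuple)) and len(by) == 1
    if (
        grouped_by_length_1
        and not isinstance(key, tuple)
        and parse_version(pandas_version) >= (2, 2)
    ):
        return (key,)
    return key


new_key = normalize_group_key(["carrier"], "OH", "2.2.0")
old_key = normalize_group_key(["carrier"], "OH", "1.5.3")
```

In real code the third argument would come from `pandas.__version__`; it is passed explicitly here so the sketch runs without Pandas installed.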

(Sorry for the hvPlot reproducer only!)

import hvplot.dask  # noqa

from hvplot.sample_data import airline_flights

flights = airline_flights.to_dask().persist()
flight_subset = flights[flights.carrier.isin(['OH', 'F9'])]
flight_subset.hvplot(x='distance', y='depdelay', by='carrier', kind='scatter', alpha=0.2, persist=True)
/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask/dataframe/groupby.py:270: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  return grouped.get_group(get_key)
/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask/dataframe/groupby.py:270: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  return grouped.get_group(get_key)

The full traceback when turning the warning into an error:

Traceback (most recent call last):
  File "/Users/mliquet/dev/hvplot/.mltmess/issue_dask_groupby.py", line 9, in <module>
    flight_subset.hvplot(x='distance', y='depdelay', by='carrier', kind='scatter', alpha=0.2, persist=True)
  File "/Users/mliquet/dev/hvplot/hvplot/plotting/core.py", line 95, in __call__
    return self._get_converter(x, y, kind, **kwds)(kind, x, y)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/hvplot/converter.py", line 1723, in __call__
    obj = method(x, y)
          ^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/hvplot/converter.py", line 2251, in scatter
    return self.chart(Scatter, x, y, data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/hvplot/converter.py", line 2200, in chart
    return self.single_chart(element, x, y, data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/hvplot/converter.py", line 2072, in single_chart
    Dataset(data, self.by + kdims, vdims).to(element, kdims, vdims, self.by),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/holoviews/core/data/__init__.py", line 145, in __call__
    group = selected.groupby(groupby, container_type=HoloMap,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/holoviews/core/data/__init__.py", line 196, in pipelined_fn
    result = method_fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/holoviews/core/data/__init__.py", line 1000, in groupby
    return self.interface.groupby(self, dim_names, container_type,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/holoviews/core/data/dask.py", line 223, in groupby
    group = group_type(groupby.get_group(coord), **group_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask_expr/_groupby.py", line 1639, in get_group
    return new_collection(GetGroup(self.obj.expr, key, self._slice, *self.by))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask_expr/_collection.py", line 4799, in new_collection
    meta = expr._meta
           ^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask_expr/_expr.py", line 496, in _meta
    return self.operation(*args, **self._kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask_expr/_groupby.py", line 1089, in groupby_get_group
    return _groupby_get_group(df, list(by_key), get_key, columns)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask/dataframe/groupby.py", line 270, in _groupby_get_group
    return grouped.get_group(get_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1103, in get_group
    warnings.warn(
FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
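
For reference, a traceback like the one above can be obtained by promoting FutureWarning to an exception with the standard library `warnings` module. The sketch below shows one way to do it and is not necessarily how it was done here. `legacy_get_group` is a hypothetical stand-in for the call that warns, so the example runs without Pandas or Dask.

```python
import warnings


def legacy_get_group(name):
    """Hypothetical stand-in for a call that emits a FutureWarning."""
    if not isinstance(name, tuple):
        warnings.warn(
            "Pass `(name,)` instead of `name` to silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
    return name


def call_strict(func, *args):
    """Call `func` with FutureWarning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        return func(*args)


try:
    call_strict(legacy_get_group, "OH")
    raised = False
except FutureWarning:
    # The warning surfaced as an exception, with a full traceback available.
    raised = True

# The tuple form does not warn, so the call goes through unchanged.
tuple_ok = call_strict(legacy_get_group, ("OH",))
```

The same effect is available for a whole script, without editing it, via `python -W error::FutureWarning script.py`.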

@hoxbro hoxbro added the type: compatibility Compability with upstream packages label Sep 27, 2024
@hoxbro hoxbro enabled auto-merge (squash) October 1, 2024 12:13

codecov bot commented Oct 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.48%. Comparing base (6d36abe) to head (be00f98).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6391      +/-   ##
==========================================
- Coverage   88.48%   88.48%   -0.01%     
==========================================
  Files         323      323              
  Lines       68459    68460       +1     
==========================================
  Hits        60579    60579              
- Misses       7880     7881       +1     


@hoxbro hoxbro merged commit 519f720 into main Oct 1, 2024
14 checks passed
@hoxbro hoxbro deleted the handle_dask_groupby_length_1 branch October 1, 2024 12:41