Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dask.dataframe import error for Python 3.11.9 #11035

Merged
merged 2 commits into from
Apr 3, 2024

Conversation

rjzamora
Copy link
Member

@rjzamora rjzamora commented Apr 3, 2024

Using the 4/2/2024 release of Python 3.11.9, calling import dask.dataframe produces TypeError when dask tries to "bind" docstrings for the DatetimeAccessor properties. The problem is that 3.11.9 will throw a new TypeError when you try to inspect the arguments of a property. This PR updates the try/except block to include TypeError.

Error without this fix
>>> import dask.dataframe as dd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/__init__.py", line 98, in <module>
    from dask.dataframe import backends, dispatch, rolling
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/backends.py", line 15, in <module>
    from dask.dataframe.core import DataFrame, Index, Scalar, Series, _Frame
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/core.py", line 36, in <module>
    from dask.dataframe import methods
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/methods.py", line 34, in <module>
    from dask.dataframe.utils import is_dataframe_like, is_index_like, is_series_like
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/utils.py", line 20, in <module>
    from dask.dataframe import (  # noqa: F401 register pandas extension types
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/_dtypes.py", line 9, in <module>
    from dask.dataframe.extensions import make_array_nonempty, make_scalar
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/extensions.py", line 8, in <module>
    from dask.dataframe.accessor import (
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/accessor.py", line 126, in <module>
    class DatetimeAccessor(Accessor):
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/accessor.py", line 81, in __init_subclass__
    _bind_property(cls, pd_cls, attr, min_version)
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/dataframe/accessor.py", line 35, in _bind_property
    setattr(cls, attr, property(derived_from(pd_cls, version=min_version)(func)))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/utils.py", line 981, in wrapper
    method.__doc__ = _derived_from(
                     ^^^^^^^^^^^^^^
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/utils.py", line 934, in _derived_from
    method_args = get_named_args(method)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.11.9/lib/python3.11/site-packages/dask/utils.py", line 695, in get_named_args
    s = inspect.signature(func)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.11.9/lib/python3.11/inspect.py", line 3263, in signature
    return Signature.from_callable(obj, follow_wrapped=follow_wrapped,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.11.9/lib/python3.11/inspect.py", line 3011, in from_callable
    return _signature_from_callable(obj, sigcls=cls,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.11.9/lib/python3.11/inspect.py", line 2599, in _signature_from_callable
    call = _descriptor_get(call, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.11.9/lib/python3.11/inspect.py", line 2432, in _descriptor_get
    return get(descriptor, obj, type(obj))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: descriptor '__call__' for 'type' objects doesn't apply to a 'property' object

@rjzamora rjzamora added the bug Something is broken label Apr 3, 2024
@rjzamora rjzamora self-assigned this Apr 3, 2024
@phofl
Copy link
Collaborator

phofl commented Apr 3, 2024

Is this only on 3.11.9?

@rjzamora
Copy link
Member Author

rjzamora commented Apr 3, 2024

Is this only on 3.11.9?

Seems to be the case. We have had CI failures in RAPIDS since 3.11.9 was released yesterday. Using an earlier version of CPython works fine.

@phofl
Copy link
Collaborator

phofl commented Apr 3, 2024

That's interesting and weird that this is included in a bugfix release...

Copy link
Contributor

github-actions bot commented Apr 3, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ±0       15 suites  ±0   3h 21m 44s ⏱️ + 2m 43s
 13 107 tests ±0   12 173 ✅ ±0     930 💤 ±0  4 ❌ ±0 
162 263 runs  ±0  142 118 ✅ +1  20 141 💤  - 1  4 ❌ ±0 

For more details on these failures, see this check.

Results for commit b6b01ad. ± Comparison against base commit 9246df0.

♻️ This comment has been updated with latest results.

@wence-
Copy link
Contributor

wence- commented Apr 3, 2024

I think this was a bad interaction between a fix to correctly traverse __wrapped__ objects in inspect and the implementation of _bind_property in dask.dataframe.accessor, which provides a function object whose __wrapped__ attribute is a property.

Perhaps this is a more principled fix?

diff --git a/dask/dataframe/accessor.py b/dask/dataframe/accessor.py
index 9a73eb6c..0bb10e98 100644
--- a/dask/dataframe/accessor.py
+++ b/dask/dataframe/accessor.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import functools
 import warnings
 
 import numpy as np
@@ -28,8 +29,15 @@ def _bind_property(cls, pd_cls, attr, min_version=None):
 
     func.__name__ = attr
     func.__qualname__ = f"{cls.__name__}.{attr}"
+    original_prop = getattr(pd_cls, attr)
+    if isinstance(original_prop, property):
+        method = original_prop.fget
+    elif isinstance(original_prop, functools.cached_property):
+        method = original_prop.func
+    else:
+        raise TypeError("bind_property expects original class to provide a property")
     try:
-        func.__wrapped__ = getattr(pd_cls, attr)
+        func.__wrapped__ = method
     except Exception:
         pass
     setattr(cls, attr, property(derived_from(pd_cls, version=min_version)(func)))

@wence-
Copy link
Contributor

wence- commented Apr 3, 2024

Relevant cpython change: python/cpython#116197 (backing this out removes the error)

@rjzamora
Copy link
Member Author

rjzamora commented Apr 3, 2024

Perhaps this is a more principled fix?

Okay, that fix seems intuitive to me and works locally. Do you have thoughts @phofl ?

Copy link
Collaborator

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@phofl phofl merged commit 275a79c into dask:main Apr 3, 2024
26 of 28 checks passed
@rjzamora rjzamora deleted the fix-python-3.11.9-import branch April 3, 2024 17:22
@jrbourbeau jrbourbeau mentioned this pull request Apr 3, 2024
4 tasks
rapids-bot bot pushed a commit to rapidsai/rapids-dask-dependency that referenced this pull request Apr 4, 2024
This PR makes a number of significant changes to the patching infrastructure:
1. This PR reorganizes the patching logic to be based on a more principled approach. Rather than maintaining lists of patch functions that are each responsible for filtering modules to apply themselves to, patches are organized in the patches directory in a tree structure matching dask itself. Patches are found and run by importing the same relative paths within the `patches` directory corresponding to a particular dask or distributed module.
2. It adds proper support for patching submodules. Previously the loader was being disabled whenever a real dask module was being imported, but this is problematic because if some dask modules import others they will pre-populate `sys.modules` with the real modules and therefore the loader will never be used for loading a patched version of the submodule.
3. Patches are no longer just functions applied to modules, they are arbitrary functions executed when a module is imported. As a result, a wider range of modifications is possible than was previously allowed. In particular:
4. The more general functions allow the entire module import to be hijacked and redirected to other modules.
5. The new framework is used to vendor a patched version of the accessor.py module in dask, which resolves the issues observed in dask/dask#11035.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

URL: #39
vyasr added a commit to vyasr/rapids-dask-dependency that referenced this pull request Apr 4, 2024
This PR makes a number of significant changes to the patching infrastructure:
1. This PR reorganizes the patching logic to be based on a more principled approach. Rather than maintaining lists of patch functions that are each responsible for filtering modules to apply themselves to, patches are organized in the patches directory in a tree structure matching dask itself. Patches are found and run by importing the same relative paths within the `patches` directory corresponding to a particular dask or distributed module.
2. It adds proper support for patching submodules. Previously the loader was being disabled whenever a real dask module was being imported, but this is problematic because if some dask modules import others they will pre-populate `sys.modules` with the real modules and therefore the loader will never be used for loading a patched version of the submodule.
3. Patches are no longer just functions applied to modules, they are arbitrary functions executed when a module is imported. As a result, a wider range of modifications is possible than was previously allowed. In particular:
4. The more general functions allow the entire module import to be hijacked and redirected to other modules.
5. The new framework is used to vendor a patched version of the accessor.py module in dask, which resolves the issues observed in dask/dask#11035.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

URL: rapidsai#39
@meksor meksor mentioned this pull request Apr 9, 2024
wence- added a commit to wence-/dask that referenced this pull request Apr 12, 2024
The change in dask#11035 to adapt _bind_property to Python 3.11.9 moved
attribute access on the pandas class object around and therefore
dropped exception handling. While it is not the case that we currently
wrap attributes that don't exist, let's avoid an accidental breakage
in the future by reinstating the old exception handling behaviour.
jtilly pushed a commit to jtilly/dask that referenced this pull request Apr 25, 2024
GenevieveBuckley pushed a commit to dask/dask-image that referenced this pull request May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is broken
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants