Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recent versions of sparse and dask seem to be incompatible with our tests #5654

Closed
github-actions bot opened this issue Jul 31, 2021 · 28 comments
Closed

Comments

@github-actions
Copy link
Contributor

github-actions bot commented Jul 31, 2021

Workflow Run URL

Python 3.9 Test Summary Info
FAILED xarray/tests/test_plot.py::TestFacetGrid::test_can_set_norm - ValueErr...
FAILED xarray/tests/test_plot.py::TestCFDatetimePlot::test_cfdatetime_line_plot
FAILED xarray/tests/test_plot.py::TestCFDatetimePlot::test_cfdatetime_pcolormesh_plot
FAILED xarray/tests/test_plot.py::TestCFDatetimePlot::test_cfdatetime_contour_plot
@github-actions github-actions bot added the CI Continuous Integration tools label Jul 31, 2021
@keewis
Copy link
Collaborator

keewis commented Jul 31, 2021

this seems to be caused by changes to either dask or sparse, not sure which.

@Illviljan
Copy link
Contributor

This is showing up in the normal CI now, see
https://github.com/pydata/xarray/pull/5667/checks?check_run_id=3329247566

@dcherian
Copy link
Contributor

dcherian commented Aug 14, 2021

This looks like dask.

__________________________________ test_chunk __________________________________

    @requires_dask
    def test_chunk():
        s = sparse.COO.from_numpy(np.array([0, 0, 1, 2]))
        a = DataArray(s)
        ac = a.chunk(2)
        assert ac.chunks == ((2, 2),)
        assert isinstance(ac.data._meta, sparse.COO)
>       assert_identical(ac, a)


self = <COO: shape=(2,), dtype=bool, nnz=0, fill_value=False>
other = <function _broadcast_trick_inner at 0x7f325aef91f0>

    def func(self, other):
        if _disables_array_ufunc(other):
            return NotImplemented
>       return ufunc(other, self)
E       TypeError: operand type(s) all returned NotImplemented from __array_ufunc__(<ufunc 'bitwise_and'>, '__call__', <function _broadcast_trick_inner at 0x7f325aef91f0>, <COO: shape=(2,), dtype=bool, nnz=0, fill_value=False>): 'curry', 'COO'

cc @jrbourbeau

@jrbourbeau
Copy link
Contributor

jrbourbeau commented Aug 16, 2021

Thanks for the ping @dcherian. Running against Dask's git history, dask/dask#7939 is the place where xarray/tests/test_sparse.py::test_chunk started failing. We probably dropped some relevant compatibility code when we stopped supporting NumPy 1.17 and pandas 0.25.

Since xarray has a clearly defined policy for minimum dependencies, would dropping NumPy 1.17 and pandas 0.25 be an acceptable solution here?

@keewis
Copy link
Collaborator

keewis commented Aug 16, 2021

We already require pandas>=1.0 and I guess we can bump numpy (NEP29 allows it for releases after Jul 26), but I'm not sure why that would be an issue for the upstream-dev CI. For reference, the installed versions arenumpy=1.22.0.dev0+716.g20807c743 and pandas=1.4.0.dev0+436.gfcf3f24e02.

Interestingly, the installation of dask from github started failing two days ago (see the last succeeding and the first failing install – unless I'm misinterpreting the logs?). Any idea how to fix that? The most recent run correctly installed dask so I guess this was a temporary issue (or dask resolved this really quickly)

@jrbourbeau
Copy link
Contributor

jrbourbeau commented Aug 20, 2021

Ah, that's a good point @keewis. Stepping back a bit, my current situation locally is I'm able to reproduce the xarray/tests/test_sparse.py::test_chunk failure with:

Rolling dask back to the 2021.7.2 release, but keeping everything else the same, test_chunk passes. After running git bisect between the latest dask commit and the 2021.7.2 release, dask/dask#7939 is the point where test_chunk started failing (cc @jakirkham for visibility). I'll try to dig in a bit more to see if I can find the root cause of the failure

EDIT: I should note that the test_chunk failure has a similar traceback to dask/dask#5259

@jakirkham
Copy link

jakirkham commented Aug 21, 2021

Would double check that CI is pulling the latest sparse

xref: dask/dask#7939 (comment)

@jrbourbeau
Copy link
Contributor

Good point. CI is currently pulling in the latest sparse=0.12.0 release (which is also what I'm running locally).

It also appears there were some ordering concerns related to dask/dask#5259 when test_chunk was added (xref #3202 (comment))

@keewis
Copy link
Collaborator

keewis commented Aug 21, 2021

FWIW, the upstream-dev CI installs sparse=0.12.0+55.ge706c2e (i.e. the most recent state of the repository). However, since our other CI is also failing I guess this does not change much.

@jakirkham
Copy link

Yeah was just mentioning that since we had older version of sparse pulled while developing that PR at one point and it caused issues. Sounds like that is not the case here

@keewis keewis changed the title ⚠️ Nightly upstream-dev CI failed ⚠️ recent versions of sparse and dask seem to be incompatible with our tests Aug 21, 2021
@jrbourbeau
Copy link
Contributor

Also cc @crusaderky if you get a chance to look into this

@crusaderky
Copy link
Contributor

I'm a bit lost.
The failure in xarray/tests/test_sparse.py::test_chunk doesn't appear anywhere in recent CI runs and I can't reproduce it locally.

The ongoing failures in upstream-dev:

FAILED xarray/tests/test_plot.py::TestFacetGrid::test_can_set_norm
FAILED xarray/tests/test_plot.py::TestCFDatetimePlot::test_cfdatetime_line_plot
FAILED xarray/tests/test_plot.py::TestCFDatetimePlot::test_cfdatetime_pcolormesh_plot
FAILED xarray/tests/test_plot.py::TestCFDatetimePlot::test_cfdatetime_contour_plot

Aren't in any way related to either sparse or dask; they appear when I upgrade matplotlib from 3.4.3 to 3.5.0b1.

@dcherian
Copy link
Contributor

The failure in xarray/tests/test_sparse.py::test_chunk doesn't appear anywhere in recent CI runs

It was xfailed in #5729 , maybe that explains it

@crusaderky
Copy link
Contributor

I see now. I got really confused by the matplotlib issue which is what the opening post of this ticket is about. Would it be possible to have the two issues tracked by separate tickets?

@keewis keewis removed the CI Continuous Integration tools label Aug 26, 2021
@keewis
Copy link
Collaborator

keewis commented Aug 26, 2021

Would it be possible to have the two issues tracked by separate tickets?

definitely, let's keep this one focused on sparse. I think removing the label should be enough to have CI open a new issue (not sure, though).

@crusaderky
Copy link
Contributor

Narrowed it down.

>>> import dask.array as da
>>> import sparse
>>> s = sparse.COO.from_numpy([0, 0, 1, 2])
>>> a = da.from_array(s)
>>> z = da.zeros_like(a)
>>> z
dask.array<zeros_like, shape=(4,), dtype=int64, chunksize=(4,), chunktype=sparse.COO>
>>> z.compute()
<function _broadcast_trick_inner at 0x7fb8a3874e50>

numpy-1.20.1
dask-2021.3.0
sparse-0.11.2

@crusaderky
Copy link
Contributor

crusaderky commented Aug 26, 2021

da.zeros_like(a)
internally invokes
da.zeros(a.shape, meta=a._meta)
which internally invokes
np.broadcast(np.zeros_like(a._meta, shape=1), a.shape)

The problem is that sparse.zeros_like does not accept the shape= optional parameter, which is new in numpy 1.17.

This is where it gets triggered:
https://github.com/dask/dask/blob/85f0b14bd36a5135ce51aeee067b6207374b00c4/dask/array/wrap.py#L128-L168

I don't think dask should write a workaround to this, and it should be just fixed upstream? CC @jrbourbeau for opinion.

@jrbourbeau
Copy link
Contributor

Hrm, looking at the API docs for sparse.zeros_like, it appears to support shape=

In [1]: import sparse

In [2]: sparse.__version__
Out[2]: '0.12.0'

In [3]: import numpy as np

In [4]: x = np.arange(6)

In [5]: y = sparse.zeros_like(x, shape=(3, 2))

In [6]: y
Out[6]: <COO: shape=(3, 2), dtype=int64, nnz=0, fill_value=0>

In [7]: y.todense()
Out[7]:
array([[0, 0],
       [0, 0],
       [0, 0]])

@crusaderky
Copy link
Contributor

crusaderky commented Aug 26, 2021

Ah, shape= was very recently added in 0.12.0. It wasn't there in 0.11.2.

[EDIT] It is not the only problem. Investigating further.

@crusaderky
Copy link
Contributor

crusaderky commented Aug 26, 2021

The second issue is that sparse.zeros_like doesn't accept the order= parameter, which is required by the same dask code linked above (it's in the kwargs in dask/wrap.py:133). This in turn triggers an unfortunate handling of TypeError on behalf of @curry, which obfuscates the exception.

@crusaderky
Copy link
Contributor

crusaderky commented Aug 26, 2021

Third and final issue is when numpy.broadcast_to is applied to the output of zeros_like:

>>> import sparse
>>> s = sparse.COO.from_numpy([0, 0, 1, 2])
>>> np.broadcast_to(np.zeros_like(s.todense(), shape=()), (3, ))
array([0, 0, 0])
>>> np.broadcast_to(np.zeros_like(s, shape=()), (3, ))
ValueError: The data length does not match the coordinates given.
len(data) = 0, but 3 coords specified.

@jakirkham
Copy link

cc @pentschev (just so you are aware)

@jrbourbeau
Copy link
Contributor

Thanks for hunting all this down @crusaderky, I'm currently grokking it. Also cc @hameerabbasi for visibility

@hameerabbasi
Copy link

Third and final issue is when numpy.broadcast_to is applied to the output of zeros_like:


>>> import sparse

>>> s = sparse.COO.from_numpy([0, 0, 1, 2])

>>> np.broadcast_to(np.zeros_like(s.todense(), shape=()), (3, ))

array([0, 0, 0])

>>> np.broadcast_to(np.zeros_like(s, shape=()), (3, ))

ValueError: The data length does not match the coordinates given.

len(data) = 0, but 3 coords specified.

This seems like a genuine bug, please file it on our tracker if possible.

@hameerabbasi
Copy link

I have fixed the reported issues and released version 0.13.0 with the fixes included.

@jakirkham
Copy link

Thanks Hameer! 😄

@hameerabbasi
Copy link

Thank you guys for the amazing investigation into the issues here. 🤗

I was basically pinpointed at the issues from my point of view.

@jrbourbeau
Copy link
Contributor

This should be closed via #5751. Again, thanks all for your efforts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants