
WIP: Aggregation methods for object-type array #3137

Merged: 11 commits into dask:master on Feb 6, 2018

Conversation

fujiisoup (Contributor)

  • Tests added / passed
  • Passes flake8 dask
  • Fully documented, including docs/source/changelog.rst for all changes
    and one of the docs/source/*-api.rst files for new API

This fixes #3133.
The current assert_eq does not seem able to handle

  • arrays containing nan
  • scalars

so I added an assert_nan_eq function in test_reductions.py, but there may be a better way to do this (a sketch follows below).

I appreciate any suggestions.
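
For reference, a minimal sketch of what such an assert_nan_eq helper could look like. This is only an illustration: the name comes from the description above, the actual code added to test_reductions.py may differ, and it assumes numeric inputs whose NaN positions can be detected via a float cast.

import numpy as np

from dask.array.utils import assert_eq


def assert_nan_eq(a, b):
    # Compare two computed results, treating NaN positions as equal.
    # Cast to float so np.isnan also works on object-dtype arrays.
    a = np.asarray(a).copy()
    b = np.asarray(b).copy()
    nanidx = np.isnan(a.astype(float))
    assert np.isnan(b.astype(float))[nanidx].all()
    # Zero out the NaN positions so the remaining values can be checked
    # with the ordinary assert_eq machinery.
    a[nanidx] = 0.0
    b[nanidx] = 0.0
    assert_eq(a, b)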

assert np.isnan(b[nanidx].astype(float)).all()
a[nanidx] = 0.0
b[nanidx] = 0.0
assert_eq(a, b)
Member

If possible I'm somewhat in favor of including nan handling logic directly into assert_eq. We've tried to centralize all testing logic into that one function in the past for the sake of simplicity. I can't think of a case where I would prefer to have non-nan-handling behavior. We could make this a keyword argument if desired.

Member

This isn't needed, adding equal_nan=True to assert_eq already works.
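
For illustration, assuming assert_eq forwards keyword arguments such as equal_nan through to its allclose check (which is what this comment indicates), a NaN-producing comparison can be written directly; a minimal sketch:

import numpy as np
import dask.array as da
from dask.array.utils import assert_eq

x = np.array([1.0, np.nan, 3.0])
d = da.from_array(x, chunks=2)

# Without equal_nan=True the closeness check fails on the NaN entry;
# with it, matching NaN positions compare as equal.
assert_eq(d, x, equal_nan=True)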

@jcrist jcrist (Member) left a comment

Besides some comments on how the tests are implemented, this looks fine to me.

I see you have some additional commits dragged in (I assume from a merge with master?). If you can, it'd be nice to rebase on master to see if those can be removed. No worries if this proves too difficult.


        if _not_empty(a):
            assert a.dtype == a_original.dtype
        if check_shape:
            assert_eq_shape(a_original.shape, a.shape, check_nan=False)
    else:
        adt = getattr(a, 'dtype', None)
        if not hasattr(a, 'dtype'):
            a = np.array(a)
Member

The only time that things should pass through without a dtype attribute (which is present on all numpy scalars) is if it's an object dtype. Given that, I'd switch the logic to:

if not hasattr(a, 'dtype'):
    a = np.array(a, dtype='O')

You'll also want to move this above the adt = getattr(a, 'dtype', None), as we'll probably want assert_eq(np.array(1, dtype='O'), 1) to pass.
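
Put together, the suggested shape of this part of assert_eq would be roughly as follows (a sketch based on the comment above, with the surrounding code omitted):

# Coerce dtype-less results (e.g. plain Python scalars) to object arrays
# first, then read the dtype, so that assert_eq(np.array(1, dtype='O'), 1)
# can pass.
if not hasattr(a, 'dtype'):
    a = np.array(a, dtype='O')
adt = a.dtype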


-    if str(adt) != str(bdt):
+    if check_dtype and str(adt) != str(bdt):
Member

I'd remove this flag entirely, in favor of better handling of objects in assert_eq. If a result doesn't have a dtype attribute, it's implicitly an object dtype. We should always be checking the dtype of the results.
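
In that scheme, anything without a dtype attribute would be normalized to an object array on either side, and the dtype comparison would always run with no opt-out flag. A rough, hypothetical sketch:

if not hasattr(a, 'dtype'):
    a = np.array(a, dtype='O')
if not hasattr(b, 'dtype'):
    b = np.array(b, dtype='O')
# Always compare dtypes; an object result must be matched by an object result.
assert str(a.dtype) == str(b.dtype), 'dtype mismatch: %s vs %s' % (a.dtype, b.dtype)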

        return np.allclose(a, b, **kwargs)
    except TypeError:  # explicitly cast if implicit cast is not possible
        return np.allclose(a.astype(float), b.astype(float), **kwargs)
Member

Instead of handling things this way, I'd leave the allclose backported implementation as is (equal_nan was added in numpy 1.10), and define a new function that adds additional support for allclose on object dtypes. Something like:

def myallclose(a, b, equal_nan=False, **kwargs):
    if a.dtype != 'O':
        return allclose(a, b, equal_nan=equal_nan, **kwargs)
    if equal_nan:
        return (a.shape == b.shape and
                all(np.isnan(b) if np.isnan(a) else a == b
                    for (a, b) in zip(a.flat, b.flat)))
    return (a == b).all()

A benefit of this is that it works on dtypes that can't be cast to floats. Since our results are usually small, this implementation should be fine.

Could even redefine/reimport the existing allclose as _allclose, then define your new function as allclose.

Doing it this way is nice because it more cleanly demarcates what functionality is being backported to older versions of numpy (adding equal_nan support for numpy < 1.10) and what we're adding for our own tests (support for object dtypes).
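
As a usage sketch, relying on the myallclose definition from the snippet above and assuming numeric object arrays:

import numpy as np

a = np.array([1.0, np.nan, 3], dtype=object)
b = np.array([1.0, np.nan, 3], dtype=object)

# np.allclose raises TypeError on object-dtype input (which is what the
# try/except float cast above works around); the element-wise comparison
# in myallclose handles it directly.
assert myallclose(a, b, equal_nan=True)   # NaN positions treated as equal
assert not myallclose(a, b)               # plain ==: nan != nan, so False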

pass

c = a == b
print(a)
print(b)
Member

Leftover from debugging?

@jcrist jcrist (Member) commented Feb 6, 2018

Looks good to me. Thanks @fujiisoup! Merging.

@jcrist jcrist merged commit 83c3129 into dask:master Feb 6, 2018
Successfully merging this pull request may close these issues.

sum requires dtype argument for object array.