-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Aggregation methods for object-type array #3137
Conversation
dask/array/tests/test_reductions.py
Outdated
assert np.isnan(b[nanidx].astype(float)).all() | ||
a[nanidx] = 0.0 | ||
b[nanidx] = 0.0 | ||
assert_eq(a, b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible I'm somewhat in favor of including nan
handling logic directly into assert_eq
. We've tried to centralize all testing logic into that one function in the past for the sake of simplicity. I can't think of a case where I would prefer to have non-nan-handling behavior. We could make this a keyword argument if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't needed, adding equal_nan=True
to assert_eq
already works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides some comments on how the tests are implemented, this looks fine to me.
I see you have some additional commits dragged in (I assume from a merge with master?) If you can, it'd be nice to rebase on master to see if you can remove those. No worries if this proves too difficult.
dask/array/tests/test_reductions.py
Outdated
assert np.isnan(b[nanidx].astype(float)).all() | ||
a[nanidx] = 0.0 | ||
b[nanidx] = 0.0 | ||
assert_eq(a, b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't needed, adding equal_nan=True
to assert_eq
already works.
dask/array/utils.py
Outdated
if _not_empty(a): | ||
assert a.dtype == a_original.dtype | ||
if check_shape: | ||
assert_eq_shape(a_original.shape, a.shape, check_nan=False) | ||
else: | ||
adt = getattr(a, 'dtype', None) | ||
if not hasattr(a, 'dtype'): | ||
a = np.array(a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only time that things should pass through without a dtype attribute (which is on all numpy scalars) is if it's an object dtype. At that, I'd switch the logic to:
if not hasattr(a, 'dtype'):
a = np.array(a, dtype='O')
You'll also want to move this above the adt = getattr(a, 'dtype', None)
, as we'll probably want assert_eq(np.array(1, dtype='O'), 1)
to pass.
dask/array/utils.py
Outdated
|
||
if str(adt) != str(bdt): | ||
if check_dtype and str(adt) != str(bdt): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove this flag entirely, in favor of better handling of objects in assert_eq
. If a result doesn't have a dtype attribute, it's implicitly an object dtype. We should always be checking the dtype of the results.
dask/array/utils.py
Outdated
return np.allclose(a, b, **kwargs) | ||
except TypeError: # explicitly cast if implicit cast is not possible | ||
return np.allclose(a.astype(float), b.astype(float), **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of handling things this way, I'd leave the allclose
backported implementation as is (equal_nan
was added in numpy 1.10), and define a new function that adds additional support for allclose
on object dtypes. Something like:
def myallclose(a, b, equal_nan=False, **kwargs):
if a.dtype != 'O':
return allclose(a, b, equal_nan=equal_nan, **kwargs)
if equal_nan:
return (a.shape == b.shape and
all(np.isnan(b) if np.isnan(a) else a == b
for (a, b) in zip(a.flat, b.flat)))
return (a == b).all()
a benefit of this is it works on dtypes that can't be cast to floats. Since our results are usually small, this implementation should be fine.
Could even redefine/reimport the existing allclose
as _allclose
, then define your new function as allclose
.
Doing it this way is nice because it more cleanly demarcates what functionality is being backported to older versions of numpy (adding equal_nan
support for numpy < 1.10) and what we're adding for our own tests (support for object dtypes).
dask/array/utils.py
Outdated
pass | ||
|
||
c = a == b | ||
print(a) | ||
print(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover from debugging?
- Add support for masked arrays as chunks - Add api docs for mask arrays - Update sparse docs on arbitrary chunk types
de28d00
to
4740b37
Compare
Looks good to me. Thanks @fujiisoup! Merging. |
flake8 dask
docs/source/changelog.rst
for all changesand one of the
docs/source/*-api.rst
files for new APIThis fixes #3133.
As the current
assert_eq
looks unable to take carenan
-included arraysI added
assert_nan_eq
function in test_reductions.py, but there should be better ways to do this.I appreciate any suggestions.