argmax() causes dask to compute #3237
Comments
Yes, this is definitely a bug -- thanks for clear example to reproduce it! These helper functions were originally added back in #1883 to handle object dtype arrays properly. So it would be nice to fix this for object arrays in dask, but for the much more common non-object dtype arrays we should really just be using |
Those little changes do solve the MCVE, but break at least one test. I don't understand the (nan)ops logic in xarray well enough to get around the issue, but maybe this helps: The change
The failing test...
___________ TestVariable.test_reduce ________________
...
    def f(values, axis=None, skipna=None, **kwargs):
        if kwargs.pop("out", None) is not None:
            raise TypeError("`out` is not valid for {}".format(name))
        values = asarray(values)
        if coerce_strings and values.dtype.kind in "SU":
            values = values.astype(object)
        func = None
        if skipna or (skipna is None and values.dtype.kind in "cfO"):
            nanname = "nan" + name
            func = getattr(nanops, nanname)
        else:
            func = _dask_or_eager_func(name)
        try:
            return func(values, axis=axis, **kwargs)
        except AttributeError:
            if isinstance(values, dask_array_type):
                try:  # dask/dask#3133 dask sometimes needs dtype argument
                    # if func does not accept dtype, then raises TypeError
                    return func(values, axis=axis, dtype=values.dtype, **kwargs)
                except (AttributeError, TypeError):
                    msg = "%s is not yet implemented on dask arrays" % name
            else:
                msg = (
                    "%s is not available with skipna=False with the "
                    "installed version of numpy; upgrade to numpy 1.12 "
                    "or newer to use skipna=True or skipna=None" % name
                )
>           raise NotImplementedError(msg)
E           NotImplementedError: argmax is not available with skipna=False with the installed version of numpy; upgrade to numpy 1.12 or newer to use skipna=True or skipna=None
... Note: I have numpy 1.17 installed, so the error message here seems misleading.
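For illustration, here is a minimal sketch of one way such a message can appear even with a recent numpy. It is not xarray code: `lazy_lookup`, `make_reducer`, `buggy_argmax`, and `backend` are hypothetical names, and whether this is what happened in the failing test above is not established here. The point is only that a broad `except AttributeError` around the call catches errors raised inside the function as well as the "function is missing" case it was written for.

```python
import types


def lazy_lookup(module, name):
    # Resolve the attribute only when called, so a missing function
    # surfaces as AttributeError at call time (hypothetical helper).
    def wrapper(values):
        return getattr(module, name)(values)
    return wrapper


def make_reducer(module, name):
    func = lazy_lookup(module, name)

    def f(values):
        try:
            return func(values)
        except AttributeError:
            # Meant for "function not available", but this also swallows
            # any AttributeError raised while func itself is running.
            raise NotImplementedError(
                f"{name} is not available; upgrade to a newer version"
            )
    return f


def buggy_argmax(values):
    # The function exists, but hits an unrelated AttributeError internally.
    return values.no_such_attribute


backend = types.SimpleNamespace(argmax=buggy_argmax)
reducer = make_reducer(backend, "argmax")

try:
    reducer([1, 3, 2])
except NotImplementedError as err:
    message = str(err)
```

Here `message` claims that `argmax` is unavailable although `backend.argmax` exists, which mirrors how an upgrade hint could be shown to someone who already has a new enough version.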
Thanks for sharing the patch! I dropped into a debugger by adding
So it looks like |
* Make argmin/max work lazy with dask (#3237).
* dask: Testing number of computes on reduce methods.
* what's new updated
* Fix typo Co-Authored-By: Stephan Hoyer <[email protected]>
* Be more explicit. Co-Authored-By: Stephan Hoyer <[email protected]>
* More explicit raise_if_dask_computes
* nanargmin/max: only set fill_value when needed
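The commit message above mentions testing the number of computes on reduce methods. The idea can be sketched without dask, as a toy model only: `LazyValues` and `eager_argmax` below are hypothetical stand-ins, not dask or xarray APIs. A shared counter records every `compute()` call, so a test can assert that a lazy reduction leaves it at zero while an eager helper does not.

```python
class LazyValues:
    """Toy stand-in for a dask-like array; data exists only after compute()."""

    def __init__(self, thunk, counter):
        self._thunk = thunk
        self._counter = counter  # shared dict counting compute() calls

    def compute(self):
        self._counter["computes"] += 1
        return self._thunk()

    def argmax(self):
        # Lazy: wrap the work in a new deferred object instead of running it.
        def thunk():
            data = self._thunk()
            return max(range(len(data)), key=data.__getitem__)
        return LazyValues(thunk, self._counter)


def eager_argmax(lazy):
    # Eager helper: forces the data into memory before reducing.
    data = lazy.compute()
    return max(range(len(data)), key=data.__getitem__)


counter = {"computes": 0}
values = LazyValues(lambda: [1.0, 5.0, 3.0], counter)

lazy_result = values.argmax()            # builds a deferred result only
computes_after_lazy = counter["computes"]

eager_result = eager_argmax(values)      # triggers a compute immediately
computes_after_eager = counter["computes"]
```

In this toy, the lazy path does no work until `lazy_result.compute()` is called, whereas the eager helper increments the counter as soon as it runs. That is the behaviour difference the issue describes.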
Problem Description
While digging for #2511 I found that da.argmax() causes compute on a dask array in nanargmax(a, axis=None):
xarray/xarray/core/nanops.py
Line 120 in 131f602
I feel like this shouldn't be the case as da.max() and da.data.argmax() don't compute and it renders the laziness useless.
MCVE Code Sample
Expected Output
None of the methods should actually compute:
Output of xr.show_versions()
xarray: 0.12.3+63.g131f6022
pandas: 0.25.0
numpy: 1.17.0
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.25
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: 1.27.1
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.0.3
conda: None
pytest: 5.0.1
IPython: 7.6.1
sphinx: 2.2.0