Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation of DataArray does not warn that inferring dimension names is deprecated #3820

Closed
sjvrijn opened this issue Mar 2, 2020 · 8 comments · Fixed by #3821
Closed

Comments

@sjvrijn
Copy link
Contributor

sjvrijn commented Mar 2, 2020

The documentation states:

coords...
...
Additionally, it is possible to define a coord whose name does not match the dimension name, or a coord based on multiple dimensions, with one of the following notations:

  • mapping {coord name: DataArray}
  • mapping {coord name: Variable}
  • mapping {coord name: (dimension name, array-like)}
  • mapping {coord name: (tuple of dimension names, array-like)}

dims (hashable or sequence of hashable, optional) – Name(s) of the data dimension(s). Must be either a hashable (only for 1D data) or a sequence of hashables with length equal to the number of dimensions. If this argument is omitted, dimension names are taken from coords (if possible) and otherwise default to ['dim_0', ... 'dim_n'].

Which seems to be no longer the case.

MCVE Code Sample

da = xr.DataArray(np.zeros((2, 2)), coords={'x': [1, 2], 'y': [1, 2]})                                                    
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-0d63eed9a72f> in <module>
----> 1 da = xr.DataArray(np.zeros((2, 2)), coords={'x': [1, 2], 'y': [1, 2]})

/scratch/local/lib/python3.8/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, encoding, indexes, fastpath)
    366             data = _check_data_shape(data, coords, dims)
    367             data = as_compatible_data(data)
--> 368             coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    369             variable = Variable(dims, data, attrs, encoding, fastpath=True)
    370 

/scratch/local/lib/python3.8/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
    105             if utils.is_dict_like(coords):
    106                 # deprecated in GH993, removed in GH1539
--> 107                 raise ValueError(
    108                     "inferring DataArray dimensions from "
    109                     "dictionary like ``coords`` is no longer "

ValueError: inferring DataArray dimensions from dictionary like ``coords`` is no longer supported. Use an explicit list of ``dims`` instead.

Expected Output

An update of the documentation to correctly specify the current behavior. (I'll propose a PR later today)

Problem Description

A mismatch between API specification and it's behavior seems like a problem to me :)

Output of xr.show_versions()

`xr.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.1 (default, Feb 11 2020, 12:54:25)
[GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.15.0-88-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IE.UTF-8
LOCALE: en_IE.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3

xarray: 0.14.1
pandas: 0.25.3
numpy: 1.18.0
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.2
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.2.0
pip: 20.0.2
conda: None
pytest: None
IPython: 7.11.1
sphinx: 2.3.1

@max-sixty
Copy link
Collaborator

Agree! Thanks for spotting @sjvrijn . PR is appreciated, thanks for the offer!

@max-sixty
Copy link
Collaborator

For the PR: in general we don't need to document deprecated behavior, so you can remove than description rather than caveat it

@sjvrijn
Copy link
Contributor Author

sjvrijn commented Mar 2, 2020

@max-sixty Thanks for the tip. In the end it meant just changing the last line on dims. The paragraph on the coords argument is still valid after all.

On a related note: according to #727 (PR #993), this was deprecated since key-order in dictionaries was arbitrary at the time of that issue. However, their order is fixed since Python3.7, as noted in the documentation:

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.

I guess it's still too soon to 'un-deprecate' this behavior again? 👼

@max-sixty
Copy link
Collaborator

I guess it's still too soon to 'un-deprecate' this behavior again?

Ha! We do support 3.6. I think that almost everyone is using xarray with CPython, so it's almost certainly safe. But maybe not worth the tradeoff given the benefit is marginal, what are your thoughts?

max-sixty pushed a commit that referenced this issue Mar 3, 2020
#3821)

* removed mention that 'dims' are inferred from 'coords'-dict when omitted in DataArray (fixes #3820)

* added summary of PR #3821 to whats-new
@sjvrijn
Copy link
Contributor Author

sjvrijn commented Mar 3, 2020

I think that inferring dimension-names from the coords-dict is the most intuitive way to define a DataArray.

Passing a dictionary for coords is in my opinion the clearest way to indicate which coordinates belong to which dimension, so then why do I have to specify the same dimension names again?

An example of how I create them from my current project:

values = xr.DataArray(
    values,
    coords={'n_high': n_highs,
            'n_low': n_lows,
            'rep': repetitions,
            'model': models,
            'idx': range(n_test_samples),},
    dims=['n_high', 'n_low', 'rep', 'model', 'idx'],  <-- repeated dim names
    attrs=attributes,
)

If you expect almost everyone to use CPython or 3.7+ anyway, then I don't actually see any drawbacks, while it would regularly make code shorter and less repetitive.

@max-sixty
Copy link
Collaborator

I definitely agree it's better if possible. I think the tradeoff is between better for almost everyone vs the potential of a break for a small minority.

Currently we support python versions for 42 months, and Python 3.6 was released December 2016. So that means we support it until July 2020 -- given that timeframe I think we wait until then and add this back?

@sjvrijn
Copy link
Contributor Author

sjvrijn commented Mar 3, 2020

Waiting a few more months until it will definitely not be a problem for anyone seems fair to me 👍

@seth-p
Copy link
Contributor

seth-p commented Mar 3, 2020

Note that inferring dimensions from coords when it is a list of tuples does still work (with no deprecation warning):

In [1]: import numpy as np, xarray as xr                                                                                                                                                                                                                                     

In [2]: xr.DataArray(np.zeros((2, 2)), coords=[('x', [1, 2]), ('y', [1, 2])])                                                                                                                                                                                                
Out[2]: 
<xarray.DataArray (x: 2, y: 2)>
array([[0., 0.],
       [0., 0.]])
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2

max-sixty pushed a commit to max-sixty/xarray that referenced this issue Mar 4, 2020
pydata#3821)

* removed mention that 'dims' are inferred from 'coords'-dict when omitted in DataArray (fixes pydata#3820)

* added summary of PR pydata#3821 to whats-new
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants