Have "unstack" return a boolean mask? #3518

Closed
Hoeze opened this issue Nov 13, 2019 · 1 comment · Fixed by #3541 or #3542

Hoeze commented Nov 13, 2019

MCVE Code Sample

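import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(6).reshape(2, 3),
                   coords=[('x', ['a', 'b']), ('y', [0, 1, 2])])
arr
stacked = arr.stack(z=('x', 'y'))
# Only 4 of the 6 stacked entries are kept, so unstacking has to fill 2 gaps
stacked[:4].unstack().dtype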

Expected Output

>>> arr = xr.DataArray(np.arange(6).reshape(2, 3),
...                  coords=[('x', ['a', 'b']), ('y', [0, 1, 2])])
>>> arr
<xarray.DataArray (x: 2, y: 3)>
array([[0, 1, 2],
       [3, 4, 5]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 0 1 2
>>> stacked = arr.stack(z=('x', 'y'))
>>> stacked[:4].unstack().dtype
dtype('float64')

Problem Description

Unstacking a partial selection fills the missing entries with NaN, which promotes the data type to float.
Are there thoughts on alternative options, e.g. fill_value=0 or return_boolean_mask, that would retain the original data type?

Currently, I obtain a boolean mask of the missing entries by checking for NaN with isnan.
Then I call fillna(0) and convert the data type back to integer.
However, this is quite inefficient.
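
For reference, the workaround described above might look roughly like the following. This is only a sketch of one way to do it: the variable names `missing` and `restored` are made up for illustration, and `isnull()` is used in place of a direct `isnan` check.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(6).reshape(2, 3),
                   coords=[('x', ['a', 'b']), ('y', [0, 1, 2])])
stacked = arr.stack(z=('x', 'y'))

# Unstacking the partial selection introduces NaN, so the result is float
unstacked = stacked[:4].unstack().transpose('x', 'y')

# Boolean mask of the entries that were absent from the stacked selection
missing = unstacked.isnull()

# Replace NaN with 0, then cast back to the original integer dtype
restored = unstacked.fillna(0).astype(arr.dtype)
```

The extra float copy and the cast back to integer are what make this approach wasteful for large arrays.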

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.10.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 7.8.0
sphinx: None


shoyer commented Nov 13, 2019

We should definitely have a fill_value option here, and ideally a sparse option, too.

Conceptually unstack is very similar to from_dataframe.
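
A sketch of how the proposed fill_value option could be used, assuming an xarray release in which `unstack` accepts a `fill_value` keyword (not the 0.14.0 reported above, which lacks it). The fill value of 0 is just the one suggested in the issue; any value that cannot be confused with real data would be a safer sentinel.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(6).reshape(2, 3),
                   coords=[('x', ['a', 'b']), ('y', [0, 1, 2])])
stacked = arr.stack(z=('x', 'y'))

# Assumption: this xarray version supports unstack(fill_value=...).
# Missing entries are filled with 0 instead of NaN, so no float promotion
# should be needed.
filled = stacked[:4].unstack(fill_value=0).transpose('x', 'y')
print(filled.dtype)
```

Note that this does not by itself tell you which entries were missing; a boolean mask would still have to be derived separately, for example by comparing against the index of the stacked selection.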
