Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot set dataarray accessor attributes if DataArray part of Dataset #5863

Closed
shunt16 opened this issue Oct 14, 2021 · 4 comments
Closed

Cannot set dataarray accessor attributes if DataArray part of Dataset #5863

shunt16 opened this issue Oct 14, 2021 · 4 comments

Comments

@shunt16
Copy link

shunt16 commented Oct 14, 2021

Hi there, I'm trying to add some domain-specific functionality to DataArray's using register_dataarray_accessor decorator. I've run across what I think is a bug when this is applied to DataArrays that are part of a Dataset - I can't seem to set attributes. This works fine when the DataArray is not part of a Dataset. See example below:

import xarray as xr

@xr.register_dataarray_accessor("test")
class TestAccessor(object):
    def __init__(self, xarray_obj):
        self._obj = xarray_obj
        self.my_attr = "hello"

    def get_my_attr(self):
        return self.my_attr

    def set_my_attr(self, value):
    	self.my_attr = value
            

da = xr.DataArray()
ds = xr.Dataset()
ds["da"] = da

print(ds.da.test.my_attr)

ds.da.test.set_my_attr("bye")

print(ds.da.test.my_attr)

Gives

"hello"
"hello"

If the data array is not part of a dataset it works fine, so:

import xarray as xr

@xr.register_dataarray_accessor("test")
class TestAccessor(object):
    def __init__(self, xarray_obj):
        self._obj = xarray_obj
        self.my_attr = "hello"

    def get_my_attr(self):
        return self.my_attr

    def set_my_attr(self, value):
    	self.my_attr = value
            

da = xr.DataArray()

print(da.test.my_attr)

da.test.set_my_attr("bye")

print(da.test.my_attr)

Gives

"hello"
"bye"

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.3.3
numpy: 1.20.3
scipy: None
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: None
IPython: None
sphinx: None

@TomNicholas
Copy link
Member

Hi @shunt16 - unfortunately your design pattern with an accessor storing attributes isn't going to work, because currently accessors cannot be relied upon to store state.

You're always going to lose my_attr when you try to store your DataArray in a Dataset, because a Dataset is more complex under the hood than a simple container of DataArrays, meaning the return DataArray is a different object from what you put in. Therefore calling .test will create a new accessor instance, which doesn't have my_attr set. (You're still seeing the "hello" because that's assigned at accessor creation time, which happens every time you call .test on a new object.)

Can I ask what problem it is that you're trying to solve by storing your custom attribute on the accessor? Perhaps it could be solved another way without relying on #3268.

@shunt16
Copy link
Author

shunt16 commented Oct 15, 2021

Hi @TomNicholas - thanks for your reply.

Effectively, I have Datasets where some of the variables are ancillary to other variables. I want my accessor to have functionality to with the ancillary variables as input. I was trying to store a list of the ancillary variable names for each variable in the state of the accessor.

Precisely, some of my variables are measured data, and others uncertainties of the measured data. Each measured variable may have multiple components of uncertainty. I need my variable accessor to know which uncertainties are relevant for its computations. So I can do things like this:

ds.measurement.unc.total

which could sum all the components of uncertainty to evaluate the total. I can see an issue around caching results with this type of computation when state is not preserved.

I guess the solution would be to keep this kind of state info in variable attributes? And avoid dataarray_accessor when using datasets?

@TomNicholas
Copy link
Member

TomNicholas commented Oct 15, 2021

(I've just noticed that the original complaint in this issue is a duplicate of #3205)

Precisely, some of my variables are measured data, and others uncertainties of the measured data. Each measured variable may have multiple components of uncertainty. I need my variable accessor to know which uncertainties are relevant for its computations.

That is an interesting, but somewhat tricky, use case. At the moment the best way to solve it is probably to use .attrs to store the names of the relevant uncertainty variables, and have your accessor consult those names each time it is called (the same suggestion that was given in #3268). Using .attrs isn't yet perfect but should work for lots of cases where the accessor state would fail you.

In the longer term this might be another use case for a DataTree, where you store all the relevant uncertainty variables under a particular node, something like this

DataTree("root")
|-- "temperature"
|   |-- "measurement"
|   |-- "total_uncertainty"
|   |-- "uncertainty_component_1"
|-- "population"
...

@shunt16
Copy link
Author

shunt16 commented Oct 19, 2021

Okay thanks again @TomNicholas, I will go-ahead and use variable attrs for now and keep an eye on the DataTree development - looks interesting.

A general comment would be, the accessor design pattern seems really nice - but, dataset variable accessors not being stateful is a shame and leads to confusion. If there's no good solution (variable accessors?), I think it would make sense to at least add a disclaimer in the docs (unless there is and I've missed it).

@shunt16 shunt16 closed this as completed Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants