Support saving as netcdf InferenceData that has MultiIndex coordinates #2165
Comments
I'm not sure if we want to deviate from the netCDF4 spec. We could add functionality to transform to and from a MultiIndex, with the necessary info stored in attrs, but then it wouldn't be part of the official spec.
I understand @ahartikainen, but I think that there are scenarios where it is very helpful to support |
It could also be useful for situations in which posterior samples are streamed into the
and you wouldn't have to wait for all chains to finish to collect the posterior samples into a uniform grid.
I just ran into the same issue, and I agree it would be great to add this to ArviZ! Working with a MultiIndex is still quite hard in the xarray world, so anything that can make it more accessible seems worth it to me.
I packaged that code in cf-xarray if you want to use that. That version fixes a couple of bugs.
The problem is how these things are supposed to work in other languages. In my opinion, MultiIndex items can be saved as coords against a stacking dimension.
I think it would be best to use the implementation in cf-xarray. Is it critical for this to happen automatically when calling Some options that come to mind are:
I think this would be a great PR to xarray!
Tell us about it
There are many situations in which it is very convenient to use `pandas.MultiIndex` as coordinates of an `xarray.DataArray`. The problem is that, at the moment, `xarray` doesn't provide a built-in way to save these indexes in netCDF format. Take for example:

This raises a `NotImplementedError` with the following traceback:

Thoughts on implementation
I had a look at the mentioned xarray issue, and the approach suggested by @dcherian works (at least in the scenario that I had to work with a month ago). I think that it would be good to incorporate something like that into `arviz.from_netcdf` and `InferenceData.to_netcdf`. The basic idea is to convert the `MultiIndex` into a simple array of integers that are the codes of the `MultiIndex`, and also add an attribute that states that the dimension/coordinates were originally a `MultiIndex`. This attribute is also used to keep track of the level values and names of the original `MultiIndex`. The modified data structure can be serialized to `netcdf` without any problems. The only thing to be aware of is that when the `netcdf` is loaded, some work has to happen to rebuild the `MultiIndex` from the original coordinates. I think that this small overhead is worth the benefit of bringing `MultiIndex` support to arviz.

If you all agree that this would be valuable, I can write a PR.
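To make the encode/decode idea above concrete, here is a minimal sketch using only `pandas` and `numpy`. It is not the code from the xarray issue or from cf-xarray. The helper names `encode_multiindex` and `decode_multiindex` and the dictionary keys are hypothetical, and the sketch assumes the index has no missing values (no `-1` codes). In a real implementation the integer array would presumably become the coordinate values and the remaining entries would be stored as attributes or auxiliary variables.

```python
import numpy as np
import pandas as pd


def encode_multiindex(index: pd.MultiIndex) -> dict:
    """Flatten a MultiIndex into one integer array plus metadata.

    Hypothetical helper: assumes every code is >= 0 (no missing values).
    """
    shape = tuple(len(level) for level in index.levels)
    codes = tuple(np.asarray(c, dtype=np.intp) for c in index.codes)
    return {
        # One integer per entry: the position in the full product of levels.
        "codes": np.ravel_multi_index(codes, shape),
        # Metadata needed to rebuild the index after loading.
        "level_names": list(index.names),
        "levels": [np.asarray(level) for level in index.levels],
    }


def decode_multiindex(encoded: dict) -> pd.MultiIndex:
    """Rebuild the MultiIndex from the output of encode_multiindex."""
    shape = tuple(len(level) for level in encoded["levels"])
    codes = np.unravel_index(encoded["codes"], shape)
    return pd.MultiIndex(
        levels=encoded["levels"],
        codes=list(codes),
        names=encoded["level_names"],
    )


index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 2), ("b", 1)], names=["group", "draw"]
)
encoded = encode_multiindex(index)
restored = decode_multiindex(encoded)
print(encoded["codes"])        # plain integer array, safe to serialize
print(restored.equals(index))  # round trip check
```

The flat integer array is the only part that has to live on the dimension itself, which is why the serialized file stays within what netCDF can represent. The cost is the extra rebuild step on load.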