Problem
I know xarray doesn't support netCDF4 Group functionality. That's fine; I bet it's incredibly thorny. My issue is that when you open the root group of a netCDF4 file that contains groups, xarray doesn't even tell you that there are groups; they are completely invisible. This seems like a big flaw: you've opened a file, so shouldn't you at least be told what's in it?
Solution
When you open a dataset with the netcdf4-python library, you get something like this:
>>> netCDF4.Dataset(path)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    some global attribute: some value
    dimensions(sizes): ...
    variables(dimensions): ...
    groups: group1, group2
"groups" shows up sort of like an auto-generated attribute. Surely xarray can do something similar:
Workaround
The workaround I am considering is to add an attribute to my root group that lists the groups in the file, so people using xarray can see that the file contains more groups. However, this is redundant, since the information is already in the netCDF file, and brittle, since nothing guarantees that the attribute actually reflects the groups in the file.
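A minimal sketch of that workaround, assuming netCDF4-python's `setncattr`, `getncattr` and `ncattrs` methods on the root `Dataset` (a real file would be opened in `"a"` mode). `FakeRoot`, `write_groups_attr` and `groups_attr_is_stale` are hypothetical names used only for illustration. The staleness check makes the brittleness concrete: add a group later and the attribute silently goes out of date.

```python
class FakeRoot:
    """Hypothetical stand-in for netCDF4.Dataset(path, "a") so the sketch runs."""

    def __init__(self, group_names):
        self.groups = dict.fromkeys(group_names)  # real values are Group objects
        self._attrs = {}

    def setncattr(self, name, value):
        self._attrs[name] = value

    def getncattr(self, name):
        return self._attrs[name]

    def ncattrs(self):
        return list(self._attrs)


def write_groups_attr(ds, attr_name="groups"):
    """The workaround: copy the current group names into a root attribute."""
    ds.setncattr(attr_name, ", ".join(ds.groups))


def groups_attr_is_stale(ds, attr_name="groups"):
    """True if the attribute is missing or no longer matches the real groups."""
    if attr_name not in ds.ncattrs():
        return True
    recorded = [g for g in ds.getncattr(attr_name).split(", ") if g]
    return recorded != list(ds.groups)


ds = FakeRoot(["group1", "group2"])
write_groups_attr(ds)
print(ds.getncattr("groups"))    # group1, group2
ds.groups["group3"] = None       # a group added later; attribute not updated
print(groups_attr_is_stale(ds))  # True
```

The comparison is order-sensitive and assumes no group name contains ", "; both are simplifications of the sketch, not properties of netCDF4.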
Conclusion
Considering that xr.open_dataset has a group parameter to open groups, it seems unfortunate that when you open a file, you don't see which groups are in it. Instead, you have to use an external tool to find out which groups the file contains, then open them with xarray. Since this is only a matter of extracting the group names and printing them, surely this is a simple (and imo, valuable) addition. I'd be happy to implement it and submit a PR if people are on board. I might need some direction, though: this is my first time digging into the xarray source code, and I don't see a __str__ method on the Dataset class, which is where I expected to make this addition.
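In the meantime, the "external tool" step can at least be scripted. This is a minimal sketch, assuming netCDF4-python objects, whose `Dataset` and `Group` both expose a dict-like `.groups` mapping of names to child groups. `walk_groups` is a hypothetical helper that collects nested group paths, which could then be passed one at a time to `xr.open_dataset(path, group=...)`. A small stand-in tree replaces a real file so the example runs on its own.

```python
from types import SimpleNamespace
from typing import Iterator


def walk_groups(node, prefix: str = "") -> Iterator[str]:
    """Yield the full path of every group below `node`, depth first.

    Works on anything with a dict-like `.groups` attribute mapping names
    to child groups (netCDF4.Dataset and netCDF4.Group both have one).
    """
    for name, child in node.groups.items():
        path = f"{prefix}/{name}"
        yield path
        yield from walk_groups(child, path)


# Stand-in for netCDF4.Dataset(path) so the sketch runs without a file.
root = SimpleNamespace(groups={
    "group1": SimpleNamespace(groups={"sub": SimpleNamespace(groups={})}),
    "group2": SimpleNamespace(groups={}),
})

print(list(walk_groups(root)))  # ['/group1', '/group1/sub', '/group2']

# With a real file (not run here; assumes netCDF4 and xarray are installed):
#   import netCDF4, xarray as xr
#   with netCDF4.Dataset(path) as nc:
#       paths = list(walk_groups(nc))
#   datasets = {p: xr.open_dataset(path, group=p) for p in paths}
```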
Update: after diving into how the source code works, it seems the group information would actually have to be loaded by the backend loaders, which is a fairly deep code change. The minimal diff seems to be to load the group names, then add {"groups": "group1, group2, ..."} to the global attrs dictionary. That way, they would automagically propagate all the way through the codebase to the __repr__ call and show up in the output string. Of course, it's a little kludgey, because the group names aren't really an attribute of the underlying file. And what if there's already an attribute named 'groups'? That's tricky, and I'm not sure what the best resolution is; probably just don't overwrite it and do nothing.

The alternative is to create a representation for "groups" alongside "dimensions", "coordinates", "data variables", and "attributes", and to add machinery for it throughout the codebase (changing method signatures, etc.), which really moves in the direction of Datasets actually supporting groups, a whole different undertaking. This is just supposed to give a bit more visibility into the underlying netCDF file. I'm unsure whether this moderate level of kludge is acceptable, though.
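A minimal sketch of the attrs-based idea in plain Python, so it runs without xarray. `add_groups_attr` and `ToyDataset` are hypothetical names, not xarray API, and where exactly the call would sit in the netCDF4 backend is left open. It also bears on the earlier `__str__` question: Python's default `__str__` falls back to `__repr__`, so a class that only defines `__repr__` (as I believe xarray's Dataset does, delegating to its formatting module) still prints through it.

```python
from typing import Iterable, Mapping


def add_groups_attr(attrs: Mapping[str, object],
                    group_names: Iterable[str],
                    key: str = "groups") -> dict:
    """Return a copy of `attrs` with a comma-separated list of group names.

    If an attribute called `key` already exists, it is left untouched
    (the "don't overwrite it and do nothing" option).
    """
    new_attrs = dict(attrs)
    names = list(group_names)
    if names and key not in new_attrs:
        new_attrs[key] = ", ".join(names)
    return new_attrs


class ToyDataset:
    """Hypothetical stand-in for Dataset: defines __repr__ only, no __str__."""

    def __init__(self, attrs: Mapping[str, object]):
        self.attrs = dict(attrs)

    def __repr__(self) -> str:
        lines = ["<ToyDataset>", "Attributes:"]
        lines += [f"    {k}: {v}" for k, v in self.attrs.items()]
        return "\n".join(lines)


ds = ToyDataset(add_groups_attr({"title": "demo"}, ["group1", "group2"]))
print(ds)  # print() calls str(), which falls back to __repr__
```

Because the group names ride along as an ordinary attribute, nothing downstream of the loader needs to change, which is the appeal (and the kludge) described above.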