-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make illegal path-like variable names when constructing a DataTree from a Dataset #9378
Make illegal path-like variable names when constructing a DataTree from a Dataset #9378
Conversation
cc @TomNicholas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @etienneschalk ! Just two small comments.
xarray/core/datatree.py
Outdated
raise KeyError( | ||
"Given Dataset contains variable names with '/': " | ||
f"{offending_variable_names}. " | ||
"A Dataset represents a group, and a single group " | ||
"cannot have path-like variable names with '/' characters in them. " | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good, but @keewis you mentioned you had a comment didn't you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the improvement to be made here is to clarify that currently slashes are allowed in variable names on xr.Dataset
, but are forbidden in variable names in xr.DataTree
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this has to be considered here, but NetCDF4 and HDF5 do not allow forward slashes in variable/dataset names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I think that doesn't change anything for this PR, so long as that restriction is already explicitly accounted for in the respective backend save methods.
…s-when-dt-from-ds
Co-authored-by: Tom Nicholas <[email protected]>
Co-authored-by: Tom Nicholas <[email protected]>
Co-authored-by: Tom Nicholas <[email protected]>
Integrated the suggested changes and updated the test message, ready to be merged a priori @TomNicholas |
…s-when-dt-from-ds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
It would also be nice to add a check like this for child names.
* main: (26 commits) Forbid modifying names of DataTree objects with parents (pydata#9494) DAS-2155 - Merge datatree documentation into main docs. (pydata#9033) Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378) Ensure TreeNode doesn't copy in-place (pydata#9482) `open_groups` for zarr backends (pydata#9469) Update pyproject.toml (pydata#9484) New whatsnew section (pydata#9483) Release notes for v2024.09.0 (pydata#9480) Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451) Rename DataTree's "ds" and "data" to "dataset" (pydata#9476) Update DataTree repr to indicate inheritance (pydata#9470) Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460) Repo checker (pydata#9450) Add days_in_year and decimal_year to dt accessor (pydata#9105) remove parent argument from DataTree.__init__ (pydata#9465) Fix inheritance in DataTree.copy() (pydata#9457) Implement `DataTree.__delitem__` (pydata#9453) Add ASV for datatree.from_dict (pydata#9459) Make the first argument in DataTree.from_dict positional only (pydata#9446) Fix typos across the code, doc and comments (pydata#9443) ...
* main: Opt out of floor division for float dtype time encoding (pydata#9497) fixed formatting for whats-new (pydata#9493) Forbid modifying names of DataTree objects with parents (pydata#9494) DAS-2155 - Merge datatree documentation into main docs. (pydata#9033) Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378) Ensure TreeNode doesn't copy in-place (pydata#9482) `open_groups` for zarr backends (pydata#9469) Update pyproject.toml (pydata#9484) New whatsnew section (pydata#9483)
…om a Dataset (pydata#9378) * Make illegal path-like variable names when constructing a DataTree from a Dataset * Updated whats-new.rst * PR comments * Revert diff * Update xarray/core/datatree.py Co-authored-by: Tom Nicholas <[email protected]> * Update xarray/core/datatree.py Co-authored-by: Tom Nicholas <[email protected]> * Update xarray/tests/test_datatree.py Co-authored-by: Tom Nicholas <[email protected]> * Update expected Exception message in test * Merge changes from pydata#9476 * Fix --------- Co-authored-by: Tom Nicholas <[email protected]>
Closes Forbid path-like variable names in datatree #9339
Tests added
User visible changes (including notable bug fixes) are documented in
whats-new.rst
New functions/methods are listed inapi.rst
Original issue on the DataTree repository : When creating a DataTree from a Dataset with path-like variable, subgroups are expected to be created xarray-contrib/datatree#311
Original PR on the DataTree repository : Make illegal path-like variable names when constructing a DataTree from a Dataset xarray-contrib/datatree#314
Technical Note
Regarding
Hashable
vsstr
Dataset keysNote: DataTree keys are
Hashable
. I only check for slashes in the variable names if they are instance ofstr
.I never encountered a case (yet) where a Dataset keys are not
str
butHashable
in the broader case.We can imagine corner-cases where keys would be other types of
Hashable
, egPath
frompathlib
The choice I made is (1): only apply the check of slashes in the key if the key is an instance of
str
.Another choice (2)would be to project the
Hashable
space ontostr
space:str(variable_name)
(1) seems more conservative than (2) as I do not pretend to be able to get a string representation for any
Hashable
.