Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make illegal path-like variable names when constructing a DataTree from a Dataset #9378

Conversation

etienneschalk
Copy link
Contributor

@etienneschalk etienneschalk commented Aug 18, 2024

Technical Note

Regarding Hashable vs str Dataset keys

Note: DataTree keys are Hashable. I only check for slashes in the variable names if they are instance of str.
I never encountered a case (yet) where a Dataset keys are not str but Hashable in the broader case.
We can imagine corner-cases where keys would be other types of Hashable, eg Path from pathlib

In [2]: from pathlib import Path

In [3]: hash(Path("/"))
Out[3]: -3809984204556177651

The choice I made is (1): only apply the check of slashes in the key if the key is an instance of str.
Another choice (2)would be to project the Hashable space onto str space: str(variable_name)
(1) seems more conservative than (2) as I do not pretend to be able to get a string representation for any Hashable.

@etienneschalk etienneschalk marked this pull request as draft August 18, 2024 12:11
@etienneschalk etienneschalk marked this pull request as ready for review August 18, 2024 12:49
@etienneschalk
Copy link
Contributor Author

cc @TomNicholas

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Aug 19, 2024
Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @etienneschalk ! Just two small comments.

xarray/core/datatree.py Outdated Show resolved Hide resolved
xarray/core/datatree.py Outdated Show resolved Hide resolved
Comment on lines 166 to 171
raise KeyError(
"Given Dataset contains variable names with '/': "
f"{offending_variable_names}. "
"A Dataset represents a group, and a single group "
"cannot have path-like variable names with '/' characters in them. "
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good, but @keewis you mentioned you had a comment didn't you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the improvement to be made here is to clarify that currently slashes are allowed in variable names on xr.Dataset, but are forbidden in variable names in xr.DataTree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this has to be considered here, but NetCDF4 and HDF5 do not allow forward slashes in variable/dataset names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I think that doesn't change anything for this PR, so long as that restriction is already explicitly accounted for in the respective backend save methods.

xarray/core/datatree.py Outdated Show resolved Hide resolved
@TomNicholas TomNicholas added the plan to merge Final call for comments label Sep 9, 2024
xarray/core/datatree.py Outdated Show resolved Hide resolved
@etienneschalk
Copy link
Contributor Author

Integrated the suggested changes and updated the test message, ready to be merged a priori @TomNicholas

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

It would also be nice to add a check like this for child names.

@TomNicholas TomNicholas merged commit 8bd456c into pydata:main Sep 12, 2024
28 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main: (26 commits)
  Forbid modifying names of DataTree objects with parents (pydata#9494)
  DAS-2155 - Merge datatree documentation into main docs. (pydata#9033)
  Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378)
  Ensure TreeNode doesn't copy in-place (pydata#9482)
  `open_groups` for zarr backends (pydata#9469)
  Update pyproject.toml (pydata#9484)
  New whatsnew section (pydata#9483)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main:
  Opt out of floor division for float dtype time encoding (pydata#9497)
  fixed formatting for whats-new (pydata#9493)
  Forbid modifying names of DataTree objects with parents (pydata#9494)
  DAS-2155 - Merge datatree documentation into main docs. (pydata#9033)
  Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378)
  Ensure TreeNode doesn't copy in-place (pydata#9482)
  `open_groups` for zarr backends (pydata#9469)
  Update pyproject.toml (pydata#9484)
  New whatsnew section (pydata#9483)
hollymandel pushed a commit to hollymandel/xarray that referenced this pull request Sep 23, 2024
…om a Dataset (pydata#9378)

* Make illegal path-like variable names when constructing a DataTree from a Dataset

* Updated whats-new.rst

* PR comments

* Revert diff

* Update xarray/core/datatree.py

Co-authored-by: Tom Nicholas <[email protected]>

* Update xarray/core/datatree.py

Co-authored-by: Tom Nicholas <[email protected]>

* Update xarray/tests/test_datatree.py

Co-authored-by: Tom Nicholas <[email protected]>

* Update expected Exception message in test

* Merge changes from pydata#9476

* Fix

---------

Co-authored-by: Tom Nicholas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-DataTree Related to the implementation of a DataTree class
Projects
Development

Successfully merging this pull request may close these issues.

4 participants