Stop inheriting non-indexed coordinates for DataTree #9555
This is option (4) from pydata#9475 (comment)
xarray/core/datatree.py
Outdated
@@ -438,6 +438,7 @@ class DataTree(
     _cache: dict[str, Any]  # used by _CachedAccessor
     _data_variables: dict[Hashable, Variable]
     _node_coord_variables: dict[Hashable, Variable]
+    _node_indexed_coord_variables: dict[Hashable, Variable]
If these are indexed, should the type here be IndexVariable?
Or should we have just _node_coord_variables, and which ones are indexed is determined by examining _node_indexes? (So _node_indexed_coord_variables becomes a derived property.)
Or should we have:
_node_non_indexed_coord_variables
_node_indexed_coord_variables
_node_indexes
The current naming is kind of unclear about whether _node_coord_variables includes _node_indexed_coord_variables or not.
My main thinking with this implementation was to make accessing the _coord_variables property linear in the depth of the tree, not linear in the number of coordinate variables (I'll add a comment about this in the code).
I'll play around with an alternative version here that avoids the need to store the extra state variable.
"/": xr.Dataset(coords={"x": [1], "y": 2}),
"/b": xr.Dataset(coords={"z": 3}),
As we're planning to eventually change the API of the Dataset constructor to make it more explicit which coordinates have indexes, it might be nice to at least add comments here to describe the intention.
(sort of related to #8959)
I guess I should have asked this in the meeting earlier (or maybe you discussed this and I was too tired): do we still need this PR if we have the explicit APIs of [...] When explicitly opting into coordinate inheritance when converting to dataset objects I'd expect people to either be fine with duplication or manually deduplicate in user code, and with [...] I guess this is somewhat related to inherited coordinates working somewhat like default values right now?
@keewis this PR eliminates confusing and unnecessary duplication of indexed coordinates, but introduces the annoying asymmetry that indexed coordinates are inherited but non-indexed ones are not. The API ideas you mention were intended to solve the latter problem that this PR would cause, by giving users a way to access even non-indexed coordinates if they really need to.
Without this PR we still would get the weird behaviours in #9475, which would require users to do nasty things such as de-duplicate every coordinate manually every time they do an arithmetic operation. Your proposal wouldn't really work for arithmetic, where duplication is definitely annoying, and there is no place for the user to alter the inheritance behaviour.
So merging this PR and also adding the API you mention was proposed as the overall best option.
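The asymmetry described above can be illustrated with a toy model. This is a plain-Python sketch, not xarray code: `ToyNode`, `coords`, `indexed` and `visible_coords` are hypothetical names, and the data mirrors the test case in this PR under the assumption that `x` (passed as a list) gets an index while the scalars `y` and `z` do not.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node; not the real DataTree."""

    coords: dict[str, object]
    # Names of the coordinates on this node that have an index (assumed input).
    indexed: set[str] = field(default_factory=set)
    parent: ToyNode | None = None

    def visible_coords(self) -> dict[str, object]:
        """Coordinates seen from this node: own coords plus indexed ancestor coords."""
        inherited: dict[str, object] = {}
        ancestor = self.parent
        while ancestor is not None:
            for name in ancestor.indexed:
                # Nearer ancestors take precedence over more distant ones.
                inherited.setdefault(name, ancestor.coords[name])
            ancestor = ancestor.parent
        # The node's own coordinates (indexed or not) override inherited ones.
        return {**inherited, **self.coords}


root = ToyNode(coords={"x": [1], "y": 2}, indexed={"x"})
child = ToyNode(coords={"z": 3}, parent=root)

print(sorted(root.visible_coords()))   # ['x', 'y']
print(sorted(child.visible_coords()))  # ['x', 'z']
```

In this model the child sees the indexed `x` from its parent but not the non-indexed `y`, which is the behaviour the PR title describes.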
I was thinking we'd redefine those operations on top of [...] In any case we're restricting new behavior (and are free to relax it later), so at least from that perspective this should be fine.
+    @property
+    def _node_coord_variables_with_index(self) -> Mapping[Hashable, Variable]:
+        return FilteredMapping(
+            keys=self._node_indexes, mapping=self._node_coord_variables
+        )
+
     @property
     def _coord_variables(self) -> ChainMap[Hashable, Variable]:
         return ChainMap(
-            self._node_coord_variables, *(p._node_coord_variables for p in self.parents)
+            self._node_coord_variables,
+            *(p._node_coord_variables_with_index for p in self.parents),
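The idea behind the hunk above can be sketched without xarray. `FilteredView` below is a hypothetical stand-in for the `FilteredMapping` helper used in the diff (its real implementation is not shown in this thread), and `Node` is a toy tree node. The point of the sketch is that building the `ChainMap` takes one constant-time step per ancestor and copies no coordinate variables, which is one way to get the "linear in the depth of the tree" property without storing an extra per-node dict.

```python
from collections import ChainMap
from collections.abc import Mapping


class FilteredView(Mapping):
    """Read-only view of `mapping` restricted to `keys` (hypothetical helper)."""

    def __init__(self, keys, mapping):
        self._keys = keys        # any container supporting `in`
        self._mapping = mapping  # referenced, not copied

    def __getitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        return self._mapping[key]

    def __iter__(self):
        return (k for k in self._mapping if k in self._keys)

    def __len__(self):
        return sum(1 for _ in self)


class Node:
    """Toy tree node: `coords` maps names to values, `indexes` maps names to indexes."""

    def __init__(self, coords, indexes, parent=None):
        self.coords = coords
        self.indexes = indexes
        self.parent = parent

    def parents(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    @property
    def coord_variables(self):
        # One lazy view per ancestor; no per-variable work happens here.
        return ChainMap(
            self.coords,
            *(FilteredView(p.indexes, p.coords) for p in self.parents()),
        )


root = Node(coords={"x": [1], "y": 2}, indexes={"x": "index-for-x"})
child = Node(coords={"z": 3}, indexes={}, parent=root)

cv = child.coord_variables
print(sorted(cv))    # ['x', 'z']
print(len(cv.maps))  # 2: the node itself plus one view for its single ancestor

# The view is lazy, so later changes to the parent show through.
root.coords["w"] = [5]
root.indexes["w"] = "index-for-w"
print("w" in cv)     # True
```

Filtering happens only when a key is looked up or the mapping is iterated, so the cost of constructing `coord_variables` does not depend on how many coordinates each ancestor holds.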
That's a much neater implementation, nice!!
This is ready for another look