DataTree.assign_coords and similar should only set node coords #9472
Comments
What should [...]

Probably best to error in such cases rather than making an arbitrary/unpredictable choice!
When I used datatree, I generally did want to apply a number of operations at each node and felt that it was too much typing to always write out [...]. Perhaps we should add a new namespace for operations that map over subtrees (it's been a while, so I might have the terminology mixed up; apologies if that is the case).
That's quite a niche use of that method... But I think it is an example of a more general problem.
Yeah, I agree that is annoying.
This would be a reasonable solution to the fundamental ambiguity of whether a DataTree method acts locally or over the whole subtree. It would also make the inheritance structure of [...] EDIT: It would mean that [...]
The mapping namespace would need to be disambiguated from the existing [...]
I'm not a huge fan of adding these parameters to every single method; I would prefer either to use a syntax like dt.subtrees(max_depth=...).assign_coords(...) or just to leave these out for now.
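A minimal pure-Python sketch of the depth-limited syntax suggested here, assuming that max_depth counts the root as depth 0 and that nodes deeper than max_depth are left untouched. `Node`, `SubtreeView` and `_rebuild` are hypothetical stand-ins, not xarray classes, and each node's coordinates are modeled as a plain dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DataTree node: a dict of coords plus children."""
    name: str
    coords: dict[str, Any] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)

    def assign_coords(self, **new: Any) -> Node:
        # Node-local: a new node with updated coords; child objects are reused unchanged.
        return Node(self.name, {**self.coords, **new}, list(self.children))

    def subtrees(self, max_depth: Optional[int] = None) -> SubtreeView:
        # Hypothetical accessor: methods called on the view map over the subtree.
        return SubtreeView(self, max_depth)


def _rebuild(node: Node, depth: int, max_depth: Optional[int],
             fn: Callable[[Node], Node]) -> Node:
    # Nodes deeper than max_depth are returned as-is (shared, not copied).
    if max_depth is not None and depth > max_depth:
        return node
    new = fn(node)
    new.children = [_rebuild(c, depth + 1, max_depth, fn) for c in node.children]
    return new


class SubtreeView:
    """Hypothetical view applying node-local methods to every node up to max_depth."""

    def __init__(self, root: Node, max_depth: Optional[int]) -> None:
        self.root = root
        self.max_depth = max_depth

    def assign_coords(self, **new: Any) -> Node:
        return _rebuild(self.root, 0, self.max_depth,
                        lambda n: n.assign_coords(**new))


tree = Node("root", children=[Node("a", children=[Node("b")])])
out = tree.subtrees(max_depth=1).assign_coords(x=1)
print(out.children[0].coords)               # {'x': 1}
print(out.children[0].children[0].coords)   # {}
```

Keeping the depth limit on a separate view object, rather than as a keyword on every method, is the design choice argued for in the comment: the node-local methods keep their plain signatures.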
Okay, concrete suggestion:

Methods on the [...]:

    dt2 = dt.assign_coords()

and

    dt2 = dt.copy()
    dt2.ds = dt.ds.assign_coords()

leave [...]

Methods on [...]:

    dt.map_over_subtree.assign_coords()

which could maybe be extended to accept arguments:

    dt.map_over_subtree(max_depth=3).assign_coords()

This is incompatible with the current behaviour of dt.map_over_subtree(lambda: ds), so we add

    dt.map_over_subtree.pipe(lambda: ds)

The [...]

    @map_over_subtree
    def func(ds):
        ...

though it could be extended to accept arguments later if necessary:

    @map_over_subtree(max_depth=2)
    def func(ds):
        ...

Pros: [...]

Cons: [...]
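The two-tier proposal can be sketched in plain Python. This is a toy model under stated assumptions, not xarray's implementation: `Tree`, `SubtreeMapper` and the use of a dict in place of each node's Dataset are hypothetical, and the max_depth argument is left out for brevity. Plain methods touch only the node they are called on; anything reached through map_over_subtree, including pipe and the decorator form, is applied to every node.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass, field
from typing import Any, Callable

Coords = dict[str, Any]  # hypothetical stand-in for a node's Dataset


@dataclass
class Tree:
    """Hypothetical toy DataTree: each node holds a plain dict standing in for .ds."""
    ds: Coords = field(default_factory=dict)
    children: dict[str, Tree] = field(default_factory=dict)

    def copy(self) -> Tree:
        # Shallow copy: new dict for this node, same child objects.
        return Tree(dict(self.ds), dict(self.children))

    def assign_coords(self, **new: Any) -> Tree:
        # Node-local, as proposed: copy() then replace .ds on this node only.
        out = self.copy()
        out.ds = {**self.ds, **new}
        return out

    @property
    def map_over_subtree(self) -> SubtreeMapper:
        return SubtreeMapper(self)


class SubtreeMapper:
    """Hypothetical namespace: every call is applied to every node in the subtree."""

    def __init__(self, tree: Tree) -> None:
        self._tree = tree

    def pipe(self, func: Callable[[Coords], Coords]) -> Tree:
        def rec(node: Tree) -> Tree:
            return Tree(func(node.ds), {k: rec(c) for k, c in node.children.items()})
        return rec(self._tree)

    def assign_coords(self, **new: Any) -> Tree:
        return self.pipe(lambda ds: {**ds, **new})


def map_over_subtree(func: Callable[[Coords], Coords]) -> Callable[[Tree], Tree]:
    """Decorator form: lift a per-node function to a whole-tree function."""
    @functools.wraps(func)
    def wrapper(tree: Tree) -> Tree:
        return tree.map_over_subtree.pipe(func)
    return wrapper


tree = Tree({"a": 1}, {"child": Tree({"b": 2})})
print(tree.assign_coords(x=0).children["child"].ds)                   # {'b': 2}
print(tree.map_over_subtree.assign_coords(x=0).children["child"].ds)  # {'b': 2, 'x': 0}
```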
I agree, although I'm not sure that this should be a general principle for all DataTree operations. Operations like arithmetic and aggregations seem perfectly well-defined when acting on all nodes independently.
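To illustrate why arithmetic and aggregations are unambiguous when mapped: each node's result depends only on that node's own data, so there is no arbitrary choice to make. This is a toy sketch with a hypothetical `Tree` whose variables are plain lists of numbers, not xarray's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Tree:
    """Hypothetical toy tree: each node maps variable names to lists of numbers."""
    data: dict[str, list[float]] = field(default_factory=dict)
    children: dict[str, Tree] = field(default_factory=dict)

    def __mul__(self, k: float) -> Tree:
        # Arithmetic: scale every variable on this node, then recurse; nodes never interact.
        return Tree({v: [x * k for x in xs] for v, xs in self.data.items()},
                    {n: c * k for n, c in self.children.items()})

    def mean(self) -> Tree:
        # Aggregation: reduce each variable on each node independently.
        return Tree({v: [mean(xs)] for v, xs in self.data.items()},
                    {n: c.mean() for n, c in self.children.items()})


t = Tree({"a": [1.0, 3.0]}, {"c": Tree({"b": [2.0, 4.0]})})
print((t * 2).children["c"].data)  # {'b': [4.0, 8.0]}
print(t.mean().data)               # {'a': [2.0]}
```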
I agree, but my proposal above attempts to provide an intuitive API for both types of operation. Otherwise it's unclear to users (including myself!) when and why [...]
I think of [...]

The rule could be:
[...]

These are essentially the same rules used by Dataset for wrapping DataArray.
In that case, that's mostly what I already had, with the exception of these methods, which presumably should act on a single node instead: https://xarray-datatree.readthedocs.io/en/latest/api.html#datatree-contents
What is your issue?
Currently DataTree.assign_coords tries to call Dataset.assign_coords on every node in the subtree. This was always a little weird, but now that we have implemented coordinate inheritance (#9077) it's really silly, because we only need to assign to the root node and the coordinates will be accessible on every child.

There's also some kind of bug right now too: [...]

We should change assign_coords (and possibly any similar functions like set_coords) to just operate on one node, and not try to map over the subtree. This would also help make the coordinate-related behaviour of DataTree more similar to that of Dataset, which relates to #9203 (comment).

cc @shoyer
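A toy model of why assigning to the root is enough once coordinates are inherited. It assumes, as the issue does, that a child sees every coordinate set on its ancestors (xarray's actual inheritance rules may be narrower), and `Node` and `assign_coords_inplace` are hypothetical names. Each node resolves its coords by walking up to the root, so a single assignment on the root is visible everywhere, and mapping the same assignment over every node would only duplicate it.

```python
from __future__ import annotations

from typing import Any, Optional


class Node:
    """Hypothetical toy node with coordinate inheritance (not xarray's code)."""

    def __init__(self, parent: Optional[Node] = None, **coords: Any) -> None:
        self.parent = parent
        self.local_coords: dict[str, Any] = dict(coords)

    @property
    def coords(self) -> dict[str, Any]:
        # Inherited view: ancestors' coords first, overridden by this node's own.
        inherited = self.parent.coords if self.parent is not None else {}
        return {**inherited, **self.local_coords}

    def assign_coords_inplace(self, **new: Any) -> Node:
        # Node-local and in-place for brevity; xarray's assign_coords returns a new object.
        self.local_coords.update(new)
        return self


root = Node()
child = Node(parent=root, y=1)
root.assign_coords_inplace(x=[0, 1, 2])
print(child.coords)        # {'x': [0, 1, 2], 'y': 1}
print(child.local_coords)  # {'y': 1}
```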