datatree: Tree-aware dataset handling/selection #9346

Open
keewis opened this issue Aug 13, 2024 · 0 comments
Labels
topic-DataTree Related to the implementation of a DataTree class


keewis commented Aug 13, 2024

What is your issue?

I'm looking for a good way to apply a function to a subset of nodes that share some common characteristics encoded in the subtree path.

Imagine the following data tree:

import xarray as xr
import datatree
from datatree import map_over_subtree

dt = datatree.DataTree.from_dict({
    'control/lowRes' : xr.Dataset({'z':(('x'),[0,1,2])}),
    'control/highRes' : xr.Dataset({'z':(('x'),[0,1,2,3,4,5])}),
    'plus4K/lowRes' : xr.Dataset({'z':(('x'),[0,1,2])}),
    'plus4K/highRes' : xr.Dataset({'z':(('x'),[0,1,2,3,4,5])})
})

Applying a function to all control or all plus4K nodes is straightforward: just select the specific subtree, e.g. dt['control']. However, manipulating all lowRes datasets is more involved, and I wonder what the best approach would be.

* `dt['control/lowRes','plus4K/lowRes']` is not yet implemented and would also be cumbersome for large data trees

* `dt['*/lowRes']` could be one idea to make the subtree selection more straightforward, where `*` is a wildcard

* `dt.search(regex)` could make this even more general
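The wildcard and regex ideas above can be sketched independently of any datatree API. The example below is only an illustration: it stands in for the tree with a plain dict mapping node paths to the `z` values, and the helper names `select_glob` and `select_regex` are hypothetical, not existing methods.

```python
import fnmatch
import re

# Stand-in for the tree: node path -> values of z (plain lists, not xr.Dataset).
tree = {
    "control/lowRes": [0, 1, 2],
    "control/highRes": [0, 1, 2, 3, 4, 5],
    "plus4K/lowRes": [0, 1, 2],
    "plus4K/highRes": [0, 1, 2, 3, 4, 5],
}


def select_glob(tree, pattern):
    """Keep entries whose path matches a shell-style wildcard pattern.

    Hypothetical helper. Note that fnmatch's `*` also matches "/",
    so "*/lowRes" would match more deeply nested paths as well.
    """
    return {
        path: data
        for path, data in tree.items()
        if fnmatch.fnmatchcase(path, pattern)
    }


def select_regex(tree, regex):
    """Keep entries whose full path matches a regular expression.

    Hypothetical helper illustrating the `dt.search(regex)` idea.
    """
    compiled = re.compile(regex)
    return {path: data for path, data in tree.items() if compiled.fullmatch(path)}


low_res = select_glob(tree, "*/lowRes")         # all lowRes nodes
control = select_regex(tree, r"control/[^/]+")  # direct children of control
```

The regex variant is the more general of the two, since a pattern such as `[^/]+` can restrict a wildcard to a single path level, which the plain glob cannot.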

Currently, I use the @map_over_subtree decorator, which has its own limitation: the function does not know where in the tree its dataset came from (as noted in the code), so the origin has to be inferred from the dataset itself. That is sometimes possible (here, from the length of the x dimension), but not always.

@map_over_subtree
def resolution_specific_func(ds):
    if len(ds.x) == 3:
        ds = ds.z*2
    elif len(ds.x) == 6:
        ds = ds.z*4
    return ds

z = resolution_specific_func(dt)

I do not know how the tree information could be passed through the decorator, but maybe it would be okay for the DatasetView class to have an additional property (e.g. _path) that is filled with dt.path during the call to DatasetView._from_node()? This would lead to:

@map_over_subtree
def resolution_specific_func(ds):
    if 'lowRes' in ds._path:
        ds = ds.z*2
    if 'highRes' in ds._path:
        ds = ds.z*4
    return ds

and would allow for tree-aware manipulation of the datasets.
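An alternative to storing the path on the dataset would be to pass it to the function explicitly. The sketch below shows that variant on a plain dict of path to `z` values rather than on a real tree. `map_with_path` is a hypothetical helper, not part of datatree; with a real tree, the (path, dataset) pairs would presumably come from iterating over its nodes.

```python
# Stand-in for the tree: node path -> values of z (plain lists, not xr.Dataset).
tree = {
    "control/lowRes": [0, 1, 2],
    "control/highRes": [0, 1, 2, 3, 4, 5],
    "plus4K/lowRes": [0, 1, 2],
    "plus4K/highRes": [0, 1, 2, 3, 4, 5],
}


def map_with_path(func, tree):
    """Apply func(path, data) to every entry, so func knows its tree origin.

    Hypothetical helper: a path-aware counterpart to mapping over a subtree.
    """
    return {path: func(path, data) for path, data in tree.items()}


def resolution_specific_func(path, z):
    # Decide based on the last path component instead of the data's length.
    node_name = path.rsplit("/", 1)[-1]
    if node_name == "lowRes":
        return [2 * value for value in z]
    if node_name == "highRes":
        return [4 * value for value in z]
    return z  # leave any other node unchanged


result = map_with_path(resolution_specific_func, tree)
```

Comparing the last path component exactly, rather than testing `'lowRes' in path`, avoids accidental matches when one node name happens to be a substring of another.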

What do you think? Happy to open a PR if this makes sense.

Originally posted by @observingClouds in xarray-contrib/datatree#254 (comment)

keewis added the needs triage and topic-DataTree labels and removed the needs triage label on Aug 13, 2024.