Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for collapsing subtrees #192

Closed
TomNicholas opened this issue Jan 9, 2023 · 4 comments
Closed

API for collapsing subtrees #192

TomNicholas opened this issue Jan 9, 2023 · 4 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@TomNicholas
Copy link
Collaborator

TomNicholas commented Jan 9, 2023

Multiple people have requested methods for collapsing children or subtrees into single nodes.

Following on from our discourse discussion @abkfenris @patrickcgray

Also see pydata/xarray#3996 (comment) @rabernat @


How about an API like this?

class DataTree:
    def merge_subtree(self, **merge_options) -> DataTree:
        ...

    def concat_subtree(self, **concat_options) -> DataTree:
        ...

    def combine_subtree(self, combine="by_coords", **combine_options) -> DataTree:
        ...

or maybe even the _subtree suffix is superfluous...

Then the usage might be

dt['/path/to/node'] = dt['/path/to/node'].merge_subtree()

You could chain this with the new .filter() too which would be neat.

@TomNicholas TomNicholas added enhancement New feature or request good first issue Good for newcomers labels Jan 9, 2023
@abkfenris
Copy link

Hmm, I'm having a hard time pondering through the exact implications or coming up with examples of merging subtrees with more than one level, and what logical results would be, but it feels like there could quickly be unexpected results.

  • Is the operation only being performed on one level, or are multiple levels being merge/concat/combined together?
  • If the operation is only being performed on one level, what happens to their children when they have the same name? Are they also merged/concat/combined but stay at their existing level?

What about starting with _child_datasets methods as the effects are more clearly defined?

@TomNicholas
Copy link
Collaborator Author

TomNicholas commented Jan 19, 2023

Is the operation only being performed on one level, or are multiple levels being merge/concat/combined together?

I was suggesting the latter, but I can also see why you might want the former too.

If the operation is only being performed on one level, what happens to their children when they have the same name? Are they also merged/concat/combined but stay at their existing level?

I don't see how you are supposed to do this without causing ambiguities in naming nodes.
If I have siblings a and b, which each have children x and y, (so overall I have a/{x,y}, & b/{x,y}), and I only want to merge x with y in both branches, then that's okay. But if I want to merge a with b without merging x and y's then what name should replace a or b at the top level?

This is why I suggested merge_subtree, because merging x's and y's is basically calling merge_subtree on a and b sequentially, and merge_subtree would also be unambiguous when called at the root (it would merge all 4 of [a/{x,y}, b{x,y}] into one).

@akanshajais
Copy link

image

Yes, that's a great idea! Having methods like merge_subtree, concat_subtree, and combine_subtree would make it much easier to collapse subtrees into single nodes.

Here, i write an example for implementation of these methods:

def merge_subtree(self, node_path, **merge_options):
    """
    Merges all children nodes of the given node into a single node, using the
    given merge options.
    """
    subtree_data = self.get_subtree_data(node_path)
    merged_data = merge_data(subtree_data, **merge_options)
    self.set_node_data(node_path, merged_data)
    self.remove_subtree(node_path)

def concat_subtree(self, node_path, **concat_options):
    """
    Concatenates all children nodes of the given node into a single node, using the
    given concatenation options.
    """
    subtree_data = self.get_subtree_data(node_path)
    concatenated_data = concat_data(subtree_data, **concat_options)
    self.set_node_data(node_path, concatenated_data)
    self.remove_subtree(node_path)

def combine_subtree(self, node_path, combine="by_coords", **combine_options):
    """
    Combines all children nodes of the given node into a single node, using the
    given combination method and options.
    """
    subtree_data = self.get_subtree_data(node_path)
    combined_data = combine_data(subtree_data, method=combine, **combine_options)
    self.set_node_data(node_path, combined_data)
    self.remove_subtree(node_path)

`
we could then use these methods like this:

dt.merge_subtree("/path/to/node", method="sum")

dt.concat_subtree("/path/to/node", dim="time")

dt.combine_subtree("/path/to/node", method="merge", compat="override")

These methods would modify the DataTree object in place, merging, concatenating, or combining the children nodes of the specified node. we could then access the merged, concatenated, or combined node using its path.

@flamingbear
Copy link
Contributor

Closed for pydata/xarray#9349

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants