Reimplement Datatree typed ops #9619

Merged Oct 15, 2024 (23 commits)

Member

@TomNicholas commented Oct 13, 2024

Follows on from #9589, but for the ops defined in _typed_ops.py.

Does this mean we can remove most/all of datatree_ops.py?

EDIT: Answering my own question - none of datatree_ops.py is currently used since we disabled it. It does contain lists of methods we should implement, many of which are now implemented in this PR and #9589. As the only purpose it serves right now is tracking the remaining unimplemented methods, we should probably remove it in favour of creating a new GitHub issue.

cc @shoyer, @flamingbear

@TomNicholas added the topic-DataTree label (Related to the implementation of a DataTree class) Oct 13, 2024
"a grouped object are not permitted"
)

# TODO requires an implementation of map_over_subtree_inplace
Member Author

I think I will just do in-place ops in a follow-up PR

xarray/tests/test_datatree.py (outdated; resolved)
@TomNicholas changed the title from "Reimplement Datatree ops" to "Reimplement Datatree typed ops" Oct 14, 2024
@TomNicholas
Member Author

Pretty sure the mypy failure in the last commit is unrelated to this PR (which passed mypy before today) - see #9618 (comment).

@kmuehlbauer
Contributor

@TomNicholas #9621 for the mypy issue

Member

@shoyer left a comment

Thanks Tom!

I think we should also revise DataTree arithmetic to support arithmetic between trees with child nodes defined in different orders, but that can come later. I will try to work on the implementation of the core machinery on my long flight today....
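As an illustration of what order-independent tree arithmetic could look like, here is a minimal sketch that pairs nodes by path rather than by position. It uses plain dicts as a toy stand-in for trees; `add_trees_by_path` is a hypothetical helper, not xarray's machinery, and the real implementation may work quite differently.

```python
def add_trees_by_path(left, right):
    """Add two toy trees node by node, matching nodes by path.

    Each tree is a dict mapping a node path to that node's value.
    This is an illustrative assumption, not xarray's DataTree API.
    """
    # Key views compare like sets, so insertion order is irrelevant.
    if left.keys() != right.keys():
        raise ValueError("trees do not have the same set of node paths")
    return {path: left[path] + right[path] for path in left}


# Same nodes, but the children were defined in a different order.
left = {"/": 1, "/a": 2, "/b": 3}
right = {"/b": 30, "/a": 20, "/": 10}

total = add_trees_by_path(left, right)
```

Matching on paths means the result does not depend on the order in which children were added to either tree.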

Comment on lines 1501 to 1503
        if isinstance(other, GroupBy):
            # TODO should we be trying to make this work?
            raise NotImplementedError
Member

Suggested change
  if isinstance(other, GroupBy):
      # TODO should we be trying to make this work?
-     raise NotImplementedError
+     return NotImplemented

NotImplemented is a sentinel value that tells Python that an arithmetic operator is not implemented, and allows the other argument to try implementing it. If all special methods return NotImplemented, then Python raises an informative TypeError.
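The dispatch behaviour described here can be seen in a small self-contained sketch. The classes `Tree`, `Grouped` and `Opaque` are hypothetical stand-ins invented for this example; they are not xarray classes.

```python
class Tree:
    """Hypothetical stand-in for a tree-like container (not DataTree)."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, (int, float)):
            return Tree(self.value + other)
        # Returning the sentinel lets Python try other.__radd__(self) next.
        return NotImplemented


class Grouped:
    """Hypothetical operand that knows how to combine with a Tree."""

    def __radd__(self, other):
        return "handled by Grouped.__radd__"


class Opaque:
    """Hypothetical operand that defines no arithmetic at all."""


# Tree.__add__ declines, so Python falls back to Grouped.__radd__.
result = Tree(1) + Grouped()

# Nobody handles this pair, so Python itself raises TypeError.
try:
    Tree(1) + Opaque()
except TypeError as exc:
    message = str(exc)
```

Had `Tree.__add__` raised `NotImplementedError` instead, the exception would propagate immediately and `Grouped.__radd__` would never get a chance to run.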

Should we also explicitly exclude Dataset here, or are the "mapped over all nodes" semantics of DataTree + Dataset arithmetic obvious enough? #9365

Member Author

> If all special methods return NotImplemented, then Python raises an informative TypeError.

Ah I did not realise that part. So NotImplemented is fine here for now.

> Should we also explicitly exclude Dataset here, or are the "mapped over all nodes" semantics of DataTree + Dataset arithmetic obvious enough?

I think it's fine to allow Dataset - I did that deliberately. It's also tested now.
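For readers unfamiliar with the "mapped over all nodes" semantics, a toy sketch may help. It models a tree as a dict of node paths and each node's data as a dict of variables; `add_to_every_node` is a hypothetical helper for illustration only and makes no claim about how xarray implements this.

```python
def add_to_every_node(tree, other):
    """Return a new toy tree in which `other` is added to every node.

    `tree` maps node paths to per-node data (dicts of numbers here) and
    `other` plays the role of a single dataset. Illustrative only.
    """
    return {
        path: {name: value + other[name] for name, value in node.items()}
        for path, node in tree.items()
    }


tree = {"/": {"a": 1}, "/child": {"a": 10}}
dataset = {"a": 100}

mapped = add_to_every_node(tree, dataset)
```

The same right-hand operand is combined with each node independently, and the result keeps the original tree structure.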

Member Author

done in 9e2dfad

"a grouped object are not permitted"
)

# TODO requires an implementation of map_over_subtree_inplace
Member

I suspect this actually needs a different implementation, in order to handle error recovery properly.

I would suggest handling this like Dataset._inplace_binary_op, which builds a result Dataset and then replaces the contents of the current dataset with those of the new dataset.
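The build-then-replace idea can be sketched in isolation. The `Box` class below is a hypothetical container written for this example; it only mirrors the general pattern and is not a copy of `Dataset._inplace_binary_op`.

```python
class Box:
    """Hypothetical mutable container of named values (not Dataset)."""

    def __init__(self, data):
        self._data = dict(data)

    def _binary_op(self, other, op):
        # Build a brand-new mapping; a failure here leaves self untouched.
        return {name: op(value, other) for name, value in self._data.items()}

    def __iadd__(self, other):
        new_data = self._binary_op(other, lambda a, b: a + b)
        # Only once every entry has succeeded do we swap in the result.
        self._data = new_data
        return self


box = Box({"x": 1, "y": "not a number"})
try:
    box += 1  # "x" is computed first, then "y" raises TypeError
except TypeError:
    pass

ok = Box({"x": 1, "y": 2})
ok += 1
```

Because the new contents are assembled separately, a failure part-way through leaves `box` exactly as it was, rather than with some entries updated and others not.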

Member Author

Makes sense. I've raised #9629 to track that, so I'll remove this pseudocode from this PR.

Member Author

done in 304eb19

@TomNicholas enabled auto-merge (squash) October 15, 2024 13:57
@TomNicholas merged commit 97ec434 into pydata:main Oct 15, 2024
28 checks passed
@TomNicholas deleted the datatree-ops branch October 15, 2024 14:16
xarray/util/generate_ops.py (resolved)
xarray/tests/test_datatree.py (resolved)
@headtr1ck
Collaborator

@TomNicholas this should definitely go into whats-new!

@TomNicholas
Member Author

> this should definitely go into whats-new!

Generally with the datatree stuff we have been taking the approach that, since previous versions of xarray never had DataTree publicly available, "what's new" isn't really appropriate for all the internal work going on. Instead we're just going to have one what's new entry announcing the new class, and summarise the important changes relative to the old version of datatree in the other repository in a dedicated migration guide.

dcherian added a commit to TomAugspurger/xarray that referenced this pull request Oct 21, 2024
* main:
  Fix multiple grouping with missing groups (pydata#9650)
  flox: Properly propagate multiindex (pydata#9649)
  Update Datatree html repr to indicate inheritance (pydata#9633)
  Re-implement map_over_datasets using group_subtrees (pydata#9636)
  fix zarr intersphinx (pydata#9652)
  Replace black and blackdoc with ruff-format (pydata#9506)
  Fix error and missing code cell in io.rst (pydata#9641)
  Support alternative names for the root node in DataTree.from_dict (pydata#9638)
  Updates to DataTree.equals and DataTree.identical (pydata#9627)
  DOC: Clarify error message in open_dataarray (pydata#9637)
  Add zip_subtrees for paired iteration over DataTrees (pydata#9623)
  Type check datatree tests (pydata#9632)
  Add missing `memo` argument to DataTree.__deepcopy__ (pydata#9631)
  Bug fixes for DataTree indexing and aggregation (pydata#9626)
  Add inherit=False option to DataTree.copy() (pydata#9628)
  docs(groupby): mention deprecation of `squeeze` kwarg (pydata#9625)
  Migration guide for users of old datatree repo (pydata#9598)
  Reimplement Datatree typed ops (pydata#9619)
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 22, 2024
* main: (63 commits)
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
  ...
Labels
topic-DataTree Related to the implementation of a DataTree class
Development

Successfully merging this pull request may close these issues.

datatree ops.py migration cleanup
4 participants