Add close() method to DataTree and use it to clean-up open files in tests #9651

Merged: 4 commits into pydata:main from datatree-close, Oct 21, 2024

Conversation

shoyer
Member

@shoyer shoyer commented Oct 20, 2024

You can now write things like:

with open_datatree(...) as tree:
    ...

which automatically closes the associated files.

This removes a bunch of warnings that were previously issued in unit-tests.
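
As a rough illustration of the context-manager behaviour described above, here is a minimal sketch. `ToyTree` is a hypothetical stand-in written for this example, not xarray's `DataTree`; it only assumes that `__exit__` delegates to `close()`, as the diff in this PR shows.

```python
class ToyTree:
    """Hypothetical stand-in for a tree backed by an open file."""

    def __init__(self, closer=None):
        # Callable that releases the underlying resource, or None.
        self._close = closer

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning None means exceptions from the body still propagate.
        self.close()

    def close(self):
        if self._close is not None:
            self._close()
        self._close = None


events = []

# Normal exit: the closer runs once the block finishes.
with ToyTree(closer=lambda: events.append("closed")) as tree:
    events.append("working")

# Exit through an exception: the closer still runs.
try:
    with ToyTree(closer=lambda: events.append("closed after error")):
        raise RuntimeError("boom")
except RuntimeError:
    pass
```

The point of the pattern is that the resource is released on both paths, so tests do not leave files open when an assertion fails partway through.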

Comment on lines 809 to 811
def set_close(self, closers: Mapping[str, Callable[[], None] | None], /) -> None:
    for path, close in closers.items():
        self[path]._close = close
Member Author

Another option would be to only set the closer on the root node, similar to Dataset.set_close().

Member

I mean you can presumably pull out the dataset from that node and .set_close on that?

Member Author

The datasets created by DataTree.dataset are ephemeral, so that wouldn't work.

(I think I should probably change this to only act at the local node level)

Member Author

I've modified set_close to only work at the local node level, but close() still closes everything in the subtree.
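
The split described in this comment (a closer registered per node, with `close()` walking the whole subtree) might look roughly like the sketch below. `ToyNode` and its attributes are hypothetical names for illustration; the real `DataTree` internals may differ, and resetting `_close` to `None` after the call is an assumption of this sketch.

```python
from typing import Callable, Iterator, Optional, Sequence


class ToyNode:
    """Hypothetical tree node; not xarray's DataTree."""

    def __init__(self, name: str, children: Sequence["ToyNode"] = ()) -> None:
        self.name = name
        self.children = list(children)
        self._close: Optional[Callable[[], None]] = None

    @property
    def subtree(self) -> Iterator["ToyNode"]:
        # Yield this node first, then every descendant, depth-first.
        yield self
        for child in self.children:
            yield from child.subtree

    def set_close(self, close: Optional[Callable[[], None]]) -> None:
        # Local only: registers a closer for this node, not its children.
        self._close = close

    def close(self) -> None:
        # Subtree-wide: runs the closer of every node below and including self.
        for node in self.subtree:
            if node._close is not None:
                node._close()
            node._close = None  # assumption: clear so repeat calls are no-ops


closed = []
leaf = ToyNode("leaf")
root = ToyNode("root", [leaf])

root.set_close(lambda: closed.append("root"))
leaf_closer_after_root_set = leaf._close  # still None: set_close was local

leaf.set_close(lambda: closed.append("leaf"))
root.close()
```

Keeping `set_close` local means each node owns exactly one closer, while a single `close()` on the root is still enough to release everything.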

@TomNicholas TomNicholas added topic-DataTree Related to the implementation of a DataTree class topic-backends labels Oct 20, 2024
Member

@TomNicholas TomNicholas left a comment

Thanks!


def __exit__(self, exc_type, exc_value, traceback) -> None:
    self.close()

def close(self):
Collaborator

Suggested change
- def close(self):
+ def close(self) -> None:

Member Author

done


def close(self):
    for node in self.subtree:
        node.dataset.close()
Member Author

I need to add a test to verify that calling close() repeatedly does not raise an error.

(Dataset.close is not idempotent, because it replaces _close with None, but the dataset objects on which I'm calling it here are not persistent. Probably I should also make DatasetView.close raise an error.)
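
One possible reading of this concern, sketched with hypothetical classes (`ToyView` and `ToyHolder` are invented for this example and are not xarray types): if `close()` is routed through a throwaway view, clearing `_close` on that view does not persist, so the underlying closer runs again on every call. Clearing it on the persistent object makes repeated calls harmless.

```python
class ToyView:
    """Hypothetical ephemeral view, rebuilt on every access."""

    def __init__(self, closer):
        self._close = closer

    def close(self):
        if self._close is not None:
            self._close()
        self._close = None  # only affects this short-lived object


class ToyHolder:
    """Hypothetical persistent owner of the closer."""

    def __init__(self, closer):
        self._close = closer

    @property
    def view(self):
        # A fresh view each time, so state set on it is lost.
        return ToyView(self._close)

    def close_via_view(self):
        self.view.close()

    def close(self):
        if self._close is not None:
            self._close()
        self._close = None  # persists, so later calls do nothing


calls = []

holder = ToyHolder(lambda: calls.append("via view"))
holder.close_via_view()
holder.close_via_view()
via_view_calls = len(calls)  # closer ran on both calls

calls.clear()
holder = ToyHolder(lambda: calls.append("direct"))
holder.close()
holder.close()
direct_calls = len(calls)  # closer ran only on the first call
```

A test along these lines (call `close()` twice, check that nothing raises and the closer is not re-run) is the kind of check the comment asks for.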

Member Author

done


@shoyer
Member Author

shoyer commented Oct 21, 2024

I've added a bunch of tests, and this should be good to go now.

Member

@TomNicholas TomNicholas left a comment


Thanks! Happy for this to be merged.

@shoyer shoyer merged commit 863184d into pydata:main Oct 21, 2024
27 of 29 checks passed
@shoyer shoyer deleted the datatree-close branch October 21, 2024 21:45
dcherian added a commit to TomAugspurger/xarray that referenced this pull request Oct 22, 2024
* main:
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 22, 2024
* main: (63 commits)
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
  Fix multiple grouping with missing groups (pydata#9650)
  flox: Properly propagate multiindex (pydata#9649)
  Update Datatree html repr to indicate inheritance (pydata#9633)
  Re-implement map_over_datasets using group_subtrees (pydata#9636)
  fix zarr intersphinx (pydata#9652)
  Replace black and blackdoc with ruff-format (pydata#9506)
  Fix error and missing code cell in io.rst (pydata#9641)
  Support alternative names for the root node in DataTree.from_dict (pydata#9638)
  Updates to DataTree.equals and DataTree.identical (pydata#9627)
  DOC: Clarify error message in open_dataarray (pydata#9637)
  Add zip_subtrees for paired iteration over DataTrees (pydata#9623)
  Type check datatree tests (pydata#9632)
  Add missing `memo` argument to DataTree.__deepcopy__ (pydata#9631)
  Bug fixes for DataTree indexing and aggregation (pydata#9626)
  Add inherit=False option to DataTree.copy() (pydata#9628)
  docs(groupby): mention deprecation of `squeeze` kwarg (pydata#9625)
  Migration guide for users of old datatree repo (pydata#9598)
  Reimplement Datatree typed ops (pydata#9619)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 3, 2024
* main: (85 commits)
  Refactor out utility functions from to_zarr (pydata#9695)
  Use the same function to floatize coords in polyfit and polyval (pydata#9691)
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
  Fix multiple grouping with missing groups (pydata#9650)
  ...
Labels
topic-backends topic-DataTree Related to the implementation of a DataTree class
Projects
Development

Successfully merging this pull request may close these issues.

datatree: Automatically close files using open_datatree context manager
3 participants