
Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint #9243

Merged Aug 14, 2024. 33 commits; changes shown from the first 10 commits.
33ee4a9
sandbox open groups
eni-awowale Jul 13, 2024
8186d86
rough implementation of open_groups
eni-awowale Jul 13, 2024
6b63704
removed unused imports
eni-awowale Jul 13, 2024
ef01edc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 13, 2024
08e230a
oops deleted optional
eni-awowale Jul 13, 2024
e01b0fb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 13, 2024
9e1984c
commit to test from main
eni-awowale Jul 26, 2024
33a71ac
added tests and small sample file
eni-awowale Jul 30, 2024
b10fa10
merge main into open_groups
eni-awowale Jul 30, 2024
565ffb1
updated what is new
eni-awowale Jul 30, 2024
ce607e6
updated: open_groups to include fullpath of group, improved test to c…
eni-awowale Jul 30, 2024
eaba908
update float_ to float64 for numpy 2.0
eni-awowale Jul 30, 2024
b4b9822
added pr suggestions and mypy changes
eni-awowale Aug 2, 2024
9b9c1e7
merge conflict plugins.py
eni-awowale Aug 6, 2024
225489d
Merge branch 'main' into open_groups
eni-awowale Aug 6, 2024
5fe8e96
added mutable mapping
eni-awowale Aug 6, 2024
3b9c418
added mutable mapping to api
eni-awowale Aug 6, 2024
222279c
lets see if this passes mypy
eni-awowale Aug 7, 2024
4c65b0a
mypy take 2
eni-awowale Aug 7, 2024
8c81a87
mypy
eni-awowale Aug 7, 2024
d2c74d6
updated open_groups_dict
eni-awowale Aug 8, 2024
f206408
changed return type for DataTree.from_dict
eni-awowale Aug 8, 2024
5d34920
Merge branch 'main' into open_groups
eni-awowale Aug 8, 2024
f72f3d2
fix test failures
eni-awowale Aug 8, 2024
2f92b5c
update iter_nc_ to yield parent
eni-awowale Aug 8, 2024
175e287
Merge branch 'main' into open_groups
eni-awowale Aug 13, 2024
6319678
mypy suggestions
eni-awowale Aug 13, 2024
1466147
adding casting
eni-awowale Aug 13, 2024
0e3c946
explicitly convert to dict
eni-awowale Aug 13, 2024
1ffeae5
Merge branch 'main' into open_groups
dcherian Aug 14, 2024
abd4981
Merge branch 'main' into open_groups
eni-awowale Aug 14, 2024
d44bf98
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 14, 2024
b2cf9b4
updated to add d_cast for remaining functions
eni-awowale Aug 14, 2024
2 changes: 2 additions & 0 deletions doc/whats-new.rst
@@ -79,6 +79,8 @@ New Features
to return an object without ``attrs``. A ``deep`` parameter controls whether
variables' ``attrs`` are also dropped.
By `Maximilian Roos <https://github.com/max-sixty>`_. (:pull:`8288`)
- Add `open_groups` method for unaligned datasets (:issue:`9137`, :pull:`9243`)
  By `Eni Awowale <https://github.com/eni-awowale>`_.

Breaking changes
~~~~~~~~~~~~~~~~
37 changes: 37 additions & 0 deletions xarray/backends/api.py
@@ -837,6 +837,43 @@ def open_datatree(
return backend.open_datatree(filename_or_obj, **kwargs)
Member commented:

We could have a default implementation here that calls open_groups, i.e.

Suggested change
return backend.open_datatree(filename_or_obj, **kwargs)
groups_dict = backend.open_groups(filename_or_obj, **kwargs)
return DataTree.from_dict(groups_dict)

The idea being that then backend developers don't actually have to implement open_datatree if they have implemented open_groups...

This was sort of discussed here (@keewis) #7437 (comment), but this seems like a rabbit hole that should be left for a future PR.

Collaborator commented:

Not really; I was arguing that having any one of open_dataarray, open_dataset, and open_datatree allows us to provide (somewhat inefficient) default implementations for the others. However, open_groups has a much closer relationship to open_datatree, so I think having a default implementation for open_datatree is fine (we just need to make sure that a backend providing neither open_groups nor open_datatree doesn't complain about open_groups not existing when you call open_datatree).

So yeah, this might become a rabbit hole.

Member commented:

Okay I see. That seems related, but also like a totally optional convenience feature that we should defer to later.



def open_groups(
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
engine: T_Engine = None,
**kwargs,
) -> dict[str, Dataset]:
"""
Open and decode a file or file-like object, creating a dictionary containing one xarray Dataset for each group in the file.
    Useful for an HDF file ("netcdf4" or "h5netcdf") containing many groups that are not alignable with their parents
    and so cannot be opened directly with ``open_datatree``. Use this function to inspect your data,
    then make the changes needed to coerce its structure into a `DataTree` before calling `DataTree.from_dict()` and proceeding with your analysis.

Parameters
----------
filename_or_obj : str, Path, file-like, or DataStore
Strings and Path objects are interpreted as a path to a netCDF file.
engine : str, optional
Xarray backend engine to use. Valid options include `{"netcdf4", "h5netcdf"}`.
**kwargs : dict
Additional keyword arguments passed to :py:func:`~xarray.open_dataset` for each group.

Returns
-------
dict[str, xarray.Dataset]

See Also
--------
open_datatree()
DataTree.from_dict()
"""
if engine is None:
engine = plugins.guess_engine(filename_or_obj)

backend = plugins.get_backend(engine)

return backend.open_groups(filename_or_obj, **kwargs)


def open_mfdataset(
paths: str | NestedSequence[str | os.PathLike],
chunks: T_Chunks | None = None,
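The dispatch in `open_groups` above follows the same pattern as xarray's other `open_*` entry points: guess an engine when none is given, resolve its backend entrypoint, and delegate. A minimal, self-contained sketch of that pattern; the names `FakeBackend`, `BACKENDS`, and `guess_engine` are hypothetical stand-ins, not xarray's actual plugin machinery:

```python
class FakeBackend:
    def open_groups(self, filename_or_obj, **kwargs):
        # A real backend would return {"/": Dataset, "Group1": Dataset, ...}
        return {"/": f"root of {filename_or_obj}"}

# Stand-in for plugins.list_engines() / BACKEND_ENTRYPOINTS
BACKENDS = {"netcdf4": FakeBackend()}

def guess_engine(filename_or_obj):
    # Real logic inspects file signatures and extensions; hard-coded here.
    return "netcdf4"

def open_groups(filename_or_obj, engine=None, **kwargs):
    if engine is None:
        engine = guess_engine(filename_or_obj)
    backend = BACKENDS[engine]
    return backend.open_groups(filename_or_obj, **kwargs)

print(open_groups("example.nc"))  # {'/': 'root of example.nc'}
```

The point of the indirection is that `open_groups` itself never touches file formats; all format-specific work lives behind the backend's `open_groups` method.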
11 changes: 11 additions & 0 deletions xarray/backends/common.py
@@ -535,6 +535,17 @@ def open_datatree(

raise NotImplementedError()

def open_groups(
self,
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
**kwargs: Any,
Collaborator commented:

Maybe we should not repeat the mistake we made with open_dataset, and instead prevent Liskov-substitution errors here.

Suggested change
**kwargs: Any,

If the abstract method supports any kwargs, so must all subclass implementations, which is not what we want.

Author (Collaborator) commented:

@headtr1ck from my understanding, the **kwargs were added back to fix issue #9135.

Collaborator commented:

Hmmm, not sure.
But since this is the same problem in all the other backend methods, I'm fine with leaving it as it is (and possibly changing it all together in a future PR).

Author (Collaborator) commented:

Sounds good, we can revisit this in another PR.

) -> dict[str, Dataset]:
"""
Backend open_groups method used by Xarray in :py:func:`~xarray.open_groups`.
"""

raise NotImplementedError()


# mapping of engine name to (module name, BackendEntrypoint Class)
BACKEND_ENTRYPOINTS: dict[str, tuple[str | None, type[BackendEntrypoint]]] = {}
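The review thread above floats a default implementation: if the abstract base's `open_datatree` fell back to `open_groups`, backends would only need to implement one of the two. A hedged, pure-Python sketch of that idea; `BackendEntrypointSketch` and the dict-returning stand-in for `DataTree.from_dict` are hypothetical, not the code in this PR:

```python
class BackendEntrypointSketch:
    def open_groups(self, filename_or_obj, **kwargs):
        raise NotImplementedError()

    def open_datatree(self, filename_or_obj, **kwargs):
        # Default: build the tree from the flat {path: dataset} mapping.
        groups_dict = self.open_groups(filename_or_obj, **kwargs)
        # Stand-in for DataTree.from_dict(groups_dict); a real implementation
        # would construct nested tree nodes from the paths.
        return dict(sorted(groups_dict.items()))

class GroupsOnlyBackend(BackendEntrypointSketch):
    # Implements only open_groups; open_datatree comes for free.
    def open_groups(self, filename_or_obj, **kwargs):
        return {"/": "root", "/Group1": "child"}

print(GroupsOnlyBackend().open_datatree("f.nc"))
```

As the Collaborator notes, the subtlety is error reporting: a backend implementing neither method should fail with a message about `open_datatree`, not about `open_groups` two frames down.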
52 changes: 40 additions & 12 deletions xarray/backends/h5netcdf_.py
@@ -448,9 +448,36 @@ def open_datatree(
driver_kwds=None,
**kwargs,
) -> DataTree:

from xarray.core.datatree import DataTree

groups_dict = self.open_groups(filename_or_obj, **kwargs)

return DataTree.from_dict(groups_dict)

def open_groups(
self,
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
*,
mask_and_scale=True,
decode_times=True,
concat_characters=True,
decode_coords=True,
drop_variables: str | Iterable[str] | None = None,
use_cftime=None,
decode_timedelta=None,
group: str | Iterable[str] | Callable | None = None,
lock=None,
invalid_netcdf=None,
phony_dims=None,
decode_vlen_strings=True,
driver=None,
driver_kwds=None,
Comment on lines +472 to +476
Member commented:

These shouldn't be here, right? They should all fall under `**kwargs`

Member commented:

Or maybe they should be in the specific backend but not in common.py?

Author (Collaborator) commented:

Yeah, these were added in PR #9199, which restored the backend-specific keyword arguments; I pulled that into my branch after it was merged to main. They are not spelled out in common.py; there they are consolidated as **kwargs.

**kwargs,
Collaborator commented:

This should be obsolete as well once you remove it from the abstract method.

) -> dict[str, Dataset]:

from xarray.backends.api import open_dataset
from xarray.backends.common import _iter_nc_groups
from xarray.core.datatree import DataTree
from xarray.core.treenode import NodePath
from xarray.core.utils import close_on_error

@@ -466,19 +493,23 @@
driver=driver,
driver_kwds=driver_kwds,
)
# Check for a group and make it a parent if it exists
if group:
parent = NodePath("/") / NodePath(group)
Member commented:

@eni-awowale this is how you should join paths

else:
parent = NodePath("/")

manager = store._manager
ds = open_dataset(store, **kwargs)
tree_root = DataTree.from_dict({str(parent): ds})

# Open root group with `xr.open_dataset()` and add it to the dictionary of groups
ds = open_dataset(filename_or_obj, **kwargs)
groups_dict = {str(parent): ds}

for path_group in _iter_nc_groups(store.ds, parent=parent):
group_store = H5NetCDFStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
ds = store_entrypoint.open_dataset(
group_ds = store_entrypoint.open_dataset(
group_store,
mask_and_scale=mask_and_scale,
decode_times=decode_times,
@@ -488,14 +519,11 @@
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds)
tree_root._set_item(
path_group,
new_node,
allow_overwrite=False,
new_nodes_along_path=True,
)
return tree_root

group_name = NodePath(path_group).name
groups_dict[group_name] = group_ds

return groups_dict


BACKEND_ENTRYPOINTS["h5netcdf"] = ("h5netcdf", H5netcdfBackendEntrypoint)
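Both backends lean on a helper like `_iter_nc_groups` to walk the file's group hierarchy. A hedged sketch of what such a walk does: depth-first traversal yielding each subgroup's full path. Nested dicts stand in for netCDF4/h5netcdf group objects, and `iter_groups` is a hypothetical name, not xarray's helper:

```python
from collections.abc import Iterator

def iter_groups(group: dict, parent: str = "/") -> Iterator[str]:
    # Yield the full path of every subgroup, depth-first.
    for name, subgroup in group.items():
        path = f"{parent.rstrip('/')}/{name}"
        yield path
        yield from iter_groups(subgroup, parent=path)

tree = {"Group1": {"SubA": {}}, "Group2": {}}
print(list(iter_groups(tree)))  # ['/Group1', '/Group1/SubA', '/Group2']
```

Note the PR's commit "update iter_nc_ to yield parent" suggests the real helper also yields the starting group, so `open_groups` can seed its dictionary with the root before descending.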
51 changes: 39 additions & 12 deletions xarray/backends/netCDF4_.py
@@ -688,9 +688,35 @@ def open_datatree(
autoclose=False,
**kwargs,
) -> DataTree:

from xarray.core.datatree import DataTree

groups_dict = self.open_groups(filename_or_obj, **kwargs)

return DataTree.from_dict(groups_dict)

def open_groups(
self,
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
*,
mask_and_scale=True,
decode_times=True,
concat_characters=True,
decode_coords=True,
drop_variables: str | Iterable[str] | None = None,
use_cftime=None,
decode_timedelta=None,
group: str | Iterable[str] | Callable | None = None,
format="NETCDF4",
clobber=True,
diskless=False,
persist=False,
lock=None,
autoclose=False,
**kwargs,
Collaborator commented:

Same here.

) -> dict[str, Dataset]:
from xarray.backends.api import open_dataset
from xarray.backends.common import _iter_nc_groups
from xarray.core.datatree import DataTree
from xarray.core.treenode import NodePath

filename_or_obj = _normalize_path(filename_or_obj)
@@ -704,19 +730,24 @@
lock=lock,
autoclose=autoclose,
)

# Check for a group and make it a parent if it exists
if group:
parent = NodePath("/") / NodePath(group)
else:
parent = NodePath("/")

manager = store._manager
ds = open_dataset(store, **kwargs)
tree_root = DataTree.from_dict({str(parent): ds})

# Open root group with `xr.open_dataset()` and add it to the dictionary of groups
ds = open_dataset(filename_or_obj, **kwargs)
groups_dict = {str(parent): ds}

for path_group in _iter_nc_groups(store.ds, parent=parent):
group_store = NetCDF4DataStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
ds = store_entrypoint.open_dataset(
group_ds = store_entrypoint.open_dataset(
group_store,
mask_and_scale=mask_and_scale,
decode_times=decode_times,
@@ -726,14 +757,10 @@
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds)
tree_root._set_item(
path_group,
new_node,
allow_overwrite=False,
new_nodes_along_path=True,
)
return tree_root
group_name = NodePath(path_group).name
groups_dict[group_name] = group_ds

return groups_dict


BACKEND_ENTRYPOINTS["netcdf4"] = ("netCDF4", NetCDF4BackendEntrypoint)
2 changes: 1 addition & 1 deletion xarray/backends/plugins.py
@@ -203,7 +203,7 @@ def get_backend(engine: str | type[BackendEntrypoint]) -> BackendEntrypoint:
engines = list_engines()
if engine not in engines:
raise ValueError(
f"unrecognized engine {engine} must be one of: {list(engines)}"
f"unrecognized engine {engine} must be one of your download engines: {list(engines)}"
)
backend = engines[engine]
elif isinstance(engine, type) and issubclass(engine, BackendEntrypoint):
Binary file added xarray/tests/data/test_data_not_aligned.nc
Binary file not shown.
49 changes: 48 additions & 1 deletion xarray/tests/test_backends_datatree.py
@@ -1,11 +1,12 @@
from __future__ import annotations

import os
from typing import TYPE_CHECKING, cast

import pytest

import xarray as xr
from xarray.backends.api import open_datatree
from xarray.backends.api import open_datatree, open_groups
from xarray.core.datatree import DataTree
from xarray.testing import assert_equal
from xarray.tests import (
@@ -26,8 +27,8 @@
original_dt = simple_datatree
original_dt.to_netcdf(filepath, engine=self.engine)

roundtrip_dt = open_datatree(filepath, engine=self.engine)

[CI annotation] Check failure on line 30 in xarray/tests/test_backends_datatree.py (6 jobs): TestH5NetCDFDatatreeIO.test_to_netcdf: ValueError: invalid format for h5netcdf backend
assert_equal(original_dt, roundtrip_dt)

[CI annotation] Check failure on line 31 in xarray/tests/test_backends_datatree.py (6 jobs): TestNetCDF4DatatreeIO.test_to_netcdf: AssertionError: Left and right DataTree objects are not equal (node '/set1' has 2 children on the left, 0 on the right)

def test_to_netcdf_inherited_coords(self, tmpdir):
filepath = tmpdir / "test.nc"
@@ -39,7 +40,7 @@
)
original_dt.to_netcdf(filepath, engine=self.engine)

roundtrip_dt = open_datatree(filepath, engine=self.engine)

[CI annotation] Check failure on line 43 in xarray/tests/test_backends_datatree.py (6 jobs): TestH5NetCDFDatatreeIO.test_to_netcdf_inherited_coords: ValueError: invalid format for h5netcdf backend
assert_equal(original_dt, roundtrip_dt)
subtree = cast(DataTree, roundtrip_dt["/sub"])
assert "x" not in subtree.to_dataset(inherited=False).coords
@@ -53,7 +54,7 @@
enc = {"/set2": {var: comp for var in original_dt["/set2"].ds.data_vars}}

original_dt.to_netcdf(filepath, encoding=enc, engine=self.engine)
roundtrip_dt = open_datatree(filepath, engine=self.engine)

[CI annotation] Check failure on line 57 in xarray/tests/test_backends_datatree.py (6 jobs): TestH5NetCDFDatatreeIO.test_netcdf_encoding: ValueError: invalid format for h5netcdf backend

assert roundtrip_dt["/set2/a"].encoding["zlib"] == comp["zlib"]
assert roundtrip_dt["/set2/a"].encoding["complevel"] == comp["complevel"]
@@ -62,6 +63,52 @@
with pytest.raises(ValueError, match="unexpected encoding group.*"):
original_dt.to_netcdf(filepath, encoding=enc, engine=self.engine)

def test_open_datatree(self):
"""Test `open_datatree` with netCDF4 file with this structure:
DataTree('None', parent=None)
│ Dimensions: (lat: 1, lon: 2)
│ Dimensions without coordinates: lat, lon
│ Data variables:
│ root_variable (lat, lon) float64 16B ...
└── DataTree('Group1')
Dimensions: (lat: 2, lon: 2)
Dimensions without coordinates: lat, lon
Data variables:
group_1_var (lat, lon) float64 32B ...
"""
filepath = os.path.join(
os.path.dirname(__file__), "data", "test_data_not_aligned.nc"
)
with pytest.raises(ValueError):
open_datatree(filepath)

def test_open_groups(self):
"""Test `open_groups` with netCDF4 file with this structure:
DataTree('None', parent=None)
│ Dimensions: (lat: 1, lon: 2)
│ Dimensions without coordinates: lat, lon
│ Data variables:
│ root_variable (lat, lon) float64 16B ...
└── DataTree('Group1')
Dimensions: (lat: 2, lon: 2)
Dimensions without coordinates: lat, lon
Data variables:
group_1_var (lat, lon) float64 32B ...
"""
filepath = os.path.join(
os.path.dirname(__file__), "data", "test_data_not_aligned.nc"
)
unaligned_dict_of_datasets = open_groups(filepath)

# Check that group names are keys in the dictionary of `xr.Datasets`
assert "/" in unaligned_dict_of_datasets.keys()
assert "Group1" in unaligned_dict_of_datasets.keys()
# Check that group name returns the correct datasets
assert unaligned_dict_of_datasets["/"].identical(xr.open_dataset(filepath))
assert unaligned_dict_of_datasets["Group1"].identical(
xr.open_dataset(filepath, group="Group1")
)
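The docstrings above call the sample file "not aligned": the root gives `lat` size 1 while `Group1` gives it size 2, so the groups cannot share coordinates in one DataTree, which is why `open_datatree` raises and `open_groups` is needed. A small sketch of that size conflict using plain dicts as stand-ins for each group's dimensions (the `dims_aligned` helper is hypothetical, for illustration only):

```python
groups = {
    "/": {"lat": 1, "lon": 2},        # root_variable is (1, 2)
    "/Group1": {"lat": 2, "lon": 2},  # group_1_var is (2, 2)
}

def dims_aligned(groups: dict) -> bool:
    # A dimension name must have one consistent size across all groups.
    seen: dict[str, int] = {}
    for dims in groups.values():
        for name, size in dims.items():
            if seen.setdefault(name, size) != size:
                return False
    return True

print(dims_aligned(groups))  # False: 'lat' is 1 at the root but 2 in Group1
```

This is the same structural check that makes `DataTree.from_dict` fail on the raw `open_groups` output until the offending dimensions are renamed or dropped.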


@requires_netCDF4
class TestNetCDF4DatatreeIO(DatatreeIOBase):