adding support for the zarr v3 spec #4382

Open
grlee77 opened this issue Apr 12, 2022 · 0 comments
Labels
feature New feature or request task


grlee77 commented Apr 12, 2022

🧰 Task

This issue outlines what will likely be needed on the napari side to support the upcoming zarr v3 spec. A primary difference in v3 stores is that the data chunks and the corresponding metadata are stored in separate data/root/ and meta/root/ directory trees, respectively. Additionally, the data chunks are stored in a nested rather than a flat file layout by default. These changes are intended to make working with large data having many chunks more responsive, particularly over cloud storage.

We have recently merged support for the proposed zarr v3 spec in the main development branch of zarr-python and are starting to look at a few downstream projects for additional testing prior to release.

napari.utils.io.magic_imread

magic_imread currently uses guess_zarr_path (see below) to decide when to attempt opening with read_zarr_dataset. This function likely doesn't need any v3-specific changes.

napari.utils.io.guess_zarr_path

current behavior

  • looks for a folder ending with '.zarr' within the path
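As a rough illustration of the behavior described above, the check can be sketched as follows. This is not napari's actual implementation: the function name guess_zarr_root is hypothetical, and the sketch assumes POSIX-style paths.

```python
from pathlib import PurePosixPath
from typing import Optional


def guess_zarr_root(path: str) -> Optional[str]:
    """Return the leading part of `path` up to and including the first
    component that ends with '.zarr', or None if there is no such component.

    Hypothetical helper for illustration only; it mimics the described
    behavior rather than reproducing napari's code.
    """
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        if part.endswith(".zarr"):
            # Rebuild the path from the components seen so far.
            return str(PurePosixPath(*parts[: i + 1]))
    return None
```

Because a check like this only looks at the folder name, it says nothing about whether the store inside uses the v2 or v3 layout.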

needed changes

napari.utils.io.read_zarr_dataset

current behavior

needed changes for v3

  • .zarray and .zgroup don't exist in v3. Array and group metadata are in a separate meta/root/ folder with filenames ending in .array.json and .group.json, respectively. (Technically, the spec also allows specifying an extension other than .json for the metadata, but currently zarr-python always uses JSON.)
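A minimal sketch of how a reader might tell v3 array metadata from v3 group metadata by filename is shown below. The function name classify_v3_metadata is hypothetical, and the sketch assumes the .json extension that zarr-python currently uses; a store configured with a different metadata extension would not be recognized.

```python
from pathlib import PurePosixPath
from typing import Optional

# Suffixes described in this issue; assumes the default JSON extension.
ARRAY_SUFFIX = ".array.json"
GROUP_SUFFIX = ".group.json"


def classify_v3_metadata(path: str) -> Optional[str]:
    """Return 'array' or 'group' for a v3 metadata filename, else None.

    Hypothetical helper: it inspects only the filename and does not
    open or validate the file.
    """
    name = PurePosixPath(path).name
    if name.endswith(ARRAY_SUFFIX):
        return "array"
    if name.endswith(GROUP_SUFFIX):
        return "group"
    return None
```

A v2 file such as .zarray yields None here, so a check like this could sit alongside the existing v2 logic rather than replace it.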

The data is then read via dask.array.from_zarr which can already read v3 files if we pass a component kwarg (and possibly a zarr_version kwarg). See a WIP dask PR adding tests for this here: dask/dask#8918. That PR also shows example file listings, as a concrete example of the difference in default file layout for v3 vs. v2 stores.

I think for v3 we should be able to adapt this function to open an array when passed a path to the array metadata, or a group of arrays when passed a path to the group metadata. We could also open based on the path to the data folder of an array or group by traversing up the tree until we find the root path containing the required zarr.json metadata. We can then extract the desired array key name(s) to use as the component in calls like:

da.from_zarr(root_path, component=array_key, zarr_version=3)
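The traversal described above might look something like the following sketch. The function name find_v3_root_and_key is hypothetical, and the code assumes a local filesystem store with the data/root/ and meta/root/ layout and the .array.json / .group.json suffixes described in this issue.

```python
from pathlib import Path
from typing import Optional, Tuple

METADATA_SUFFIXES = (".array.json", ".group.json")


def find_v3_root_and_key(path) -> Optional[Tuple[Path, str]]:
    """Walk up from `path` until a folder containing zarr.json is found.

    Returns (root_path, key), where key is the array or group name
    relative to data/root/ or meta/root/. The key is '' when `path` is
    not below one of those trees. Returns None if no zarr.json is found.
    Hypothetical helper for illustration only.
    """
    path = Path(path)
    for candidate in [path, *path.parents]:
        if not (candidate / "zarr.json").is_file():
            continue
        rel = path.relative_to(candidate).parts
        # Expect data/root/<key> or meta/root/<key> below the store root.
        if len(rel) < 2 or rel[0] not in ("data", "meta") or rel[1] != "root":
            return candidate, ""
        key = "/".join(rel[2:])
        # Strip the metadata suffix if a metadata file was passed.
        for suffix in METADATA_SUFFIXES:
            if key.endswith(suffix):
                key = key[: -len(suffix)]
                break
        return candidate, key
    return None
```

The returned pair could then be passed as root_path and array_key in the da.from_zarr call above. Handling of group paths, where several array keys would need to be collected, is left out of this sketch.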

cc @joshmoore, @MSanKeys963, @jakirkham, @rabernat, @martindurant
