adding support for the zarr v3 spec #4382

grlee77 · 2022-04-12T21:43:36Z

🧰 Task

This issue outlines what will likely be needed on the napari side to support the upcoming zarr v3 spec. A primary difference introduced in v3 stores is that the data chunks and the corresponding metadata are stored in separate data/root/ and meta/root/ directory trees, respectively. Additionally, by default the data chunks are stored in a nested directory layout rather than a flat one. These changes are intended to make working with large data that has many chunks more responsive, particularly over cloud storage.

We have recently merged support for the proposed zarr v3 spec into the main development branch of zarr-python and are starting to look at a few downstream projects for additional testing prior to release.

napari.utils.io.magic_imread

Currently uses guess_zarr_path (see below) to choose when to attempt opening with read_zarr_dataset. This function likely doesn't need any v3-specific changes.

napari.utils.io.guess_zarr_path

current behavior

looks for a folder ending with '.zarr' within the path

needed changes

also check for v3-specific file extensions. These will likely be '.zarr3' or '.zr3' (see zarr-developers/zarr-specs#137, "recommended file extension for zarr-v3 stores (.zr3?)")
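A minimal sketch of the extended check, assuming the v3 extension ends up being '.zarr3' or '.zr3' (neither is final; see zarr-specs#137). The function name guess_zarr_path_sketch and the ZARR_EXTENSIONS tuple are hypothetical and are not napari's current code:

```python
from pathlib import PurePath

# Hypothetical: '.zarr' is the existing v2 convention; '.zarr3' and '.zr3'
# are only the candidate v3 extensions discussed in zarr-specs#137.
ZARR_EXTENSIONS = (".zarr", ".zarr3", ".zr3")


def guess_zarr_path_sketch(path: str) -> bool:
    """Return True if any component of `path` ends with a zarr extension."""
    # str.endswith accepts a tuple, so one call covers every candidate suffix.
    return any(part.endswith(ZARR_EXTENSIONS) for part in PurePath(path).parts)
```

Checking every path component (not just the last one) keeps the existing behaviour of matching paths that point inside a store, e.g. "/data/image.zarr/0".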

napari.utils.io.read_zarr_dataset

current behavior

if .zarray is present, opens a single array
if .zgroup is present, creates a list containing all arrays in the group (currently used for multiscale data, but #1406, "Support for Zarr files with multiple datasets (groups)", discusses potentially loading them as separate layers)
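For comparison with the v3 changes below, the v2 detection described above boils down to checking for these two marker files. This is a stdlib-only sketch with hypothetical helper names, not napari's actual implementation (the data itself is read with dask.array.from_zarr):

```python
from pathlib import Path
from typing import List, Optional


def v2_node_kind(path: str) -> Optional[str]:
    """Classify a zarr v2 directory by its marker file (hypothetical helper)."""
    p = Path(path)
    if (p / ".zarray").is_file():
        return "array"
    if (p / ".zgroup").is_file():
        return "group"
    return None


def v2_group_arrays(path: str) -> List[str]:
    """Names of the direct child arrays of a v2 group, sorted lexicographically.

    Note: lexicographic order puts "10" before "2", which matters for
    multiscale groups with ten or more levels.
    """
    p = Path(path)
    return sorted(
        child.name
        for child in p.iterdir()
        if child.is_dir() and (child / ".zarray").is_file()
    )
```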

needed changes for v3

.zarray and .zgroup don't exist in v3. Array and group metadata are stored in a separate meta/root/ folder, with filenames ending in .array.json and .group.json, respectively. (Technically, the spec also allows a metadata extension other than .json, but zarr-python currently always uses JSON.)
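As a rough sketch of what replacing the .zarray/.zgroup check could look like, the hypothetical helper below maps a store-relative metadata key such as meta/root/a/b.array.json to the node kind and its hierarchy key. It assumes JSON metadata only (as zarr-python currently writes) and deliberately does not handle the metadata of the root node itself:

```python
from typing import Optional, Tuple

# Assumption: JSON metadata only, matching what zarr-python currently writes.
_META_PREFIX = "meta/root/"
_SUFFIXES = {".array.json": "array", ".group.json": "group"}


def parse_v3_meta_key(key: str) -> Optional[Tuple[str, str]]:
    """Map e.g. 'meta/root/a/b.array.json' to ('array', 'a/b').

    Returns None for keys outside meta/root/ or without a known suffix.
    Hypothetical helper for illustration only.
    """
    key = key.replace("\\", "/")  # tolerate Windows-style separators
    if not key.startswith(_META_PREFIX):
        return None
    rest = key[len(_META_PREFIX):]
    for suffix, kind in _SUFFIXES.items():
        if rest.endswith(suffix) and len(rest) > len(suffix):
            return kind, rest[: -len(suffix)]
    return None
```

The returned key is the kind of value that could later be passed as the component argument.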

The data is then read via dask.array.from_zarr, which can already read v3 stores if we pass a component kwarg (and possibly a zarr_version kwarg). See the WIP dask PR dask/dask#8918, which adds tests for this. That PR also includes example file listings that concretely show how the default file layout differs between v3 and v2 stores.

I think for v3 we should be able to adapt read_zarr_dataset to open an array when passed the path to its array metadata, or a group of arrays when passed the path to the group metadata. We could also accept the path to the data folder of an array or group by traversing up the tree until we find the root path containing the required zarr.json metadata. We can then extract the desired array key name(s) to use as the component in calls like:

da.from_zarr(root_path, component=array_key, zarr_version=3)
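The traversal idea above could be sketched roughly as follows. This hypothetical helper walks up from a given folder until it finds a directory containing zarr.json, then treats whatever lies below data/root/ (or meta/root/) as the hierarchy key. It assumes the caller passes an array or group folder, not an individual chunk path, since chunk-level components are not trimmed:

```python
from pathlib import Path
from typing import Optional, Tuple


def split_v3_path(path: str) -> Optional[Tuple[Path, str]]:
    """Walk up from `path` to the directory holding zarr.json.

    Returns (root_path, component), where component is the hierarchy key
    below data/root/ or meta/root/ ("" if `path` is the store root or lies
    outside both trees). Returns None if no zarr.json is found.
    Hypothetical helper; chunk-level paths are not trimmed.
    """
    start = Path(path).resolve()
    for candidate in (start, *start.parents):
        if (candidate / "zarr.json").is_file():
            rel = start.relative_to(candidate).parts
            if len(rel) >= 2 and rel[0] in ("data", "meta") and rel[1] == "root":
                return candidate, "/".join(rel[2:])
            return candidate, ""
    return None
```

The resulting pair maps directly onto the root_path and array_key arguments in the call above.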

cc @joshmoore, @MSanKeys963, @jakirkham, @rabernat, @martindurant


grlee77 added the feature (New feature or request) and task labels Apr 12, 2022

grlee77 self-assigned this Apr 12, 2022

joshmoore mentioned this issue Apr 13, 2022 in ome/napari-ome-zarr#42, "Convert to npe2 (Fix #27 and #41)" (merged)
