Changes to the spec and schemata for RFC-2 #242
@@ -24,19 +24,13 @@ Status Text: will be provided between numbered versions. Data written with these | |||
Status Text: (an "editor's draft") will not necessarily be supported. |
Row above: `Status Text: <a href="../0.4/index.html">0.4</a>.` looks like it needs a manual update to 0.5?
Same for line 612: This edition of the specification is [https://ngff.openmicroscopy.org/0.4/](https://ngff.openmicroscopy.org/0.4/]).
├── A # First row of the plate | ||
│ ├── .zgroup | ||
│ ├── zarr.json |
Does this `zarr.json` need to be present? I'm not entirely clear from reading the Zarr v3 spec whether you are allowed to have empty directories (or whether the rules are different from Zarr v2 with `.zgroup`).
If you aren't allowed empty directories, then does this need a change/clarification to the labels section above where we have:
> Intermediate folders are permitted but not necessary and currently contain no extra metadata

Do we need `zarr.json` to be shown within the `original` directory?
There is an ongoing discussion about "implicit" groups. I think the community is leaning towards disallowing these, i.e. requiring `zarr.json` files for intermediate folders.
Unchanged in this PR but
Even the example below that text doesn't contain |
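To make the "implicit groups" question concrete, here is a minimal sketch of how a reader or validator could flag intermediate folders that lack a `zarr.json`. The function name `find_implicit_groups` and the `plate.zarr/A/1` layout are hypothetical and chosen only for illustration; this is not part of the spec or of any existing tool.

```python
import tempfile
from pathlib import Path


def find_implicit_groups(root: Path) -> list[Path]:
    """Return directories inside `root` that sit above a node with a
    zarr.json but have no zarr.json of their own (hypothetical helper)."""
    root = Path(root)
    missing = set()
    for meta in root.rglob("zarr.json"):
        node = meta.parent
        # Inspect every ancestor of the node that is still inside the hierarchy.
        for ancestor in node.parents:
            inside_root = ancestor == root or root in ancestor.parents
            if inside_root and not (ancestor / "zarr.json").is_file():
                missing.add(ancestor)
    return sorted(missing)


# Illustrative layout: the row folder "A" has no zarr.json of its own.
with tempfile.TemporaryDirectory() as tmp:
    plate = Path(tmp) / "plate.zarr"
    well_image = plate / "A" / "1"
    well_image.mkdir(parents=True)
    (plate / "zarr.json").write_text("{}")
    (well_image / "zarr.json").write_text("{}")
    implicit = [p.relative_to(plate).as_posix() for p in find_implicit_groups(plate)]

print(implicit)
```

If implicit groups end up being disallowed, a check like this would need to return an empty list for a valid hierarchy.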
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: |
In looking to implement support for reading the proposed V0.5 data (in ome-ngff-validator), I am finding the usage of the versioned key So I find that I am in agreement with various comments on RFC-2 about the concerns of using a version string as a key. |
Instead of using a URL-with-a-version-inside as a key, I think it would be better to pick a name like "ome" or "ome-ngff" as the key for an object, and have a |
I updated this PR for the RFC-2 revision. The namespace key is now |
Working with these schemas and those from @d-v-b's dev1 branch e.g. https://github.com/ome/ngff/blob/7da3d7bbd7c49db29b44e54a6bf5fd7e1387f100/0.5-dev1/schemas/image.schema in the ome-ngff-validator, I noticed that in this PR, the schemas include the I don't know which approach is most useful to the community, given the various tools that might want to consume these schemas? Is it most useful to be able to validate against a whole |
Is there a json schema for the base |
I'm not aware of one, but we should a) make one b) include it with the zarr v3 spec. Were I to work on this today, I would start by fixing up the rather meager v3 support in pydantic-zarr, and then use that to generate the schema. But any way of generating such a schema is valid. |
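As a starting point for that discussion, the following is a rough sketch of what a schema for the base `zarr.json` of a Zarr v3 group might contain. It only encodes the `zarr_format`, `node_type` and optional `attributes` members of group metadata. The schema is hand-written here rather than generated with pydantic-zarr, and `check_required_and_const` is a hypothetical helper that covers only `required` and `const`, not a full JSON Schema validator.

```python
# Hand-written sketch of a schema for Zarr v3 *group* metadata only.
GROUP_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "zarr_format": {"const": 3},
        "node_type": {"const": "group"},
        "attributes": {"type": "object"},
    },
    "required": ["zarr_format", "node_type"],
}


def check_required_and_const(doc: dict, schema: dict) -> list[str]:
    """Check only the `required` and `const` keywords of `schema` (hypothetical helper)."""
    errors = []
    for key in schema.get("required", []):
        if key not in doc:
            errors.append(f"missing required key: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key in doc and "const" in rule and doc[key] != rule["const"]:
            errors.append(f"{key} must be {rule['const']!r}, got {doc[key]!r}")
    return errors


valid_group = {"zarr_format": 3, "node_type": "group", "attributes": {}}
v2_style = {"zarr_format": 2}

print(check_required_and_const(valid_group, GROUP_SCHEMA))
print(check_required_and_const(v2_style, GROUP_SCHEMA))
```

A real schema would also need to cover array metadata and could be passed to any standard JSON Schema validator instead of the small checker above.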
If part of [[#multiscale-md]], the length of "axes" MUST be equal to the number of dimensions of the arrays that contain the image data. | ||
The "axes" are used as part of [[#multiscale-md]]. The length of "axes" MUST be equal to the number of dimensions of the arrays that contain the image data. | ||
|
||
The "dimension_names" attribute in the `zarr.json` of the Zarr array of a multiscale level MUST match the names in the "axes" metadata. |
"dimension_names" are redundant in an OME-Zarr multiscale image, so must they be mandatory? Perhaps this restriction could be relaxed to something like:
> If the "dimension_names" attribute is specified in the `zarr.json` of the Zarr array of a multiscale level, it SHOULD match the names in the "axes" metadata.

This will enable arrays with undesirable/non-descriptive/missing dimension names to be used in an OME-Zarr hierarchy without any array metadata changes.
I am open to discussing this. Weakening this restriction could cause conflicts between the array metadata and the OME-Zarr metadata. We would need to define a precedence order.
I wonder under what circumstances you could add the OME-Zarr metadata at the group level but could not adjust the array metadata so that the "dimension_names" attribute matches?
I can always work around this, so it is not essential.
I have legacy multiscale arrays that have been converted from NetCDF that I would prefer to keep immutable and a one-to-one mapping to their source. I want to slap OME-Zarr on top with sensible `axes` names (e.g. `z`, `y`, `x`). However, the `dimension_names` of the underlying arrays may encode other information and be inconsistent between scales (e.g. `segmented_x_1.23_um`).
The 0.4 spec has many restrictions on the underlying arrays (consecutively numbered groups, the order of dimensions, nested directory layout, etc.) that have since been addressed by this RFC and RFC-3. As far as I can see[^1], the `dimension_names` restriction introduced in this RFC is the only remaining restriction that could make arrays incompatible with OME-Zarr metadata (aside from requiring the length of `axes` to match the number of dimensions of the arrays, which is necessary).

[^1]: I've only read the spec for `multiscales` and dependent metadata.
@d-v-b What do you think about this? I believe you advocated for strictly keeping `dimension_names` in sync.
I think they should be kept in sync, because the alternative is confusing -- how should clients interpret variance between `axes` and `dimension_names`? And if clients are supposed to just ignore `dimension_names`, then why did we add it to the zarr v3 spec in the first place?
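For reference, the two variants discussed in this thread can be sketched as a single check. This is only an illustration: `dimension_names_match` and its `require_present` flag are hypothetical names, with `require_present=True` approximating the strict "MUST match" wording and `require_present=False` approximating the proposed "if specified" relaxation. The sample metadata reuses the names mentioned above.

```python
def dimension_names_match(axes, array_meta, require_present=True):
    """Compare the OME-Zarr "axes" names with an array's "dimension_names".

    axes: list of axis objects from the multiscales metadata.
    array_meta: parsed zarr.json of one multiscale level.
    require_present: if False, a missing "dimension_names" is accepted
    (hypothetical flag modelling the relaxed wording).
    """
    axis_names = [axis["name"] for axis in axes]
    dim_names = array_meta.get("dimension_names")
    if dim_names is None:
        return not require_present
    return list(dim_names) == axis_names


axes = [{"name": "z"}, {"name": "y"}, {"name": "x"}]
in_sync = {"dimension_names": ["z", "y", "x"]}
legacy = {"dimension_names": ["segmented_z_1.23_um", "segmented_y_1.23_um", "segmented_x_1.23_um"]}
unnamed = {}

print(dimension_names_match(axes, in_sync))
print(dimension_names_match(axes, legacy))
print(dimension_names_match(axes, unnamed, require_present=False))
```

Note that the legacy case fails under both variants whenever names are present but different, so that scenario would need either the SHOULD wording or a defined precedence order.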
…tributes' within a zarr.json
I changed the JSON schema files to use the I also added |
This is the companion PR for RFC-2 which adds the changes to the spec document, json schemata, examples and test files. The PR is meant to support the review process of RFC-2 by providing the specifics.
Again, a brief summary of the main changes:
- `.zattrs` to `zarr.json`
- `ome` key in the `attributes` of the `zarr.json` files
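To illustrate what these two changes mean for a reader, here is a minimal sketch that loads the OME metadata from either layout. `load_ome_metadata` is a hypothetical helper, and the `{"multiscales": []}` payload is a placeholder rather than a valid image description.

```python
import json
import tempfile
from pathlib import Path


def load_ome_metadata(group_dir: Path) -> dict:
    """Return the OME metadata of a group (hypothetical helper).

    New layout: zarr.json -> "attributes" -> "ome".
    Old layout: the metadata keys sit at the top level of .zattrs.
    """
    group_dir = Path(group_dir)
    v3_file = group_dir / "zarr.json"
    if v3_file.is_file():
        document = json.loads(v3_file.read_text())
        return document.get("attributes", {}).get("ome", {})
    v2_file = group_dir / ".zattrs"
    if v2_file.is_file():
        return json.loads(v2_file.read_text())
    raise FileNotFoundError(f"no zarr.json or .zattrs in {group_dir}")


with tempfile.TemporaryDirectory() as tmp:
    new_style = Path(tmp) / "new.zarr"
    old_style = Path(tmp) / "old.zarr"
    new_style.mkdir()
    old_style.mkdir()
    payload = {"multiscales": []}  # placeholder content for illustration
    (new_style / "zarr.json").write_text(
        json.dumps({"zarr_format": 3, "node_type": "group", "attributes": {"ome": payload}})
    )
    (old_style / ".zattrs").write_text(json.dumps(payload))
    from_v3 = load_ome_metadata(new_style)
    from_v2 = load_ome_metadata(old_style)

print(from_v3 == from_v2)
```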