Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add axes field to multiscale metadata #46

Merged
merged 11 commits into from
Aug 24, 2021
Merged

Conversation

constantinpape
Copy link
Contributor

Add the axes field to multiscales and update latest to v0.2. Pending #44 and release of 0.2 (that's why I marked it as a draft).

The discussion whether axes is mandatory (MUST) or encouraged (SHOULD) is still pending. cc @joshmoore @sbesson @jni

@constantinpape constantinpape marked this pull request as draft April 26, 2021 09:15
Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few thoughts (sorry if you're still in the process of updating)

latest/index.bs Outdated Show resolved Hide resolved
latest/index.bs Outdated Show resolved Hide resolved
latest/index.bs Show resolved Hide resolved
latest/index.bs Outdated Show resolved Hide resolved
@constantinpape constantinpape marked this pull request as ready for review April 26, 2021 09:54
@constantinpape
Copy link
Contributor Author

I synced the latest changes to 0.2 now and updated all version numbers (I could find) to 0.3 now.

@constantinpape
Copy link
Contributor Author

When migrating #39 I forgot to update the dimensionality requirements, i.e. allowing for image data with less than 5 dimensions.
I added this in the last commit.
@joshmoore let me know if you think we should formulate this more explicitly.

Copy link
Contributor

@thewtex thewtex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 @constantinpape explicit dimensions are helpful to make the dimension identity self-evident and make the format easy to understand and use.

In the interest of compatibility with xarray, we could update the way the axes are encoded with:

        "0/example/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "t",
                "c",
                "z",
                "y",
                "x", 
           ],

per https://xarray.pydata.org/en/stable/internals.html#zarr-encoding-specification

latest/index.bs Outdated
│ └── t.c.z.y.x # a "chunk coordinate", where the maximum coordinate
│ # will be `dimension_size / chunk_size`.
│ └─ t # Chunks are stored with the nested directory layout.
│ └─ c # All but the last chunk element are stored as directories.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we stick with a flat directory? The motivation is compatibility with xarray and avoiding unnecessary overhead with IPFS 👾

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @thewtex. Yes & no.

Short-term

The switch in v0.2 was to let us start actually creating these datasets before all implementations start supporting zarr-developers/zarr-python#715, i.e. ome-zarr-* implementations for v0.2 can "blindly" assert the use of "/" as the dimension separator. Python implementations should be in the (enviable) position that it doesn't matter. If you put the metadata in place, it should work for either assuming your zarr-python version is >=2.8.0 (note: ongoing bug fixes) Does that track with what you are seeing or am I missing something on the xarray front?

Longer-term

Once all of this has propagated through the ecosystem I think this spec shouldn't really care about which array layout is used. So a v0.3 or later can drop the requirement all together. That, however, does not get you want you want in the general case with IPFS. Do you foresee it really being that bad? (I ask because I know that the flat directory is unusable at some scales.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshmoore ah, thanks for the information! I was not aware of zarr-developers/zarr-python#715 .

On the xarray front, they are not tied to a specific zarr spec, as far as I know, so it should cause an issue. But, it seems that we should do the same here, i.e. both layouts, . or /, would be acceptable, as you suggest, correct?

I ask because I know that the flat directory is unusable at some scales.

I can see how this would be the case. In which store types do you experience this?

For IPFS, each entity is identified with a cryptographic-hash. Updating a chunk in a file means updating its identifier, but it also means updating the identifiers for all parent directories in the directory tree; so, deeply nested trees are not desirable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both layouts, . or /, would be acceptable, as you suggest, correct?

That's the plan.

In which store types do you experience this?

DirectoryStore & FSStore

so, deeply nested trees are not desirable.

Understood. Thanks for the info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think is the best way to move forward here, @thewtex @joshmoore?
I would rather not change this now (especially since this change was done in a previous PR already and the changes here are just about syncing the different versions of the spec). Also, like @joshmoore said, not supporting hiearchical storage for the chunks can be a big issue for large datasets on the file-system.

So I would propose that we leave this as it is for now and eventually change the spec once most implementations are agnostic to the actual separator.

If really necessary and there are some implementations that require a flat hierarchy, this could then be added as an extra implementation specific requirement. (I.e. these implementations would not be able to read all ngff formats, but only a subset.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd propose we leave 0.2 as hierarchical, focus 0.3 on axes, and then we can try to remove the 0.2 restriction ASAP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@constantinpape
Copy link
Contributor Author

constantinpape commented May 17, 2021

Sorry, it took me a while to catch up with this. Holidays and then a fairly busy week otherwise.
I have left a comment for the hierarchical chunk storage.

@thewtex

In the interest of compatibility with xarray, we could update the way the axes are encoded with:

        "0/example/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "t",
                "c",
                "z",
                "y",
                "x", 
           ],

I am not so sure that this will work for the OME NGFF format. Potentially there could be multiple datasets which have different axes stored in a single zarr container.
How is this case handled by xarray right now?

Edit: Sorry, I did not think the above comment fully through. After looking at your example again, I realized that we can easily support this if we just require each "scale array" to contain the _ARRAY_DIMENSIONS field, which should be identical with multiscales:axes.
I will add this to this PR.

latest/index.bs Outdated Show resolved Hide resolved
@constantinpape
Copy link
Contributor Author

@joshmoore I think this is good to be merged, but I would suggest we make a 0.2.0 release in the main repository before that.

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ngff-status-update-may-2021/52918/1

├── 0 # Each multiscale level is stored as a separate Zarr array,
│ ... # which is a folder containing chunk files which compose the array.
├── n # The name of the array is arbitrary with the ordering defined by
│ │ # by the "multiscales" metadata, but is often a sequence starting at 0.
│ │
│ ├── .zarray # All image arrays are 5-dimensional
│ ├── .zarray # All image arrays must be up to 5-dimensional

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshmoore shouldn't this be:

must be 5+ dimensional? OME-Tiff has the 8D / n-D spec and it would be a shame to lose that 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this PR is to allow for less(!) than 5D, so that it's not necessary to introduce singleton dimensions for 2D/3D/4D data.
There is an ongoing discussion about allowing for more than 5D, see #35. But that is out of scope for the current PR.

Comment on lines +246 to +248
"axes": [
"t", "c", "z", "y", "x"
],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do support n-dimensional, (OME 8D spec / OME n-D spec) then this simply expands to include the dimension shorthands.

@joshmoore
Copy link
Member

@constantinpape, a few points before I disappear on holidays:

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few last minute items to go on top of #50

latest/index.bs Outdated
@@ -2,8 +2,7 @@
Title: Next-generation file formats (NGFF)
Shortname: ome-ngff
Level: 1
Status: LS-COMMIT
Status: w3c/ED
Status: w3c/CG-FINAL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you likely want to revert these top-matter items

latest/index.bs Outdated
@@ -18,12 +17,12 @@ Editor: Sébastien Besson, Open Microscopy Environment (OME) https://www.openmic
Abstract: This document contains next-generation file format (NGFF)
Abstract: specifications for storing bioimaging data in the cloud.
Abstract: All specifications are submitted to the https://image.sc community for review.
Status Text: The current released version of this specification is
Status Text: <a href="../0.1/index.html">0.1</a>. Migration scripts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This text should stay as is but point to the 0.3 that you will create when latest/index.bs is done.

@@ -551,6 +597,11 @@ Version History {#history}
<td>Description</td>
</tr>
</thead>
<tr>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus add in your own changelog.

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome-zarr-napari/55435/1

@joshmoore
Copy link
Member

I've dealt with merge conflicts and handled the "publication steps" (such as they are) if anyone would like to take a look.

@joshmoore joshmoore self-assigned this Aug 24, 2021
@joshmoore joshmoore requested a review from sbesson August 24, 2021 10:42
@joshmoore
Copy link
Member

If no objections, I'll merge this and move forward with https://github.com/orgs/ome/projects/17?add_cards_query=is%3Aopen

@will-moore
Copy link
Member

Sorry, just a thought....

Do we want limit the last two axes to be ["y", "x"] (at least 2D)?
As I understand it, the idea of v0.3 was to allow less than 5D, but as it stands, these are all valid:

["c"],
["x", "y"],  # inverted
["c", "y", "t", "x"],

But do we want all clients that support the spec to also support all of those?

@will-moore
Copy link
Member

Ah - sorry, I missed the existing caveat # with dimension order (t, c, z, y, x). which hasn't been changed by this PR.
It's just that this isn't mentioned in the list of axes rules in the text.
And there's still no 2D minimum - but maybe that's OK since we're all dealing with Images.
So I don't think this is a blocker.
Apologies for the noise!

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new version of the specification has already been tested in various read and write scenarios across different implementations including omero-cli-zarr, vizarr, MoBIE. The specification changes look good so I am in favor of getting this published and completing the various tasks associated with OME-NGFF 0.3.

@joshmoore joshmoore merged commit 2986bbe into ome:main Aug 24, 2021
github-actions bot added a commit that referenced this pull request Aug 24, 2021
Add axes field to multiscale metadata

SHA: 2986bbe
Reason: push, by @joshmoore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@jni
Copy link

jni commented Aug 29, 2021

Just spinning back up after my hiking-trip-turned-lockdown-staycation and I'm so delighted to find this merged! 🎉 Thank you SO much to everyone for your patience and perseverance with this! ❤️‍🔥 (◀️ just discovered that emoji and it's perfect!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants