Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate axes types v0.4 #124

Merged
merged 17 commits into from
Jan 18, 2022
Merged

Conversation

will-moore
Copy link
Member

@will-moore will-moore commented Oct 28, 2021

This builds on top of #123 to add validation of new axes dicts, based on the axes types.

Based on spec at ome/ngff#57 (not merged yet).

There is a fair bit of validation code here cc @glyg

Breaking changes:

  • reader axes now returns a list of dicts instead of list of str
  • OME-Zarr data is written as v0.4

But the writing API isn't breaking. Anything that was valid for v0.3 is also valid for v0.4, since we automatically add type info for tczyx dimensions.

@codecov
Copy link

codecov bot commented Oct 28, 2021

Codecov Report

Merging #124 (d9a44f7) into master (2ed4426) will increase coverage by 1.04%.
The diff coverage is 96.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #124      +/-   ##
==========================================
+ Coverage   80.49%   81.53%   +1.04%     
==========================================
  Files          11       12       +1     
  Lines        1174     1251      +77     
==========================================
+ Hits          945     1020      +75     
- Misses        229      231       +2     
Impacted Files Coverage Δ
ome_zarr/format.py 93.97% <90.90%> (-0.55%) ⬇️
ome_zarr/axes.py 95.52% <95.52%> (ø)
ome_zarr/reader.py 84.18% <100.00%> (+0.52%) ⬆️
ome_zarr/writer.py 93.91% <100.00%> (+0.47%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ed4426...d9a44f7. Read the comment docs.

@glyg
Copy link
Contributor

glyg commented Oct 28, 2021

Hi @will-moore sorry I'm not more responsive - I'm out of work till Tuesday, will dig in all that then!

@will-moore
Copy link
Member Author

@glyg No problem. No hurry - just pinging you in case it helps avoid any duplicate effort.

ome_zarr/writer.py Outdated Show resolved Hide resolved
ome_zarr/writer.py Outdated Show resolved Hide resolved
ome_zarr/reader.py Outdated Show resolved Hide resolved
ome_zarr/axes.py Outdated Show resolved Hide resolved
ome_zarr/axes.py Outdated Show resolved Hide resolved
axes = multiscales[0].get("axes")
fmt = format_from_version(version)
# Raises ValueError if not valid
validate_axes(None, axes, fmt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are not consuming the return values of validate_axes, the goal here is "only" to validate an axes but not modify it ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - good point. Looking at this again, I realise that validate_axes(None, axes, fmt) is really designed for writing. So, what's returned will be valid, even if what it's passed isn't valid. E.g. it will convert "tczyx" to an axis array, and will allow v0.3 to have None if 2D or 5D.
What we really want here is Axes(axes, fmt).validate()

axes = tuple(multiscales[0].get("axes", ["t", "c", "z", "y", "x"]))
if len(set(axes) - axes_values) > 0:
raise RuntimeError(f"Invalid axes names: {set(axes) - axes_values}")
axes = multiscales[0].get("axes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the main implication of not setting a default value here?
For 0.1/0.2 data, this means, the node.metadata["axes"] might be None as opposed to ["t", "c", "z", "y", "x"] previously i.e. we are preserving the value stored in the metadata? Is there an impact on clients relying on node.metadata["axes"]?

ndim: int, axes: Union[str, List[str]] = None, fmt: Format = CurrentFormat()
) -> Union[None, List[str]]:
"""Returns validated list of axes names or raise exception if invalid"""
def validate_axes(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least I find the following lines

axes_obj = Axes(axes)
axes_obj.validate(fmt)

clarifies a lot of the logic happening here. It brings the question of whether additional logic should be moved to the constructor. Said otherwise, what is the added value of calling the validate_axes API vs the two-liner:

axes = Axes(axes, fmt=fmt, ndim=ddim)
axes.validate()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the outcome of #124 (comment). Should the name of the method be updated to reflect this is a writer/constructor rather than a validator? Since this API is moved to be a public API, this is increasingly important. Alternatively, we can keep it prefixed with _ for now.

ome_zarr/reader.py Outdated Show resolved Hide resolved
ome_zarr/axes.py Outdated Show resolved Hide resolved
Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few additional comments mostly revolving around API names and behaviors. But I think the bulk of the work looks good

As discussed this morning, we will need to get this consumed by downstream clients e.g. omero-cli-zarr and/or napari-ome-zarr. To respect the fact the 0.4 specification is not yet published, I don't think this can land into a public release of ome-zarr-py. Instead, can we try to make the minimal decision to target a 0.3 pre-release?

ome_zarr/writer.py Show resolved Hide resolved
ndim: int, axes: Union[str, List[str]] = None, fmt: Format = CurrentFormat()
) -> Union[None, List[str]]:
"""Returns validated list of axes names or raise exception if invalid"""
def validate_axes(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the outcome of #124 (comment). Should the name of the method be updated to reflect this is a writer/constructor rather than a validator? Since this API is moved to be a public API, this is increasingly important. Alternatively, we can keep it prefixed with _ for now.

ome_zarr/reader.py Outdated Show resolved Hide resolved
Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. One minor detail question below and one higher-level here: Does anyone have a feeling for when we might want to start adding submodules, e.g. are we approaching a "util" or "internals" with his PR?

axes_names.append(axis["name"])
return axes_names

def _validate_03(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to read more here to get my head around the new class, but my immediate reaction to methods that are version specific is that I'd hope they could live in the Format object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hindsight. At some point, I was also wondering whether Axes03 Axes04 should be introduced to toggle between behaviors. It would be interesting to see how to make use the Format API to handle this as this can be re-used in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to think this through... It's possible for everything to become version specific if it changes in a future version, and we wouldn't want the API to change when methods move to a Format object.
Everything in the Axes class is version specific, (e.g. _validate_axes_types will soon become more liberal). So do we move those internal methods to Format objects when they change?
This will mean that eventually all the code for everything will migrate Format objects.
Are the methods of Format objects (e.g. validate_axes()) expected to be part of the external API?
What's the use-case or benefit of all version-specific code being in Format objects?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to think of another example with the 0.4 HCS changes to the wells element discussed in #157, I could certainly imagine migrating _validate_well_images(wells) by fmt.validate_well_images(wells) or similar and let FormatV04 vs FormatV{01,02,03} override the methods as appropriate.

@sbesson
Copy link
Member

sbesson commented Jan 18, 2022

As discussed yesterday, merging to move forward towards a pre-release of ome-zarr-py 0.3.0 and we can capture some of the outstanding comments as issues.

@sbesson sbesson merged commit 73aee67 into ome:master Jan 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants