
versioning data for v1.1 with referrers API #365

Closed
mikebrow opened this issue Nov 16, 2022 · 22 comments
Labels
enhancement New feature or request

Comments

@mikebrow
Member

The current direction for handling registries that do not support the referrers API needs work.

The diff for 1.1 for referrers states: When pushing an image or artifact manifest with the subject field and the referrers API returns a 404, the client MUST

This results in clients having to request referrers after pushing one of the new manifest types, to make a version/capability determination, or a client would have to have a master list based on registry (and repository?).

What is needed is a version/capabilities check. Preferably, the supported OCI version should be easily discoverable, and/or the capabilities should be exposed, so that a call to referrers is not needed after each push of a new manifest.

@dmcgowan @sudo-bmitch

@mikebrow mikebrow added the enhancement New feature or request label Nov 16, 2022
@mikebrow mikebrow added this to the v1.1.0 milestone Nov 16, 2022
@mikebrow
Member Author

One option would be an extension.

@sudo-bmitch
Contributor

sudo-bmitch commented Nov 16, 2022

What's the advantage of an extension or version check over checking if the API works? One efficiency I believe ORAS is looking at is a dummy digest (all 0s) that will always return an empty response (unless someone manages a hash collision).
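A minimal sketch of the all-zero digest probe, assuming the referrers endpoint is `GET /v2/<name>/referrers/<digest>` and that the registry needs no authentication. A 401, or any status other than 200 or 404, is surfaced as an error rather than guessed at. The names `ZERO_DIGEST`, `referrers_probe_url`, `interpret_probe` and `probe_referrers` are illustrative only, not ORAS's actual code.

```python
import urllib.error
import urllib.request

# An all-zero sha256 digest; no real manifest should ever hash to this value.
ZERO_DIGEST = "sha256:" + "0" * 64


def referrers_probe_url(registry: str, repository: str) -> str:
    """Build the referrers API URL for the all-zero digest."""
    return f"https://{registry}/v2/{repository}/referrers/{ZERO_DIGEST}"


def interpret_probe(status: int) -> bool:
    """Map the probe's HTTP status to "is the referrers API supported?"."""
    if status == 200:
        return True  # API present; the body should be an empty image index
    if status == 404:
        return False  # no referrers API: use the fallback tag instead
    raise RuntimeError(f"unexpected status {status} from referrers probe")


def probe_referrers(registry: str, repository: str) -> bool:
    """Send the probe; authentication is deliberately not handled here."""
    request = urllib.request.Request(
        referrers_probe_url(registry, repository),
        headers={"Accept": "application/vnd.oci.image.index.v1+json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return interpret_probe(response.status)
    except urllib.error.HTTPError as err:
        return interpret_probe(err.code)
```

Keeping the status mapping in its own function makes the decision testable without a live registry.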

@mikebrow
Member Author

mikebrow commented Nov 17, 2022

One is explicit, the other is a subjective/reactive response. When the registry's OCI version is 1.0 and referrers returns a 4xx, that is the expected response; a non-4xx would be an error. When the registry's OCI version is 1.1 and referrers still returns a 4xx, that is not expected for registries that claim support, but because the API is only a "SHOULD", it is not mandatory for it to succeed. When the registry API version is Docker v1 or v2 and no OCI support exists, what is the procedure then?
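The cases above can be written out as a small decision function. This is only a sketch: the version labels (`"oci-1.0"`, `"oci-1.1"`, `"docker-v2"`) are hypothetical, since the thread is debating whether registries should expose such a version at all.

```python
from enum import Enum


class Outcome(Enum):
    SUPPORTED = "referrers API available"
    EXPECTED_ABSENT = "no referrers API, as the version implies"
    UNEXPECTED = "response contradicts the reported version"
    UNDEFINED = "no defined procedure"


def interpret(reported_version: str, status: int) -> Outcome:
    """Classify a referrers response given a hypothetical reported version."""
    is_4xx = 400 <= status < 500
    if reported_version == "oci-1.0":
        # A 4xx is what a 1.0 registry should return; anything else is an error.
        return Outcome.EXPECTED_ABSENT if is_4xx else Outcome.UNEXPECTED
    if reported_version == "oci-1.1":
        if status == 200:
            return Outcome.SUPPORTED
        # Referrers is only a SHOULD, so a failure is surprising but permitted.
        return Outcome.UNEXPECTED
    # Docker v1/v2 registries with no OCI support: the open question above.
    return Outcome.UNDEFINED
```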

I suggest that not solving the versioning issue with something more explicit will only get trickier as we move ahead.

I agree that a nil/dummy referrers request, with an empty response as a positive ack of referrers support, is better than asking for a set of expected manifests just to check whether referrers is supported at all. But it still feels like kicking the versioning can down the git branch tree :-)

@sudo-bmitch
Contributor

sudo-bmitch commented Nov 17, 2022

We included the following to enable discovery:

If the registry supports the referrers API, the registry MUST NOT return a 404 Not Found to a referrers API requests.

A registry without the API will respond with a 404 to these requests, or perhaps a 400 if something is broken. In both cases, the client should fall back.
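Roughly, the fallback in the 1.1 draft is: pull the image index stored under the referrers tag for the subject (starting from an empty index if that tag returns 404), add the new manifest's descriptor if it is not already listed, and push the index back under the same tag. The sketch below covers only the tag name and the index update, assuming the `<alg>-<ref>` tag form (the subject digest with `:` replaced by `-`). The HTTP calls are left out, and `fallback_tag` and `add_referrer` are hypothetical helper names.

```python
from typing import Optional

INDEX_MEDIA_TYPE = "application/vnd.oci.image.index.v1+json"


def fallback_tag(subject_digest: str) -> str:
    """Tag holding the fallback index: "sha256:abc..." -> "sha256-abc..."."""
    alg, ref = subject_digest.split(":", 1)
    return f"{alg}-{ref}"


def add_referrer(index: Optional[dict], descriptor: dict) -> dict:
    """Return a new fallback index that lists `descriptor` exactly once.

    `index` is what was pulled from the fallback tag, or None if that tag
    returned 404 (start from an empty image index).
    """
    if index is None:
        index = {"schemaVersion": 2, "mediaType": INDEX_MEDIA_TYPE, "manifests": []}
    manifests = list(index.get("manifests", []))
    if all(m.get("digest") != descriptor["digest"] for m in manifests):
        manifests.append(descriptor)
    return {**index, "manifests": manifests}
```

Returning a new dict rather than mutating the pulled index keeps the original available, e.g. for comparing digests before pushing.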

We avoided a check on the registry version because it felt easier and less error prone to just check the API you want to use. Less error prone because we worried some registries would claim 1.1 compatibility without the API, or perhaps they are a v1 registry with some v1.1 features. And easier because it's one less API to define and support.

@mikebrow
Member Author

The first time you use a new API that has only one version, you can argue that existence is proof. When there are two versions, that argument loses strength. Forward and backward compatibility of service APIs is easier when you know which version you are using.

In this case, it seems to me the text is arguing: assume the OCI 1.1 format for GET/PUSH, but if referrers is called and returns 404 Not Found, manually build a 1.1 image index tagged with the 1.1 referrers tag schema and push that to the registry.

If afforded the option, a client may prefer not to push 1.1 artifacts, or not to call referrers at all, when the registry is known not to support the entire 1.1 specification. Instead it could tell the user that 1.1 artifacts (including 1.1 referrers requests) are not supported and ask whether to use the fallback pattern and push the following image index. Or, if the registry does support 1.1, the client could ask the user to provide an image index tagged with the referrers tag schema, because for now this registry does not appear to support the referrers API and it would be beneficial to have a tagged list for later retrieval.

Alternatively, a client may wish to use the OCI 1.0 artifact pattern if the registry version is known not to support the OCI 1.1 format or OCI 1.1's new referrers API.

Dunno; for me, once there are two versions of an API, the code decisions sort of begin with "OK, which version of the API are we using?"

@mikebrow
Member Author

mikebrow commented Nov 17, 2022

The versioning of the manually or automatically created image index with the 1.1 referrers tag schema, and how a client should serialize against other clients trying to update the same image index, points to another pro/con of having a version: imagine if there were no tag to indicate the version, and you had to serialize on the digest alone.
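To make the serialization concern concrete: without some conditional update, two clients can both read the fallback index, each append their own descriptor, and the second push silently drops the first client's entry. The sketch below assumes a hypothetical `put_if_match` that only moves the tag if it still points at the digest the client read. The distribution spec discussed here defines no such conditional manifest push, so `FakeRegistry` is an in-memory stand-in, not a real client.

```python
import hashlib
import json
from typing import Dict, Optional


def digest(body: bytes) -> str:
    """sha256 digest string of a manifest body."""
    return "sha256:" + hashlib.sha256(body).hexdigest()


class FakeRegistry:
    """In-memory tag store with a hypothetical conditional write."""

    def __init__(self) -> None:
        self.tags: Dict[str, bytes] = {}

    def get(self, tag: str) -> Optional[bytes]:
        return self.tags.get(tag)

    def put_if_match(self, tag: str, body: bytes, expected: Optional[str]) -> bool:
        """Write only if the tag still points at `expected` (None = tag absent)."""
        current = self.tags.get(tag)
        current_digest = digest(current) if current is not None else None
        if current_digest != expected:
            return False  # another client moved the tag since we read it
        self.tags[tag] = body
        return True


def append_referrer(reg: FakeRegistry, tag: str, desc: dict, retries: int = 5) -> None:
    """Read-modify-write the fallback index, retrying when the write conflicts."""
    for _ in range(retries):
        raw = reg.get(tag)
        expected = digest(raw) if raw is not None else None
        index = json.loads(raw) if raw is not None else {"schemaVersion": 2, "manifests": []}
        if desc not in index["manifests"]:
            index["manifests"].append(desc)
        body = json.dumps(index, sort_keys=True).encode()
        if reg.put_if_match(tag, body, expected):
            return
    raise RuntimeError("gave up after repeated concurrent updates")
```

The check and the write in `put_if_match` would have to be atomic on the registry side for this to be safe; a client cannot provide that guarantee on its own.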

Another issue: what if the registry "upgrades" or "downgrades" its support for 1.1/referrers? What if a first client supports referrers and a second client does not? What if a "mirror" supports referrers but the source does not, or vice versa?

@sajayantony
Member

Irrespective of referrers, caps/version would be a good addition to distribution. Maybe a SHOULD in 1.1 that could be moved up to a MUST in the next revision.

@mikebrow
Member Author

A solution might be to require both modes (with and without the referrers pattern) on artifact push for 1.1, with the "without" pattern deprecated.

@sudo-bmitch
Contributor

Some of the comments from today's call:

  • Clients that want to change their behavior before pushing a manifest can check for the referrers API support before, rather than after, pushing the manifest with a subject field.
  • The OCI version doesn't indicate that all APIs are supported by a given registry. E.g. a registry may be partially OCI v1 conformant and listed as a v1 registry without implementing the tag listing or manifest push APIs.
  • There is some interest in having a capabilities API.
  • A downgrade with a pull through cache will result in degraded behavior for downstream clients.
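A minimal sketch of the ordering in the first bullet: the referrers check runs before any blob is uploaded, so the client can still change its plan, and the fallback index is only updated after the manifest push when the API is missing. The four callables are hypothetical hooks standing in for a real client's HTTP calls.

```python
from typing import Callable, List


def push_with_subject(
    probe: Callable[[], bool],
    push_blobs: Callable[[], None],
    push_manifest: Callable[[], None],
    update_fallback_index: Callable[[], None],
) -> List[str]:
    """Push an artifact whose manifest has a subject field; return the steps run."""
    log = []
    supported = probe()  # decide up front, before anything is uploaded
    log.append("probe")
    push_blobs()
    log.append("push blobs")
    push_manifest()
    log.append("push manifest")
    if not supported:
        update_fallback_index()  # only needed when the referrers API is absent
        log.append("update fallback index")
    return log
```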

Re the version/capabilities API: personally I wouldn't use it, and would instead query the API directly and handle the errors. The risk is that the registry may report different capabilities than what actually works with the API. Perhaps the API is throwing 4xx errors because of a broken implementation, or perhaps the API is enabled before the capabilities API is updated. If the registry responds with different capabilities than what is actually implemented, then clients will make mistakes, resulting in data loss (not pushing the fallback tag) or unnecessary tags (cluttering the listing when the API works).

Re the pull through cache: this only affects pulls (pushes go to the upstream registry) so it won't cause consistency issues upstream. A potential workaround is for registries to convert a request for a fallback tag to a referrers API request. I don't know that we want to put that in the spec, but I wouldn't be opposed to a registry supporting that with a backwards compatibility flag.
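One way a registry might implement that compatibility flag, as a sketch only: when the flag is on and a manifest request names a tag that parses as a sha256 fallback tag, serve the referrers response for that subject instead of a stored tag. Only sha256 is handled, and the function names and returned strings are hypothetical placeholders for real request handlers.

```python
import re
from typing import Optional

# A sha256 fallback tag: "sha256-" followed by 64 lowercase hex characters.
_FALLBACK_TAG = re.compile(r"sha256-([0-9a-f]{64})")


def subject_from_fallback_tag(tag: str) -> Optional[str]:
    """Return the subject digest a fallback tag names, or None for ordinary tags."""
    match = _FALLBACK_TAG.fullmatch(tag)
    return f"sha256:{match.group(1)}" if match else None


def route_manifest_request(tag: str, compat_flag: bool) -> str:
    """Decide what a hypothetical registry serves for GET /v2/<name>/manifests/<tag>."""
    subject = subject_from_fallback_tag(tag) if compat_flag else None
    if subject is not None:
        return f"referrers index for {subject}"
    return f"stored manifest for tag {tag}"
```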

For managed mirrors (a full registry that happens to have a copy of images from another source), the tooling copying the images can automatically handle the upgrade and downgrade, pushing/pulling the fall back tag when the referrers API isn't available.

@sudo-bmitch
Contributor

One item discussed in today's call was the need to fail fast on the client side, before authenticating or pushing blobs, when a manifest media type wouldn't be supported on the registry. Registries may still accept a media type according to a capabilities API, and later reject it after the blob was pushed, if that registry is doing additional filtering that can't be communicated in a capabilities API (e.g. rejecting unknown fields, or finer grain validation on the manifest content).
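A sketch of the fail-fast check, assuming a hypothetical capabilities document shaped like `{"manifestMediaTypes": [...]}`; no such API is defined in the spec. As the comment notes, passing the check is not a guarantee, so the client must still handle a rejected manifest push. `application/vnd.oci.artifact.manifest.v1+json` is the artifact manifest media type from the 1.1 draft.

```python
class UnsupportedMediaType(Exception):
    """Raised before any upload when the registry advertises no support."""


def preflight(capabilities: dict, media_type: str) -> None:
    """Fail before authenticating or pushing blobs if media_type is not advertised."""
    accepted = capabilities.get("manifestMediaTypes")
    if accepted is None:
        return  # nothing advertised: proceed and rely on the push response
    if media_type not in accepted:
        raise UnsupportedMediaType(media_type)
    # Passing here is not a guarantee: the registry may still reject the
    # manifest later (unknown fields, finer-grained validation).
```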

The part I want to avoid is automatically upgrading a client to new functionality at runtime, because an automatic upgrade creates significant portability issues for content. If that was done, once users upgrade the origin registry, all other registries where the manifest may be copied to would also need to be upgraded. It's very common for the reverse to be the reality, the development/build environment registry is upgraded before the production/public facing registry. Perhaps any capabilities API should come with documentation to warn implementations away from generating non-portable content.

@sajayantony
Member

With the current set of APIs and no deterministic way to determine a registry's version, it has become quite hard to decide whether to use ImageManifest+Index, Artifact+Index, or Artifact+Referrers.
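Assuming a client could reliably learn two facts about the registry, the choice itself is simple; obtaining those two facts is the hard part this comment is about. The booleans below are therefore inputs, and the fourth combination (image manifest plus referrers API) is included only so the function covers every case.

```python
def choose_pattern(accepts_artifact_manifest: bool, referrers_api: bool) -> str:
    """Pick a storage pattern from two (assumed known) registry capabilities."""
    if accepts_artifact_manifest and referrers_api:
        return "Artifact+Referrers"
    if accepts_artifact_manifest:
        return "Artifact+Index"  # artifact manifest, fallback tag index
    if referrers_api:
        return "ImageManifest+Referrers"  # remaining combination, not in the list above
    return "ImageManifest+Index"  # 1.0-compatible image manifest, fallback index
```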

The question I have for the maintainers is: are you comfortable releasing distribution spec 1.1 without resolving this issue, or should this be defined as part of https://github.com/opencontainers/distribution-spec/milestone/6

Sharing @toddysm's write up here - https://toddysm.com/2023/01/05/oci-artifct-manifests-oci-referrers-api-and-their-support-across-registries-part-1/

@sudo-bmitch
Contributor

How should this be used by clients? If clients query the capabilities API and the capabilities API itself is not available, do they still try to access a feature anyway, or do they assume a registry that doesn't implement the capabilities API hasn't implemented any of the features it would describe? In other words, does this break registries that added the subject/referrers functionality before the capabilities API was defined? And if a registry does implement a capabilities API, how does this impact client error handling if a registry claims to support something but still rejects it later? Something I'd like to avoid is multiple tiers of error handling for the same error, because it introduces the risk of inconsistency for clients.

Also, how long are we comfortable delaying the 1.1 release to define, build, test, and approve a new capabilities API?

@sajayantony
Member

Before capabilities, a header, etc., is there interest in specifying the supported types for the distribution 1.1 release?
main...sajayantony:distribution-spec:supported-types

I can make this a PR if folks are OK with it, but this is orthogonal to the caps/header discussion.

@sudo-bmitch @jdolitsky @jonjohnsonjr

@jlbutler
Member

As I mentioned in last week's call, I think our conversations are potentially conflating more than one concern. Personally I'm very open to further discussion, but I think if we continue to pull these into the same frame, we run the risk of not making much progress.

For the sake of seeing some quick-take upvotes or downvotes, I'm going to make two subsequent comments.

@jlbutler
Member

  1. Provide a mechanism for clients to determine the functionality of a registry they are talking to

To this, we've had a couple of proposals. As this is the first minor version update of the distribution spec (or, any of the specs), it seems like a good time to add something. I know we were leaning toward capabilities vs version, but honestly I think version is simpler and does the job.

While a registry could report 1.0 and also accept Artifacts, or host a referrers endpoint without claiming version 1.1, registries that report 1.1 MUST support all features. This isn't about compliance; it means the implementation should be complete. Clients then decide how to work with that, but it gives a happy path for clients that want to know whether 1.1 (including referrers and Artifacts) is supported.

As for how long we'd delay a release for this, I don't think adding a version endpoint to the spec as it stands would be a significant undertaking. I offered to write a PR for that, and I'm still happy to do so. But I would like some sense of community direction on this issue.

How are folks thinking about this? Yay, nay, needs more discussion?

@jlbutler
Member

  2. Provide clear guidance to client implementors such that they can make implementation decisions related to feature support and legacy storage

In most specs I'm aware of that involve a client and a server, where the server stores artifacts, concerns around portability relate to existing artifacts being able to move forward into the future without being stranded. I've never seen a spec take into consideration storing future-version artifacts on a down-rev storage system.

I would really like other folks to chime in here - my experience with multi-version specs isn't really in the cloud native apps space, but in storage protocols and filesystems.

All this said, there is no guarantee nor any requirement that after N number of weeks, months, or years that all registries will support all 1.1 features. Therefore if we don't choose to move on from this concern of up-rev artifacts being stored in down-rev registries, the only solution seems to be to not rev the spec meaningfully and never really adopt Artifacts.

How are we feeling about this being addressed narratively with guidance, maybe in Use cases or even a new section related to versioning (which of course doesn't exist yet). Thumbs up or down here would also be appreciated.

@imjasonh
Member

How are we feeling about this being addressed narratively with guidance, maybe in Use cases or even a new section related to versioning (which of course doesn't exist yet). Thumbs up or down here would also be appreciated.

+1, more narrative guidance is always helpful, and existing outside the spec gives it more flexibility to improve examples and wording without the usual difficulty of spec language.

@sudo-bmitch
Contributor

All this said, there is no guarantee nor any requirement that after N number of weeks, months, or years that all registries will support all 1.1 features. Therefore if we don't choose to move on from this concern of up-rev artifacts being stored in down-rev registries, the only solution seems to be to not rev the spec meaningfully and never really adopt Artifacts.

In a lot of other projects I've worked with, there's a concept of a grace period to upgrade, and a support window that everyone can depend on. With the forced upgrade approach, we're saying as soon as the initial registry is upgraded, all downstream registries are no longer supported if they didn't already upgrade. It's a very user hostile approach that concerns me.

My own approach was going to be to wait and see how adoption goes, and once enough key players had transitioned, and self-hosted users had time to upgrade, I would change the default. And that would only be a default; users could override it either way. But if there's a concern that no one will support the artifact manifest, a fixed time after the GA release would also make sense.

With the questions raised in this issue, my own questions above haven't been addressed. I'm not comfortable moving forward with a new feature without knowing exactly how we are recommending that feature be used. Most of the spec defines, for each API, how clients call it, and how it is used in a workflow (e.g. a blob push runs before a manifest push).

In this case, we are trying to create a new API without specifying how and why it's needed, and that feels backwards to me. I'd rather define the issue first, work through the possible solutions, pick the solution we like the best, and then define the API that's needed for that solution.

@jlbutler
Member

Totally agree on 'forced upgrade' @sudo-bmitch, your concerns have convinced me. I was more focused on just setting context here so we can unblock from definitions and discuss a solution. I probably should have been more complete.

By 'forced upgrade', we're talking about clients' automatic adoption of new features present in a registry without user specification. In some contexts we refer to this as upgrade, which is a slightly confusing term, at least for me.

I think these use cases are primarily publisher context, that is, clients creating artifacts. There could be implications for consumers as well, we should be sure to include them if any.

Focusing on publisher for the moment, adoption of new features by clients will play out in at least three different ways. These three are based on clients that I'm aware of, and my understanding of their plans to adopt 1.1+ features.

  • In one client, a switch will be added so that users creating new artifacts explicitly choose when to use 1.0-style workflows and when to use 1.1+ workflows. They have chosen to default to 1.1+, which will be implemented to fail against a 1.0-ish registry; the user can then specify a --artifact-type legacy (my wording) option to fall back. This, I think, addresses the "surprise, new artifact type!" aspect of the concern.

  • In a second client, they are planning to default to 1.1+ and automatically fall back if 1.1-style Artifacts are not supported. This could cause breakages in a toolchain (e.g. "surprise, new artifact type!" and a downstream tool doesn't yet support 1.1+), but this is their choice, and their documentation will hopefully make that consideration clear.

  • A third client is planning on only supporting 1.1+ registries, and not having any legacy support at all. This significantly limits their scope, but they are ok with that and their docs will also make that clear. This isn't a one-way door and they can always adopt a fallback later, if they choose.
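The three client behaviors above could be expressed as a policy switch. This is a hypothetical sketch: `Policy`, `NeedsLegacyFlag` and `resolve_format` are invented names, and `user_wants_legacy` stands in for an option like the `--artifact-type legacy` flag mentioned in the first bullet.

```python
from enum import Enum


class Policy(Enum):
    EXPLICIT = "default to 1.1, fail, let the user opt into legacy"
    AUTO_FALLBACK = "default to 1.1, fall back automatically"
    V11_ONLY = "1.1 registries only"


class NeedsLegacyFlag(Exception):
    """The registry lacks 1.1 support and the user has not opted into legacy."""


def resolve_format(policy: Policy, registry_supports_v11: bool, user_wants_legacy: bool = False) -> str:
    """Return "1.1" or "legacy" for a publish under the given client policy."""
    if policy is Policy.EXPLICIT and user_wants_legacy:
        return "legacy"  # the user chose the 1.0-style workflow explicitly
    if registry_supports_v11:
        return "1.1"
    if policy is Policy.AUTO_FALLBACK:
        return "legacy"  # silent fallback; the publisher may not notice
    if policy is Policy.EXPLICIT:
        raise NeedsLegacyFlag("registry lacks 1.1 support; rerun with a legacy option")
    raise RuntimeError("this client only supports 1.1 registries")
```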

So I believe the thing we're most concerned about here is whether or not a user acting as publisher specifies, or at least implies, the artifact type (and subsequent use of fallback tag schema as proposed), or if it is automatic.

As a precedent, I think we need to consider this for future features as well. Just as with a capabilities endpoint, we've not had to consider this sort of thing before. The first time we're introducing a new minor version seems as good a time as any to do it.

There are maybe more use cases, but does this help clarify? Cause more concern?

If these are all laid out in use cases or elsewhere, would that address concerns from a spec author point of view? Or would we need to add opinionated guidance to accompany these?

@sudo-bmitch
Contributor

For context, we did include an upgrade path in Proposal E: https://github.com/opencontainers/wg-reference-types/blob/main/docs/proposals/PROPOSAL_E.md

@jonjohnsonjr
Contributor

This results in clients having to request referrers after pushing one of the new manifest types, to make a version/capability determination, or a client would have to have a master list based on registry (and repository?).

I believe #379 solves this concern.

@jdolitsky
Member

solved in #379


7 participants