Skip to content
This repository has been archived by the owner on May 15, 2024. It is now read-only.

Update referrers API to return a list of descriptors #3

Merged
merged 3 commits into from
Aug 13, 2021

Conversation

michaelb990
Copy link
Contributor

Fixes #2 .


The `referrers` API returns all artifacts that have a `subjectManifest` to given manifest digest. Referenced artifact requests are scoped to a repository, ensuring access rights for the repository can be used as authorization for the referenced artifacts.

Artifact references are defined in the [oras.artifact.manifest spec][oras.artifact.manifest-spec] through the [`subjectManifest`][oras.artifact.manifest-spec-manifests] property.

## Request All Artifact References

The referrers api is sits alongside the [distribution-spec][oci-distribution-spec] paths avoiding any conflict with existing or new distribution apis. Pathing within the referrers api provides consistent repo/namespace paths, enabling registry operators to implement consistent auth access, using existing tokens for content.
The referrers api is sits alongside the [distribution-spec][oci-distribution-spec] paths avoiding any conflict with existing or new distribution apis. Pathing within the referrers api provides consistent repo/namespace paths, enabling registry operators to implement consistent auth access, using existing tokens for content.
Copy link

@mnltejaswini mnltejaswini Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to call out on how to include nextLink in the request parameters? as a query string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would prefer that nextLink actually be a nextToken. I don't like APIs returning a link that clients will (possibly) follow blindly. I was going to propose a nextToken which could be passed into a subsequent call to the API via a query parameter, most likely ?nextToken={nextToken}. Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit weird to invent a new pagination mechanism. Can you do the same thing that we do for tag listing?

https://github.com/opencontainers/distribution-spec/blob/ef28f81727c3b5e98ab941ae050098ea664c0960/detail.md#tags-paginated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally agree 100% and did not know about the pagination that existed for the tags API. Will open a new issue (#4).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't think its weird to have a new pagination listing that does support richer capabilities and yes +1 on nextToken. The distribution spec makes it harder for clients to follow and also makes it harder for a data provider to follow pagination downstream. ACR had quite a few issues and not totally not a fan. This is not under /v2 so does not need to follow that pagination mode.

@jonjohnsonjr - does GCR actually implement the tag listing - google/go-containerregistry#897 (comment)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tags listing is based on lexical ordering with next and last, I am not sure what ordering these reference artifacts will have so as to follow that pattern.

@michaelb990 Yes, nextToken or skipToken is commonly used as continuation tokens that are sent via query string.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does GCR actually implement the tag listing

GCR doesn't implement tag listing because it returns the entire dataset, but it does implement pagination for catalog using the same pagination mechanism.

This is not under /v2 so does not need to follow that pagination mode.

I think that's a bit myopic if we want this to eventually get integrated into the spec. I don't really want to have to implement two different kinds of pagination within a client.

Tags listing is based on lexical ordering with next and last, I am not sure what ordering these reference artifacts will have so as to follow that pattern.

You could still use the Link header, as existing clients already know to follow the Link header as per https://docs.docker.com/registry/spec/api/#pagination

Compliant client implementations should always use the Link header value when proceeding through results linearly. The client may construct URLs to skip forward in the catalog.
...
The above process should then be repeated until the Link header is no longer set.

I'd be fine with not supporting lexical ordering (or really, any total ordering) as long as whatever pagination mechanism used here sets an appropriate Link header such that I can just follow it. You can put whatever tokens you want in the Link url.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate this discussion, but let's move it to #4. I didn't intend to change pagination with this PR, and tried to focus my changes on the format for the items in the list of references. Sounds like it definitely deserves more discussion on that issue before we make a PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonjohnsonjr I see your point of Link header and sounds like we can align . I think we can improve upon this and support returning a skipToken/nextToken in the Link header.

Link: <<url>?n=2&skipToken; rel="next"

GET /v2/<name>/tags/list?n=<n from the request>&nextToken=<>

Regarding ordering - Lexical doesn't make sense so it's worth calling out here. I would however lean towards reverse chronological since you typically want the newest first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate this discussion, but let's move it to #4. I didn't intend to change pagination with this PR, and tried to focus my changes on the format for the items in the list of references. Sounds like it definitely deserves more discussion on that issue before we make a PR.

+1 for splitting paging and the response type.

@@ -1,14 +1,14 @@
# Manifest Referrers API

[ORAS Artifact-manifest](./artifact-manifest.md) provides the ability to reference artifacts to existing artifacts. Reference artifacts include Notary v2 signatures, SBoMs and many other types. Artifacts that reference other artifacts SHOULD NOT be tagged, as they are considered enhancements to the artifacts they reference. To discover referenced artifacts a manifest referrers API is provided. An artifact client, such as a Notary v2 client would parse the returned manifests, determining which manifest type they will pull and process.
[ORAS Artifact-manifest](./artifact-manifest.md) provides the ability to reference artifacts to existing artifacts. Reference artifacts include Notary v2 signatures, SBoMs and many other types. Artifacts that reference other artifacts SHOULD NOT be tagged, as they are considered enhancements to the artifacts they reference. To discover referenced artifacts a manifest referrers API is provided. An artifact client, such as a Notary v2 client would parse the returned manifest descriptors, determining which manifest type they will pull and process.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelb990 we have been discussing about the pros and cons of returning manifest versus manifest descriptors. The data points that we were having are

  1. Clients are recommended to validate the data returned from server. This applies to the “artifactType” property that is returned in the response. If clients has to assure this value by querying the manifest, it would be cost effective if the original manifest is returned as part of response.
  2. Most flexibility for client-side filteration. Also, annotations being free form might complex the registry implementations to support indexing or filtering on them. The registry server might have to process the manifest to extract these annotations or other filterable properties. If so, it would be beneficial to return the entire manifest. This will extend the client side filtering beyond annotations of just the manifest but also on any of the enclosed blobs.
  3. This option doesn't help with the response size problem to the full extent. The E2E scenarios call out filtering based on both “artifactType” and annotations. The cost of returning annotations versus the entire manifest is same on the registry server. However, the latter option has a benefit of optimizing the network roundtrips on clients.

Therefore, we are leaning towards returning the manifest as part of the list result and we have a spec for our proposal.
https://gist.github.com/aviral26/ca4b0c1989fd978e74be75cbf3f3ea92 . Can you please let us know your thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the best of both worlds is to return a list of descriptors with the data inlined: opencontainers/image-spec#826

Copy link
Contributor Author

@michaelb990 michaelb990 Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be open to a list of descriptors and a second call to get the manifest OR a list of descriptors that use the data field to inline the manifest (maybe optionally based on the size of the manifest, but that might just get complicated). I would prefer not introducing a new descriptor-like object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently data has not finalized and we can consider it when #826 is settled and we can assume that manifest descriptors can be query from the /manifests api and/or /blobs

We should be free to decide on another property (e.g. palyoad) if needed. Moving to ORAS was specifically to not block on OCI but generally work with the intention of moving things back to OCI.

I think the item we discussed about returning annotations indexing content from the manifest. When content is encrypted with customer keys hoisting content from customers manifest into an indexed set was challenging and that is why we were considering returning the full encoded manifest inside a property upto a max response size of say 10MB. From a specification perspective - this is optional and the client will follow the digest to retrieve the manifest if it is not present.
Also we didn't want to use data primarily due to the OCI specification ambiguity.

Lastly regarding descriptor - Since the descriptor doesn't enforce list of properties it will be hard to implement strict client libraries if the server returns fields that they is not understood. I am actually leaning towards explicitly calling out the fields that will be returned in this response so that clients aren't expected to ignore fields as per descriptor specifications.
https://github.com/opencontainers/image-spec/blob/main/descriptor.md#reserved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we can assume that manifest descriptors can be query from the /manifests api and/or /blobs

I don't follow this. Is that related to opencontainers/image-spec#826 or non-sequitur?

I am actually leaning towards explicitly calling out the fields that will be returned in this response so that clients aren't expected to ignore fields as per descriptor specifications.

I'm confused about this. You're essentially freezing the spec indefinitely, and it makes upgrades almost impossible. I may be biased here because most of my professional career has revolved around protobufs, but ignoring unknown fields seems like a necessity if you have any kind of protocol that you hope to evolve: https://developers.google.com/protocol-buffers/docs/proto#updating

Any new fields that you add should be optional or repeated. This means that any messages serialized by code using your "old" message format can be parsed by your new generated code, as they won't be missing any required elements. You should set up sensible default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. However, the unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it – so if the message is passed on to new code, the new fields are still available.

Copy link
Contributor Author

@michaelb990 michaelb990 Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently data has not finalized and we can consider it when #826 is settled

I think that the efforts to define the data field in a manifest in opencontainers/image-spec#826 doesn't have to block us from using it in a different context, specifically in the list of references for a manifest.

If we made the referrers API a list of descriptors that included the data field (and defined it to mean the base64 encoded manifest), then we would get effectively what you're proposing, but be able to re-use an OCI object instead of having to invent a new one. It would look like this:

{
  "references": [
    {
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
      "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
      "size": 312,
      "data": "ICAgIHsKICAgICAgZGlnZXN0OiBzaGEyNTY6M2MzYTQ2MDRhNTQ1Y2RjMTI3NDU2ZDk0ZTQyMWNkMzU1YmNhNWI1MjhmNGE5YzE5MDViMTVkYTJlYjRhNGM2YiwKICAgICAgbWVkaWFUeXBlOiBhcHBsaWNhdGlvbi92bmQub2NpLmFydGlmYWN0Lm1hbmlmZXN0LnYxK2pzb24sCiAgICAgIGFydGlmYWN0VHlwZTogY25jZi5ub3RhcnkudjIsCiAgICAgIHNpemU6IDMxMiwKICAgIH0sCg=="
    },
    {
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
      "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
      "size": 237,
      "data": "ICAgIHsKICAgICAgZGlnZXN0OiBzaGEyNTY6M2MzYTQ2MDRhNTQ1Y2RjMTI3NDU2ZDk0ZTQyMWNkMzU1YmNhNWI1MjhmNGE5YzE5MDViMTVkYTJlYjRhNGM2YiwKICAgICAgbWVkaWFUeXBlOiBhcHBsaWNhdGlvbi92bmQub2NpLmFydGlmYWN0Lm1hbmlmZXN0LnYxK2pzb24sCiAgICAgIGFydGlmYWN0VHlwZTogZXhhbXBsZS5zYm9tLnYwLAogICAgICBzaXplOiAyMzcsCiAgICAgIGRhdGE6IGFHVnNiRzg9CiAgICB9Cg=="
    }
  ],
  "@nextLink": "{opaqueUrl}"
}

This would also match exactly with the existing descriptor and not require defining which annotations get up-lifted OR adding a new artifactType field.

and we can assume that manifest descriptors can be query from the /manifests api and/or /blobs

Descriptors returned by the references API will always be for manifests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, "data" field is optional, constrained to size limits set by registry?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently data has not finalized

I think the semantics are pretty settled and unlikely to change (we have had a quorum to merge, barring some wordsmithing in the considerations). The discussion is mostly centered around how registry operators feel about the field, which hopefully we can resolve via opencontainers/distribution-spec#293

Given that the semantics here are identical to the proposal, I think it's fine (and good) to use data in this way.

I think that the efforts to define the data field in a manifest in opencontainers/image-spec#826 doesn't have to block us from using it in a different context, specifically in the list of references for a manifest.

That PR isn't about defining the data field in a manifest, it's about defining the data field for any descriptor in any context. Your use here is precisely one of the intended use cases.

Just to clarify, "data" field is optional, constrained to size limits set by registry?

That's my understanding. Clients should be prepared to fetch any content in the response via subsequent requests, but if the registry chooses to inline the associated content for a descriptor in the data field, the client can save the roundtrip.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the efforts to define the data field in a manifest in opencontainers/image-spec#826 doesn't have to block us from using it in a different context, specifically in the list of references for a manifest.

This context of using data as a response type makes more sense, as it's not a required duplication of content persisted in the registry, rather a payload optimization.

If we made the referrers API a list of descriptors that included the data field (and defined it to mean the base64 encoded manifest), then we would get effectively what you're proposing, but be able to re-use an OCI object instead of having to invent a new one. It would look like this:

I generally like this. The one concern is how do we account for artifactType filtering?
If the artifactType is embedded in the OPTIONAL data element, and the data element is empty as it exceeded the max capacity, how does a client know to filter cncf.notary.v2 signatures, vs spdx.sbom or other artifactTypes?

I'm thinking we should include the artifactType in the descriptor so it's always available with/without the data element being populated.

{
  "references": [
    {
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
      "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
      "size": 312,
      "artifactType": "cncf.notary.v2",
      "data": "ICAgIHsKICAgICAgZGlnZXN0OiBzaGEyNTY6M2MzYTQ2MDRhNTQ1Y2RjMTI3NDU2ZDk0ZTQyMWNkMzU1YmNhNWI1MjhmNGE5YzE5MDViMTVkYTJlYjRhNGM2YiwKICAgICAgbWVkaWFUeXBlOiBhcHBsaWNhdGlvbi92bmQub2NpLmFydGlmYWN0Lm1hbmlmZXN0LnYxK2pzb24sCiAgICAgIGFydGlmYWN0VHlwZTogY25jZi5ub3RhcnkudjIsCiAgICAgIHNpemU6IDMxMiwKICAgIH0sCg=="
    },
    {
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
      "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
      "size": 237,
      "artifactType": "spdx.sbom.v3",
      "data": "ICAgIHsKICAgICAgZGlnZXN0OiBzaGEyNTY6M2MzYTQ2MDRhNTQ1Y2RjMTI3NDU2ZDk0ZTQyMWNkMzU1YmNhNWI1MjhmNGE5YzE5MDViMTVkYTJlYjRhNGM2YiwKICAgICAgbWVkaWFUeXBlOiBhcHBsaWNhdGlvbi92bmQub2NpLmFydGlmYWN0Lm1hbmlmZXN0LnYxK2pzb24sCiAgICAgIGFydGlmYWN0VHlwZTogZXhhbXBsZS5zYm9tLnYwLAogICAgICBzaXplOiAyMzcsCiAgICAgIGRhdGE6IGFHVnNiRzg9CiAgICB9Cg=="
    }
  ],
  "@nextLink": "{opaqueUrl}"
}

Copy link
Contributor

@SteveLasker SteveLasker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two general approaches here:

  1. Return a list of artifact-descriptors, which add artifactType to the image-descriptor to differentiate each artifact from a common manifestType. (SBOM formats, Scan Results, Signature Formats, TBD)
  2. Return a collection of manifests, (binary encoded)

The implementation discussion has surfaced a number of challenges.

  1. Customer Managed Key Encryption (CMK) has complicated some of the designs for what can be stored as clear text for indexing, used by the referres api.
  2. When/where would the annotations be lifted from the manifest if returned in an artifact-descriptor response?
  3. How many round trips should a client make? Where's the balance?

While I was thinking the modified artifact-descriptor would be the best balance, I'm now more inclined to return encoded manifests.

It does mean the response of the referrers API will be larger, but it reduces compute on the registry for when to lift the annotations and reduces clients from having to query the manifest for each reference type the client requires.

It solves the annotation encryption issue as these aren't indexed and lifted into an artifact-descriptor.

descriptor.md Outdated
@@ -40,8 +40,11 @@ The following fields contain the primary properties that constitute an Artifact

- **`annotations`** *string-string map*

This OPTIONAL property contains arbitrary metadata for this descriptor.
This OPTIONAL property MUST use the [annotation rules](annotations.md#rules).
This OPTIONAL property contains the annotations from the annotations field in the artifact manifest for this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good clarification for which annotations are lifetd.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating, based on recent conversations.
With the proposal to OPTIONALLY return the encoded manifest as a data property, and the complexity of server-side processing to lift annotations, I think we're now saying there's no lifting, rather the annotations are only available within the manifest. This simplifies the response.

manifest-referrers-api.md Outdated Show resolved Hide resolved
need to determine which artifact to retrieve. For example, Notary v2 MAY use an annotation:
`"org.cncf.notary.v2.signature.subject": "wabbit-networks.io"`, which the client could use to determine which signature
to pull from the registry. Using annotations, clients can reduce round trips and the data returned to determine which
artifacts the specific client may require reducing network traffic and API calls. Future support of the `data` field in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we defer the data element out of this conversation? Combining too many concepts only complicates the scope of what we need to land now.
Request removing:

Future support of the data field in the list of layers/blobs will allow for a signature or other "small" payload to be retrieved with only a single additional call to the registry (a GET for the full manifest).

Copy link
Contributor

@sajayantony sajayantony Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the above usage of data is fair. As long as we define that the data element in this context is used to return encoded manifests and can be queried from the /manifests API.
Also ensure that this is OPTIONAL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: the artifacts-spec, and the referrers/ api only return cncf.oras.artifact-manifests as they're the only type of manifests that support the subjectManifest. The discussion to use data to be an OPTIONAL property, that returns the encoded manifest is a different conversation from the manifest including the signature in a duplicative descriptor.data element of a signature perissted as a blob.
So, I still think we should remove the text regarding future support of the data filed as it's not relevant to the response discussion.

descriptor.md Outdated
@@ -40,8 +40,11 @@ The following fields contain the primary properties that constitute an Artifact

- **`annotations`** *string-string map*

This OPTIONAL property contains arbitrary metadata for this descriptor.
This OPTIONAL property MUST use the [annotation rules](annotations.md#rules).
This OPTIONAL property contains the annotations from the annotations field in the artifact manifest for this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating, based on recent conversations.
With the proposal to OPTIONALLY return the encoded manifest as a data property, and the complexity of server-side processing to lift annotations, I think we're now saying there's no lifting, rather the annotations are only available within the manifest. This simplifies the response.

@@ -1,14 +1,14 @@
# Manifest Referrers API

[ORAS Artifact-manifest](./artifact-manifest.md) provides the ability to reference artifacts to existing artifacts. Reference artifacts include Notary v2 signatures, SBoMs and many other types. Artifacts that reference other artifacts SHOULD NOT be tagged, as they are considered enhancements to the artifacts they reference. To discover referenced artifacts a manifest referrers API is provided. An artifact client, such as a Notary v2 client would parse the returned manifests, determining which manifest type they will pull and process.
[ORAS Artifact-manifest](./artifact-manifest.md) provides the ability to reference artifacts to existing artifacts. Reference artifacts include Notary v2 signatures, SBoMs and many other types. Artifacts that reference other artifacts SHOULD NOT be tagged, as they are considered enhancements to the artifacts they reference. To discover referenced artifacts a manifest referrers API is provided. An artifact client, such as a Notary v2 client would parse the returned manifest descriptors, determining which manifest type they will pull and process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the efforts to define the data field in a manifest in opencontainers/image-spec#826 doesn't have to block us from using it in a different context, specifically in the list of references for a manifest.

This context of using data as a response type makes more sense, as it's not a required duplication of content persisted in the registry, rather a payload optimization.

If we made the referrers API a list of descriptors that included the data field (and defined it to mean the base64 encoded manifest), then we would get effectively what you're proposing, but be able to re-use an OCI object instead of having to invent a new one. It would look like this:

I generally like this. The one concern is how do we account for artifactType filtering?
If the artifactType is embedded in the OPTIONAL data element, and the data element is empty as it exceeded the max capacity, how does a client know to filter cncf.notary.v2 signatures, vs spdx.sbom or other artifactTypes?

I'm thinking we should include the artifactType in the descriptor so it's always available with/without the data element being populated.

{
  "references": [
    {
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
      "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
      "size": 312,
      "artifactType": "cncf.notary.v2",
      "data": "ICAgIHsKICAgICAgZGlnZXN0OiBzaGEyNTY6M2MzYTQ2MDRhNTQ1Y2RjMTI3NDU2ZDk0ZTQyMWNkMzU1YmNhNWI1MjhmNGE5YzE5MDViMTVkYTJlYjRhNGM2YiwKICAgICAgbWVkaWFUeXBlOiBhcHBsaWNhdGlvbi92bmQub2NpLmFydGlmYWN0Lm1hbmlmZXN0LnYxK2pzb24sCiAgICAgIGFydGlmYWN0VHlwZTogY25jZi5ub3RhcnkudjIsCiAgICAgIHNpemU6IDMxMiwKICAgIH0sCg=="
    },
    {
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
      "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
      "size": 237,
      "artifactType": "spdx.sbom.v3",
      "data": "ICAgIHsKICAgICAgZGlnZXN0OiBzaGEyNTY6M2MzYTQ2MDRhNTQ1Y2RjMTI3NDU2ZDk0ZTQyMWNkMzU1YmNhNWI1MjhmNGE5YzE5MDViMTVkYTJlYjRhNGM2YiwKICAgICAgbWVkaWFUeXBlOiBhcHBsaWNhdGlvbi92bmQub2NpLmFydGlmYWN0Lm1hbmlmZXN0LnYxK2pzb24sCiAgICAgIGFydGlmYWN0VHlwZTogZXhhbXBsZS5zYm9tLnYwLAogICAgICBzaXplOiAyMzcsCiAgICAgIGRhdGE6IGFHVnNiRzg9CiAgICB9Cg=="
    }
  ],
  "@nextLink": "{opaqueUrl}"
}


The `referrers` API returns all artifacts that have a `subjectManifest` to given manifest digest. Referenced artifact requests are scoped to a repository, ensuring access rights for the repository can be used as authorization for the referenced artifacts.

Artifact references are defined in the [oras.artifact.manifest spec][oras.artifact.manifest-spec] through the [`subjectManifest`][oras.artifact.manifest-spec-manifests] property.

## Request All Artifact References

The referrers api is sits alongside the [distribution-spec][oci-distribution-spec] paths avoiding any conflict with existing or new distribution apis. Pathing within the referrers api provides consistent repo/namespace paths, enabling registry operators to implement consistent auth access, using existing tokens for content.
The referrers api is sits alongside the [distribution-spec][oci-distribution-spec] paths avoiding any conflict with existing or new distribution apis. Pathing within the referrers api provides consistent repo/namespace paths, enabling registry operators to implement consistent auth access, using existing tokens for content.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The referrers api is sits alongside the

api is sits

need to determine which artifact to retrieve. For example, Notary v2 MAY use an annotation:
`"org.cncf.notary.v2.signature.subject": "wabbit-networks.io"`, which the client could use to determine which signature
to pull from the registry. Using annotations, clients can reduce round trips and the data returned to determine which
artifacts the specific client may require reducing network traffic and API calls. Future support of the `data` field in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: the artifacts-spec, and the referrers/ api only return cncf.oras.artifact-manifests as they're the only type of manifests that support the subjectManifest. The discussion to use data to be an OPTIONAL property, that returns the encoded manifest is a different conversation from the manifest including the signature in a duplicative descriptor.data element of a signature perissted as a blob.
So, I still think we should remove the text regarding future support of the data filed as it's not relevant to the response discussion.

@SteveLasker
Copy link
Contributor

SteveLasker commented Aug 10, 2021

To help frame the conversation, I've added a discussion: Artifact-spec referrers/ API Response

Signed-off-by: Michael Brown <[email protected]>
Copy link
Contributor

@SteveLasker SteveLasker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @michaelb990 for capturing the heart of the latest round of discussions. This pivots the response back to using descriptors, which does require a round trip for the manifest. But, it also removes the problematic un-encoded manifest response that couldn't be trusted, requiring a round trip for the manifests the client did need. While providing non-trusted manifests helped provide client-side filtering for which manifests needed to be returned, it didn't meet the balance.
Based on the various discussions, including the [baseline requirements]#7), I think we've landed on:

  • server-side filtering of artifactType is optional, so it needs to be returned for client-side filtering. Without the artifactType, the client would have to retrieve all manifests to parse the artifactType , allowing the notary client to limit pulling notary manifests, while an sbom client can limit pulling sbom manifest.
  • there are some encryption scenarios, which need further verification whether registry operators can index the artifactType so it may not be returned, requiring the client to pull all manifests, to identify which manifests apply to the specific client.
  • the data element could be an optional way to embed the encoded manifest, reducing the additional round trip for retrieving artifact specific manifest, However, depending on the size of the manifests, the server may exclude the data from the referrers/ API response. This means clients must be capable of pulling the manifest anyway. By deferring the encoded data element until later, we're not losing a user scenario, just deferring a possible optimization that not all clients would be able to take advantage of.

I noticed a few inconsistencies in some names (all nits and likely from my orirignal PR), like application/vnd.cncf.notary , cncf.notary vs. org.cncf.notary. Just to get this PR merged, I'd suggest we LGTM this, and I can do a quick scrub of the nits and PR the changes for consistency.

LGTM

@michaelb990
Copy link
Contributor Author

Updated this PR to remove the manifest encoding for now and just return a list of descriptors. I will watch opencontainers/image-spec#826 to see how the data in a descriptor discussions go, but I'd be generally supportive of adding an optional data field to the referrers response once that behavior is defined that could include the encoded artifact manifest and allow clients to save a roundtrip.

@SteveLasker
Copy link
Contributor

I'd be generally supportive of adding an optional data field to the referrers response once that behavior is defined that could include the encoded artifact manifest and allow clients to save a roundtrip.

As our discussion for support of data has been referenced in the data pr on the image-spec, it's worth clarifying the usage here.

The use of the data element in the referrers/ API response is to OPTIONALLY add encoded manifests. It does not imply the storage of duplicative data. It's an optional optimization on a response object.

One of my concerns with data proposal in image-spec PR #826 is it duplicates the data persisted in the layer in the manifest, unwinding the value of the deduping capabilities of registries. Including the most recent cross-mount deduping API in distribution. Allow for automatic content discovery for cross-mounting blobs #275

It's the possibility that an optional encoded manifest that may be returned in the referrers API may include duplicative layer content that makes it harder for the referrers API to support returning encoded manifests.

@SteveLasker SteveLasker merged commit 54cb751 into oras-project:main Aug 13, 2021
@michaelb990 michaelb990 deleted the mb/references-api-changes branch August 13, 2021 18:25
@jonjohnsonjr
Copy link

As our discussion for support of data has been referenced in the data pr on the image-spec, it's worth clarifying the usage here.

The use of the data element in the referrers/ API response is to OPTIONALLY add encoded manifests. It does not imply the storage of duplicative data. It's an optional optimization on a response object.

One of my concerns with data proposal in image-spec PR #826 is it duplicates the data persisted in the layer in the manifest, unwinding the value of the deduping capabilities of registries. Including the most recent cross-mount deduping API in distribution. Allow for automatic content discovery for cross-mounting blobs #275

Hey Steve, I just want to clarify that the data element proposed in opencontainers/image-spec#826 isn't specific to layers or images. It adds a field to descriptors, which enables the exact the use case described here. The flexibility to use that field in multiple contexts is a consequence of how OCI is structured around descriptors, and I would argue that is a good thing.

As I've explained before, registries have the flexibility to reject manifests that they deem too large, if storing them would be burdensome, and we've discussed trying to find a reasonable size that client can rely on. I would encourage you to join in that conversation: opencontainers/distribution-spec#293

The concerns around unwinding the value of deduplication are pretty insubstantial. Data denormalization is a common optimization in distributed systems with well understood trade-offs. While I agree with you that more deduplication is something to be desired, your concerns regarding data embedded within a manifest are a drop in the bucket compared to what we're currently leaving on the table by continuing to distribute images as compressed tarballs.

Given your passion for maximal deduplication, I would encourage you pick up the torch on revisiting how layers are encoded. Some more context:

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Referrers API should return a list of descriptors, not full manifests
5 participants