From aaf8045c3041ad9045e4b39d95b76eebe8a11937 Mon Sep 17 00:00:00 2001
From: Jon Johnson
Date: Fri, 12 Mar 2021 15:13:58 -0800
Subject: [PATCH 1/7] Define the data field

This should contain an embedded representation of the referenced
content, which is useful for avoiding extra hops to access small pieces
of content.

Signed-off-by: Jon Johnson
---
 descriptor.md             | 17 ++++++++---------
 specs-go/v1/descriptor.go |  3 +++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/descriptor.md b/descriptor.md
index 2ba672d07..45c1e5347 100644
--- a/descriptor.md
+++ b/descriptor.md
@@ -41,20 +41,18 @@ The following fields contain the primary properties that constitute a Descriptor
 
 - **`annotations`** *string-string map*
 
-  This OPTIONAL property contains arbitrary metadata for this descriptor.
-  This OPTIONAL property MUST use the [annotation rules](annotations.md#rules).
+  This OPTIONAL property contains arbitrary metadata for this descriptor.
+  This OPTIONAL property MUST use the [annotation rules](annotations.md#rules).
 
-Descriptors pointing to [`application/vnd.oci.image.manifest.v1+json`](manifest.md) SHOULD include the extended field `platform`, see [Image Index Property Descriptions](image-index.md#image-index-property-descriptions) for details.
-
-### Reserved
+- **`data`** *string*
 
-The following field keys are reserved and MUST NOT be used by other specifications.
+  This OPTIONAL property contains an embedded representation of the referenced content.
+  Values MUST conform to the Base 64 encoding, as defined in [RFC 4648][rfc4648-s4].
 
-- **`data`** *string*
+Descriptors pointing to [`application/vnd.oci.image.manifest.v1+json`](manifest.md) SHOULD include the extended field `platform`, see [Image Index Property Descriptions](image-index.md#image-index-property-descriptions) for details.
 
-  This key is RESERVED for future versions of the specification.
+### Reserved
 
-All other fields may be included in other OCI specifications.
 Extended _Descriptor_ field additions proposed in other OCI specifications SHOULD first be considered for addition into this specification.
 
 ## Digests
@@ -179,6 +177,7 @@ In the following example, the descriptor indicates that the referenced manifest
 [rfc3986]: https://tools.ietf.org/html/rfc3986
 [rfc4634-s4.1]: https://tools.ietf.org/html/rfc4634#section-4.1
 [rfc4634-s4.2]: https://tools.ietf.org/html/rfc4634#section-4.2
+[rfc4648-s4]: https://tools.ietf.org/html/rfc4648#section-4
 [rfc6838]: https://tools.ietf.org/html/rfc6838
 [rfc6838-s4.2]: https://tools.ietf.org/html/rfc6838#section-4.2
 [rfc7230-s2.7]: https://tools.ietf.org/html/rfc7230#section-2.7
diff --git a/specs-go/v1/descriptor.go b/specs-go/v1/descriptor.go
index 6e442a085..c2ff43c4c 100644
--- a/specs-go/v1/descriptor.go
+++ b/specs-go/v1/descriptor.go
@@ -35,6 +35,9 @@ type Descriptor struct {
 	// Annotations contains arbitrary metadata relating to the targeted content.
 	Annotations map[string]string `json:"annotations,omitempty"`
 
+	// Data is an embedding of the targeted content.
+	Data []byte `json:"data,omitempty"`
+
 	// Platform describes the platform which the image in the manifest runs on.
 	//
 	// This should only be used when referring to a manifest.

From ce281cecd7d902bd75eb19bf1f63ed0bed296532 Mon Sep 17 00:00:00 2001
From: Jon Johnson
Date: Wed, 17 Mar 2021 13:58:15 -0700
Subject: [PATCH 2/7] Add Embedded Data section

Signed-off-by: Jon Johnson
---
 descriptor.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/descriptor.md b/descriptor.md
index 45c1e5347..aaac73c58 100644
--- a/descriptor.md
+++ b/descriptor.md
@@ -48,6 +48,8 @@ The following fields contain the primary properties that constitute a Descriptor
 
   This OPTIONAL property contains an embedded representation of the referenced content.
   Values MUST conform to the Base 64 encoding, as defined in [RFC 4648][rfc4648-s4].
+  The decoded data MUST be identical to the referenced content and SHOULD be verified against the [`digest`](#digests) and `size` fields.
+  See [Embedded Content](#embedded-content) for when this is appropriate.
 
 Descriptors pointing to [`application/vnd.oci.image.manifest.v1+json`](manifest.md) SHOULD include the extended field `platform`, see [Image Index Property Descriptions](image-index.md#image-index-property-descriptions) for details.
 
@@ -149,6 +151,17 @@ Implementations MAY implement SHA-512 digest verification for use in descriptors
 When the _algorithm identifier_ is `sha512`, the _encoded_ portion MUST match `/[a-f0-9]{128}/`.
 Note that `[A-F]` MUST NOT be used here.
 
+## Embedded Content
+
+In many contexts, such as when downloading content over a network, resolving a descriptor to its content has a measurable fixed "roundtrip" latency cost.
+For large blobs, the fixed cost is usually inconsequental, as the majority of time will be spent actually fetching the content.
+For very small blobs, the fixed cost will be quite significant.
+
+Implementations MAY choose to embed small pieces of content directly within a descriptor to avoid roundtrips.
+
+Implementations SHOULD NOT populate the `data` field in situations where doing so would unexpectedly modify content identifiers.
+For example, a registry SHOULD NOT arbitrarily populate `data` fields within uploaded manifests, as that would modify the content address of those manifests.
+
 ## Examples
 
 The following example describes a [_Manifest_](manifest.md#image-manifest) with a content identifier of "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270" and a size of 7682 bytes:

From 58c082da244713a3cf706b73266c29746a12b35c Mon Sep 17 00:00:00 2001
From: Jon Johnson
Date: Thu, 18 Mar 2021 10:38:53 -0700
Subject: [PATCH 3/7] Add note about portability concerns

Signed-off-by: Jon Johnson
---
 descriptor.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/descriptor.md b/descriptor.md
index aaac73c58..0aa9d5e74 100644
--- a/descriptor.md
+++ b/descriptor.md
@@ -155,13 +155,16 @@ Note that `[A-F]` MUST NOT be used here.
 
 In many contexts, such as when downloading content over a network, resolving a descriptor to its content has a measurable fixed "roundtrip" latency cost.
 For large blobs, the fixed cost is usually inconsequental, as the majority of time will be spent actually fetching the content.
-For very small blobs, the fixed cost will be quite significant.
+For very small blobs, the fixed cost can be quite significant.
 
 Implementations MAY choose to embed small pieces of content directly within a descriptor to avoid roundtrips.
 
 Implementations SHOULD NOT populate the `data` field in situations where doing so would unexpectedly modify content identifiers.
 For example, a registry SHOULD NOT arbitrarily populate `data` fields within uploaded manifests, as that would modify the content address of those manifests.
 
+Implementations SHOULD consider limitations of storage systems when deciding whether or not to embed data.
+Many implementations will refuse to accept or parse manifests that violate the limitations of their storage systems, so descriptors concerned with portability SHOULD avoid embedding large amounts of data.
+
 ## Examples
 
 The following example describes a [_Manifest_](manifest.md#image-manifest) with a content identifier of "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270" and a size of 7682 bytes:

From 2596ec06321595adfeaa8c72a3d7586630ccd424 Mon Sep 17 00:00:00 2001
From: Jon Johnson
Date: Thu, 18 Mar 2021 15:08:48 -0700
Subject: [PATCH 4/7] Expand godoc for Data

Signed-off-by: Jon Johnson
---
 specs-go/v1/descriptor.go | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/specs-go/v1/descriptor.go b/specs-go/v1/descriptor.go
index c2ff43c4c..94f19be62 100644
--- a/specs-go/v1/descriptor.go
+++ b/specs-go/v1/descriptor.go
@@ -35,7 +35,9 @@ type Descriptor struct {
 	// Annotations contains arbitrary metadata relating to the targeted content.
 	Annotations map[string]string `json:"annotations,omitempty"`
 
-	// Data is an embedding of the targeted content.
+	// Data is an embedding of the targeted content. This is encoded as a base64
+	// string when marshalled to JSON (automatically, by encoding/json). If
+	// present, Data can be used directly to avoid fetching the targeted content.
 	Data []byte `json:"data,omitempty"`
 
 	// Platform describes the platform which the image in the manifest runs on.

From fccc4356783ed83b46b6a626ea6fcf881a4a3929 Mon Sep 17 00:00:00 2001
From: Jon Johnson
Date: Tue, 6 Apr 2021 10:53:53 -0700
Subject: [PATCH 5/7] Implementations MUST NOT populate data arbitrarily

Also add a counterexample.

Signed-off-by: Jon Johnson
---
 descriptor.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/descriptor.md b/descriptor.md
index 0aa9d5e74..6054c222d 100644
--- a/descriptor.md
+++ b/descriptor.md
@@ -159,8 +159,9 @@ For very small blobs, the fixed cost can be quite significant.
 
 Implementations MAY choose to embed small pieces of content directly within a descriptor to avoid roundtrips.
 
-Implementations SHOULD NOT populate the `data` field in situations where doing so would unexpectedly modify content identifiers.
-For example, a registry SHOULD NOT arbitrarily populate `data` fields within uploaded manifests, as that would modify the content address of those manifests.
+Implementations MUST NOT populate the `data` field in situations where doing so would modify existing content identifiers.
+For example, a registry MUST NOT arbitrarily populate `data` fields within uploaded manifests, as that would modify the content identifier of those manifests.
+In contrast, a client MAY populate the `data` field before uploading a manifest, because the manifest would not yet have a content identifier in the registry.
 
 Implementations SHOULD consider limitations of storage systems when deciding whether or not to embed data.
 Many implementations will refuse to accept or parse manifests that violate the limitations of their storage systems, so descriptors concerned with portability SHOULD avoid embedding large amounts of data.

From 83479d49edb651e236e4f025f6be4a83f4c22324 Mon Sep 17 00:00:00 2001
From: Jon Johnson
Date: Thu, 20 May 2021 08:29:59 -0700
Subject: [PATCH 6/7] Clean up portability considerations

Also fix typo.

Signed-off-by: Jon Johnson
---
 descriptor.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/descriptor.md b/descriptor.md
index 6054c222d..bf3bce1d7 100644
--- a/descriptor.md
+++ b/descriptor.md
@@ -154,7 +154,7 @@ Note that `[A-F]` MUST NOT be used here.
 ## Embedded Content
 
 In many contexts, such as when downloading content over a network, resolving a descriptor to its content has a measurable fixed "roundtrip" latency cost.
-For large blobs, the fixed cost is usually inconsequental, as the majority of time will be spent actually fetching the content.
+For large blobs, the fixed cost is usually inconsequential, as the majority of time will be spent actually fetching the content.
 For very small blobs, the fixed cost can be quite significant.
 
 Implementations MAY choose to embed small pieces of content directly within a descriptor to avoid roundtrips.
@@ -163,8 +163,7 @@ Implementations MUST NOT populate the `data` field in situations where doing so
 For example, a registry MUST NOT arbitrarily populate `data` fields within uploaded manifests, as that would modify the content identifier of those manifests.
 In contrast, a client MAY populate the `data` field before uploading a manifest, because the manifest would not yet have a content identifier in the registry.
 
-Implementations SHOULD consider limitations of storage systems when deciding whether or not to embed data.
-Many implementations will refuse to accept or parse manifests that violate the limitations of their storage systems, so descriptors concerned with portability SHOULD avoid embedding large amounts of data.
+Implementations SHOULD consider portability when deciding whether to embed data, as some providers are known to refuse to accept or parse manifests that exceed a certain size.
 
 ## Examples
 

From 0d98a6cdd25e00269fb99ed34e4eba69e46ea695 Mon Sep 17 00:00:00 2001
From: Jon Johnson
Date: Tue, 10 Aug 2021 11:38:13 -0700
Subject: [PATCH 7/7] Scope data verification to content consumers

While registries might want to verify the data field, we shouldn't rely
on it, as many registries are unaware of this field. On the other hand,
clients SHOULD verify the content before consuming it.

Signed-off-by: Jon Johnson
---
 descriptor.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/descriptor.md b/descriptor.md
index bf3bce1d7..8c1277a44 100644
--- a/descriptor.md
+++ b/descriptor.md
@@ -48,7 +48,7 @@ The following fields contain the primary properties that constitute a Descriptor
 
   This OPTIONAL property contains an embedded representation of the referenced content.
   Values MUST conform to the Base 64 encoding, as defined in [RFC 4648][rfc4648-s4].
-  The decoded data MUST be identical to the referenced content and SHOULD be verified against the [`digest`](#digests) and `size` fields.
+  The decoded data MUST be identical to the referenced content and SHOULD be verified against the [`digest`](#digests) and `size` fields by content consumers.
   See [Embedded Content](#embedded-content) for when this is appropriate.
 
 Descriptors pointing to [`application/vnd.oci.image.manifest.v1+json`](manifest.md) SHOULD include the extended field `platform`, see [Image Index Property Descriptions](image-index.md#image-index-property-descriptions) for details.