Image signature format #59

Closed · wants to merge 3 commits
93 changes: 93 additions & 0 deletions docs/signature.md
# Image Signature Specification

**Version 0.1**

## Introduction

This document defines a detached container image signature object and signing methods.

## Signature Format

```js
{
    "critical": {/* required fields */
        "identity": {/* identity reference */},
        "image": {/* signed object reference */},
        "type": "..."
    },
    "optional": {/* optional metadata fields */}
}
```
A reviewer commented:

I don't understand why anyone would explicitly encode "critical" or "optional" as fields. The spec ought define what the critical fields are, and what permissible optional fields are. Explicitly encoding this information seems very dubious and I think a lot of programmers would look at this decision and be very confused, and put off.

A reviewer commented:

Are there any other specifications out there that encode schema information in this manner?

mtrmac (Collaborator) commented:

This is a forward compatibility mechanism. Sure, a current implementation of a current spec does not need this explicit designation, but when the spec changes in a year or two (which is pretty likely), and new signatures start appearing, old implementations still need to be able to process them somehow. Separating the new fields between critical (if the old implementation doesn’t understand the data, it must reject the signature) and optional (if the old implementation doesn’t understand the data, it can proceed anyway) is a way for old implementations unaware of the updated spec to handle the signatures reasonably correctly.

And yes, this is a very common mechanism, e.g. https://tools.ietf.org/html/rfc4880#section-5.2.3.1

> Bit 7 of the subpacket type is the "critical" bit. If set, it denotes that the subpacket is one that is critical for the evaluator of the signature to recognize. If a subpacket is encountered that is marked critical but is unknown to the evaluating software, the evaluator SHOULD consider the signature to be in error.

or https://tools.ietf.org/html/rfc5280#section-4.2

> Each extension in a certificate is designated as either critical or non-critical. A certificate-using system MUST reject the certificate if it encounters a critical extension it does not recognize or a critical extension that contains information that it cannot process.
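For illustration, here is a minimal sketch (not part of the proposal) of how a verifier written against this draft could apply the critical/optional rule when it meets a signature produced under a future revision; it assumes the cryptographic verification has already happened, and any field names beyond the current draft are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// knownCritical lists the critical fields this (current-draft) implementation understands.
var knownCritical = map[string]bool{"identity": true, "image": true, "type": true}

// checkFields inspects an already-signature-verified payload and applies the
// rule described above: an unknown member of "critical" rejects the signature,
// while unknown members of "optional" are ignored.
func checkFields(payload []byte) error {
	var sig struct {
		Critical map[string]json.RawMessage `json:"critical"`
		Optional map[string]json.RawMessage `json:"optional"`
	}
	if err := json.Unmarshal(payload, &sig); err != nil {
		return err
	}
	for name := range sig.Critical {
		if !knownCritical[name] {
			return fmt.Errorf("unknown critical field %q: rejecting signature", name)
		}
	}
	// Unknown members of "optional" (say, a hypothetical future "expiration"
	// field) are simply skipped by an implementation that does not understand them.
	return nil
}

func main() {
	payload := []byte(`{"critical":{"identity":{},"image":{},"type":"atomic container signature"},"optional":{"creator":"x"}}`)
	fmt.Println(checkFields(payload)) // <nil>
}
```

An implementation updated for a later revision would simply add the new field names to `knownCritical`.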

wking (Contributor) commented:

On Wed, Nov 02, 2016 at 09:22:02AM -0700, Miloslav Trmač wrote:

> This is a forward compatibility mechanism. Sure, a current implementation of a current spec does not need this explicit designation, but when the spec changes in a year or two (which is pretty likely), and new signatures start appearing, old implementations still need to be able to process them somehow.

An alternative is to bump the schema's major version whenever you add a new, critical parameter. Consumers who only understand the v1 type can error out if you send them a v2 name-assertion. That lets you drop the critical/optional distinction from properties within the schema, pushing it out to the schema as a whole. You get a simpler schema, but the cost is that publishers who only need v1 features but publish a v2 name-assertion anyway cut out v1-only implementations, while with the internal critical/optional distinction v1-only implementations could continue to use those signatures. I don't expect container implementations to have such long lifespans that that distinction is important, but it seems like more of a policy decision than a technical decision (so maintainers can pick a direction after some navel contemplation ;).
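For comparison, a minimal sketch of the whole-schema-version alternative described above, expressed through the `type` value; the `"atomic container signature/v2"` string is hypothetical, since this draft defines only one type value:

```go
package main

import (
	"errors"
	"fmt"
)

// acceptType implements whole-schema versioning: a consumer accepts only the
// schema versions it understands and rejects everything else outright.
func acceptType(sigType string) error {
	switch sigType {
	case "atomic container signature": // the schema defined in this draft
		return nil
	default:
		// A hypothetical v2 signature is rejected even if it only uses
		// v1-era fields; that is the cost relative to the critical/optional split.
		return errors.New("unsupported signature type: " + sigType)
	}
}

func main() {
	fmt.Println(acceptType("atomic container signature"))    // accepted
	fmt.Println(acceptType("atomic container signature/v2")) // rejected
}
```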

A reviewer commented:

The loose coupling utility is interesting. I'm not sure that references to binary protocols are a fair comparison. Has any JSON format, ever, anywhere, done anything like this segmentation of fields under category groupings?

Rather than group fields under these major categories (a practice that I continue to think will cause worlds of head scratching for developers who've used JSON at any point in time), I'd much rather see either, as wking suggested, some kind of explicit signalling, or, more generally than either solution, a `"critical": ["example-field-1", "example-field-2"]` field.

I'd really like to see some other means of attaining the desired loose coupling. The developer acceptance factor of this practice is not going to be high; those who stumble across it will have their heads turned.
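For concreteness, the flat shape being suggested here might look something like the following (purely illustrative, reusing values from the example later in this document; not part of the proposed spec):

```js
{
    "identity": { "docker-reference": "busybox" },
    "image": { "docker-manifest-digest": "sha256:a59906e33509d14c036c8678d687bd4eec81ed7c4b8ce907b888c607f6a1e0e6" },
    "type": "atomic container signature",
    "creator": "atomic 0.1.0-dev",
    "critical": ["identity", "image", "type"]
}
```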

mtrmac (Collaborator) commented:

(Like, I realize that JSON on the web and REST / JSON-RPC and whatever are all excited about duck typing and schema-less databases and “be liberal in what you accept”, but those are very much anti-features for signatures. We never want to end up with two clients with the same configuration set up by administrators interpreting the same signature blob differently.)

wking (Contributor) commented:

> … but those are very much anti-features for signatures…

Agreed. I was pointing out that you can also strictly-type signatures at the signature-schema level instead of at the property level. And that's even less liberal in what you accept (“Is that a v1.1 signature? Sorry, I only accept v1.0 signatures.”), although the difference there is only for optional fields. And while I'm generally not a fan of peek-inside typing (lots of back and forth in opencontainers/image-spec#411), I think we probably do want to have something like:

> All signature blobs MUST start with the ASCII bytes representing their media type and a trailing newline.

So you can't trick verifiers by getting them to interpret the signature blob as a media type other than the one which the signer intended.

wking (Contributor) commented on Nov 3, 2016:

> All signature blobs MUST start with the ASCII bytes representing their media type and a trailing newline.

Sorry, that^ should have been:

> All signed-assertion blobs MUST start with the ASCII bytes representing their media type and a trailing newline.

Presumably the signature blobs (e.g. a detached OpenPGP signature) have already taken steps to avoid the “tricked you into interpreting this as the wrong protocol/schema” vulnerability.
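A minimal sketch of the prefix check being proposed; the media type string below is hypothetical, since this draft does not define one:

```go
package main

import (
	"bytes"
	"fmt"
)

// assumedMediaType is a made-up media type for signed-assertion blobs.
const assumedMediaType = "application/vnd.example.signed-assertion.v1+json"

// checkBlobPrefix rejects a signed-assertion blob unless it begins with the
// expected media type followed by a newline, so a verifier cannot be tricked
// into parsing the payload under a different schema than the signer intended.
func checkBlobPrefix(blob []byte) ([]byte, error) {
	prefix := []byte(assumedMediaType + "\n")
	if !bytes.HasPrefix(blob, prefix) {
		return nil, fmt.Errorf("blob does not declare media type %q", assumedMediaType)
	}
	return bytes.TrimPrefix(blob, prefix), nil // the remaining bytes are the payload
}

func main() {
	body, err := checkBlobPrefix([]byte(assumedMediaType + "\n{\"critical\":{}}"))
	fmt.Println(string(body), err)
}
```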

A reviewer commented:

I agree it is super bizarre putting this in the schema.

mtrmac (Collaborator) commented:

> I agree it is super bizarre putting this in the schema.

Do you disagree with the forward-compatibility goals (see #59 (comment) ), or with the implementation?

"identity": {/* identity reference */},
"image": {/* signed object reference */ },
"type": "..."
},
"optional": {/* optional metadata fields */}
}
}
```

### Fields

There are two top-level fields: **critical**, which is required, and **optional**, which may be omitted.

#### `critical`

**identity** (object):

```js
{
"docker-reference": imageName
}
```

`imageName` per the [V2 API](https://docs.docker.com/registry/spec/api/#/overview). Required.
A reviewer commented:

why encode the name?

mtrmac (Collaborator) commented on Nov 18, 2016:

The use case is “I am pulling a RHEL 8.10.5, and I want the signature to protect me against a malicious registry which instead gives me (a correctly signed) RHEL 5.1.0 with known vulnerabilities”.

When the name is a part of the signature, there can be a single private/public key pair for all of an author's images [or, sure, more key pairs as the vendor desires for compartmentalization or organization-structure reasons], and all users of the software coming from that author can just set up that single public key as trusted for a part of the reference namespace; they are then automatically protected, whichever image from that namespace they pull.

This would be much more difficult if the signatures were nameless and carried no other semantics; the author could perhaps plausibly have a key pair per major release, but that would still allow a substitution between RHEL 8.10.5 and RHEL 8.10.4; and to protect against that by requiring all users to install a new public key for every single minor release would be completely impractical.

Yes, this does tie the signature to a particular image naming namespace, and the signature format will need to be extended to support different image transport methods with different image naming namespaces (like the AppC namespace/image discovery mechanism).
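A minimal sketch of the trust model described in this comment, assuming a hypothetical namespace-to-key mapping; it runs only after cryptographic verification has established which key produced the signature:

```go
package main

import (
	"fmt"
	"strings"
)

// trustedKeys maps a reference-namespace prefix to the key ID trusted for it.
// Both values here are made up for illustration.
var trustedKeys = map[string]string{
	"registry.example.com/rhel/": "KEYID-EXAMPLE",
}

// acceptSignature checks that the signing key is trusted for the reference the
// user asked to pull, and that the signed identity matches that reference, so a
// malicious registry cannot substitute a different (correctly signed) image.
func acceptSignature(requestedRef, signedRef, signingKey string) error {
	for prefix, key := range trustedKeys {
		if strings.HasPrefix(requestedRef, prefix) && key == signingKey {
			if signedRef != requestedRef {
				return fmt.Errorf("signature is for %q, not the requested %q", signedRef, requestedRef)
			}
			return nil
		}
	}
	return fmt.Errorf("no trusted key for %q", requestedRef)
}

func main() {
	fmt.Println(acceptSignature("registry.example.com/rhel/rhel:8.10.5",
		"registry.example.com/rhel/rhel:8.10.5", "KEYID-EXAMPLE"))
}
```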

wking (Contributor) commented:

+1 to including the name being asserted.

On Fri, Nov 18, 2016 at 11:58:44AM -0800, Miloslav Trmač wrote:

> Yes, this does tie the signature to a particular image naming namespace…

This is “if your trust policy is based on namespaces, signed name-assertion trust will be tied to a particular image naming namespace”. That's true, but you could also have clients who decide to allocate trust based on other criteria (maybe they want to delegate trust for the exact name “RHEL-8.10.5” to somebody, or whatever). In those cases signed name-assertion trust is not tied to a particular namespace. So there's no limitation around namespacing built into this name-assertion schema.

> … and the signature format will need to be extended to support different image transport methods…

You can avoid this by not including transport/location information in the asserted name. For example, if you sign the name-assertion with “RHEL-8.10.5” as the name, then a user who requested RHEL-8.10.5 and trusts the signer for that namespace (maybe RHEL-*?) can trust the name assertion regardless of where they found the image.

mtrmac (Collaborator) commented:

> You can avoid this by not including transport/location information in the asserted name.

That doesn’t really work; we need a global namespace for looking up the trusted key (cryptographic signature verification should happen before anything even touches the JSON data!), and we need to be able to compare the name inside the signature with something the user has provided which unambiguously specifies intent. The only really practical way to do this is to tie both the key lookup and the name comparison to the name/reference/“pull specification” used by the user when saying “get/start this container”, because that’s the only identification which we will always have without requiring the user to change their workflow.

It is acceptable to ask the user to do a one-time setup to trust key X for example.com; it is completely unreasonable to ask the user to say `$tool pull --expected-name='Product By Example.COM 5.1.3' example.com/product5/standalone:1.3`.

wking (Contributor) commented on Nov 18, 2016:

On Fri, Nov 18, 2016 at 02:05:53PM -0800, Miloslav Trmač wrote:

> > You can avoid this by not including transport/location information in the asserted name.
>
> That doesn’t really work; we need a global namespace for looking up the trusted key (cryptographic signature verification should happen before anything even touches the JSON data!)…

What does this have to do with JSON? You should only be looking inside the name-assertion after you've authenticated the signature on it as “signed by someone you trust to assert the name you expect to see inside”. Decoupling the transport from the asserted name shouldn't affect that. And if you're comfortable with signature-discovery as laid out in #120, there's no reason you need to put the CAS- or ref-engine location in the name to use that scheme. Just use internal logic like:

`signature_location(engine_location, requested_name) → URLs`

> … and we need to be able to compare the name inside the signature with something the user has provided which unambiguously specifies intent.

Agreed. I think “RHEL-8.10.5” unambiguously specifies intent.

> It is acceptable to ask the user to do a one-time setup to trust key X for example.com; it is completely unreasonable to ask the user to say `$tool pull --expected-name='Product By Example.COM 5.1.3' example.com/product5/standalone:1.3`.

I'd name that product “product5/standalone:1.3”. Then the user runs:

`$ tool pull https://example.com product5/standalone:1.3`

which says “hit the refs and CAS engines at https://example.com and fetch me ‘product5/standalone:1.3’”. You could also specify default ref/CAS engines in a config file and run:

`$ tool pull product5/standalone:1.3`

which says “go to wherever you usually go for this sort of thing and fetch me ‘product5/standalone:1.3’”.

mtrmac (Collaborator) commented:

> I'd name that product “product5/standalone:1.3”.

Then we are back to tying the names to a particular namespace (design), and we have achieved nothing.

And, ultimately, making this interoperable with existing Docker references is a hard design constraint for us; inventing an entirely new naming and location mechanism is firmly out of scope. We do have the forward compatibility mechanisms necessary to support other reference formats if/when/as that becomes necessary, but the signature design is absolutely not in the business of inventing new ones.

wking (Contributor) commented:

> It is acceptable to ask the user to do a one-time setup to trust key X for example.com…

I missed this distinction in my previous comment, but if your trust policy is “I trust anything that example.com does so long as the asserted name starts with ‘example.com/’”, you should be able to accomplish that same goal with names like `product5/standalone:1.3` and an API like:

`$ tool pull --namer example.com product5/standalone:1.3`

which is “go to wherever you usually go for this sort of thing, fetch me `product5/standalone:1.3` and ensure it's been named by example.com”. Where you get it from isn't important. If folks want some syntactic sugar around transport → namer lookup, you could map:

`$ tool pull example.com/product5/standalone:1.3`

to:

`$ tool pull --namer example.com --engine https://example.com product5/standalone:1.3`

wking (Contributor) commented on Nov 18, 2016:

On Fri, Nov 18, 2016 at 02:26:30PM -0800, Miloslav Trmač wrote:

> > I'd name that product “product5/standalone:1.3”.
>
> Then we are back to tying the names to a particular namespace (design), and we have achieved nothing.

How does “describe the image in the image name, but don't include the transport or signer” tie the names to a particular namespace?

> And, ultimately, making this interoperable with existing Docker references is a hard design constraint for us…

My understanding of Docker's naming policy is that each “repository” (e.g. docker.com/library/ubuntu) has its own root keys owned by the publisher. So you'd run:

`$ tool pull docker.com/library/ubuntu:16.10`

which would map to:

`$ tool pull --namer docker.com/library/ubuntu --engine https://docker.com/library/ubuntu ubuntu:16.10`


**image** (object):

```js
{
"docker-manifest-digest": manifestDigest
}
```

`manifestDigest` is the digest of the image's manifest, in the form `<algorithm>:<hashValue>`. Required.

**type** (string): The only supported value is "atomic container signature".

#### `optional`

**creator** (string): Creator ID. This refers to the tooling used to generate the signature.

**timestamp** (int64): UNIX timestamp (seconds since the epoch).

### Example

```js
{
    "critical": {
        "identity": {
            "docker-reference": "busybox"
        },
        "image": {
            "docker-manifest-digest": "sha256:a59906e33509d14c036c8678d687bd4eec81ed7c4b8ce907b888c607f6a1e0e6"
        },
        "type": "atomic container signature"
    },
    "optional": {
        "creator": "atomic 0.1.0-dev",
        "timestamp": 1471035347
    }
}
```
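For reference, a minimal sketch of how the `docker-manifest-digest` value above can be produced, assuming the raw manifest bytes (exactly as served by the registry) have been saved locally as `manifest.json`:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

func main() {
	// Read the manifest bytes exactly as returned by the registry; any
	// re-serialization would change the digest.
	manifest, err := os.ReadFile("manifest.json")
	if err != nil {
		panic(err)
	}
	// <algorithm>:<hashValue>, the value used for "docker-manifest-digest".
	fmt.Printf("sha256:%x\n", sha256.Sum256(manifest))
}
```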

### Encryption and Decryption

The signature data is written to a file, which is then signed with the signer's private key and encrypted to the recipient's public key (the `-r KEYID` argument below). The file is decrypted with the recipient's private key, and the signature is verified with the signer's public key.

**Example GPG Sign command**

Given a signature file `busybox.sig` formatted per the example above:

```
$ gpg2 -r KEYID --encrypt --sign busybox.sig
```

**Example GPG Verify command**

```
$ gpg2 --decrypt busybox.sig.gpg
```