Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signature Format #4

Open
lrvick opened this issue Oct 6, 2018 · 17 comments
Open

Signature Format #4

lrvick opened this issue Oct 6, 2018 · 17 comments
Labels
needs-feedback Needs more people to give their opinion

Comments

@lrvick
Copy link

lrvick commented Oct 6, 2018

I experimented with the following format today as a single json line incorperating ideas from @oxij with mine. Note this example also assumes potential use cases like outlined in #2

Will update the following example as we reach agreement on format (or any "Git romans" style alternative format we want to compare against)

In case it is not obvious, the "sig" value is valid against a one line representation of anything in the "body" object.

This example is also intended to represent all options we might want to support so we can decide on a sane future-proof format. Any options here are not implied to be included in a v1 spec.

{
  "body":{
    "commit":"d27e15a09812241167a61c3222cb6e97c5b0d6b0",
    "tree":"58215f173d6ef6892a07c9073d55145f387d1968",
    "patch":"068a37e54586de8339f13ec980d31f6c30b6f6e7",
    "date": "2018-10-05T03:01:46Z",
    "review": {
      "context-understanding": "author",
      "diff-understanding": "high",
      "thoroughness":"medium",
      "result":"+"
    },
    "artifacts": {
      "out.tar.gz":"2ad6f470b5398251018a4c25f5eb35686481681f"
    }
  },
"sig":"iQJFBAABCAAvFiEEZ1U/vaRrtxq9LgsLjkeh7DWhVR0FAlu2h3IRHGxhbmNlQGxydmljay5uZXQACgkQjkeh7DWhVR0K9Q//T+UleZFWlbiIFoUBp1hX0xzg/eEaTFnnlCdWWaa5f1E+TI80Pg/wXhpEUd98ghZThXwaWJwnPp34BpZ55GU5Qr1cXjY00Yt7I0p3a5TUsdk5FP8tiUWNQ9kA6npUOIVixa8HFkhz0SvHvXNAvWsasqw6nb+SNfA+aQYIP/wAY71ZpMLkJsMQssaFWsjL/hsSmOpi7VyLEFEuUmEM1CA6VPiMvp4rvP75Bt4DnEouyFPxk4WbVIqX4DIeG8+5W8jQCLlxV6HDH4g+POdZbD7/Yg6XjDKEfD2vM/OYrBczPZX0Tu1WhVyYOtQ/WsAr9wzZjgl50/9UkP6E4pt9ft42jdNhGcO1sEwzlC5W1efC2izjL4medEiLe00zIM0L42X10HebH22j4gKyb4EErSHo0H5eLisf1xAwAHiOE2wJou1SR5dVmKydXicVN3wuLqhzc52awmoOaEts1Q3zFyB8MfDIi39CACwAvOTrKw54raoGgzxDEFEP3sKWF1FuZQJreG1Ufg57Oxv5f1lOqvaoMN7ZNjcQkPUAqj7G7e/fwyXGCG2mvnz++D5/2JAMeeO4k8iwxGXEsF0qD3QveEZPFoIpPZX+uOGqM9iIPDJd9bGu0xI5yWeqLpbdXdm4ClsiRWOmavUvjm8RK99o7I9EvctCzrIrGnwlySZ0+Po856I="
}
@lrvick lrvick mentioned this issue Oct 6, 2018
@Ekleog
Copy link
Contributor

Ekleog commented Oct 6, 2018

@oxij Just noticed that the current spec doesn't include the question of where the signature is to be put is not included in the current spec. Do you have an opinion about it? Git does gpgsig [gpg signature with newlines included, one space at the beginning of each new line] in the metadata fields, but I'm not sure it'd be really easy to parse? Sounds like the least bad option anyway, though.

@lrvick I think the argument in favor of keeping a format similar to what git does is, it'd make integration into git upstream much easier. If it's using JSON, I'd be willing to bet Junio Hamano would laugh at us. Now… I can only bet :)

@lrvick
Copy link
Author

lrvick commented Oct 6, 2018

@lrvick I think the argument in favor of keeping a format similar to what git does is, it'd make integration into git upstream much easier. If it's using JSON, I'd be willing to bet Junio Hamano would laugh at us. Now… I can only bet :)

If something to address this problem was included in git-core upstream then the advantages of JSON for easy parsing by third party verification tools go out the window, as well as the need for versioning.... because git is the authoritative tool. There is also no reason it would use the git-notes interface at all as there could be a first class object for signatures. If such a thing were to happen it would deprecate all our current work as prior art that informed design. In the mean time I think we have to use the most practical tools we can reach for going with the assumption that we need easy third party code support for now.

By the way I took a look at https://github.com/google/git-appraise and they look like they have landed on very similar directions (also pretty amazing project for decentralized code review where the central VCS interface is not trusted.).

@Ekleog
Copy link
Contributor

Ekleog commented Oct 7, 2018

Well… if this problem is addressed in git upstream, then hopefully they'd reuse the same data model as us, so that previous signatures would still be valid with the git upstream verifier without having to re-sign everything.

Which makes me think, ISTR at some point you said you had been with a member of the git development team who said you should pursue integration in git upstream for reviews… can you invite them to this discussion? Hopefully they'd have hints on how to do things so they will be more easily upstreamed :)

@oxij
Copy link
Member

oxij commented Oct 8, 2018

Concerning versioning, I think version should be per-branch, not per-review.
Concerning JSON and third-party verification tools, I think we can always add JSON output on top.

I think git-wotr should be the minimal viable spec. E.g. reading discussions in the issues here I'm inclined to just drop every metadata field except "result" and "result-otherwise" from the top-level spec since only those, I think, should be used to make decisions in the general case. We can then allow chaining of formats. E.g.

commit <commit-id> diff
with <commit-id>
result +
result-otherwise !
content-type review

thoroughness high
diff-understanding high
content-type git-signatures/artifacts.json

<JSON you want>
commit <commit-id> diff
with <commit-id>
result +
result-otherwise !
content-type comment # the default

<comment>

?

@oxij
Copy link
Member

oxij commented Oct 8, 2018 via email

@lrvick
Copy link
Author

lrvick commented Oct 8, 2018

@oxij So multiline gets much harder to parse, but this did just give me an idea of maybe a happy medium...

What if the signature was just our most simple "git romans" format in one line. Tools could use this alone as an "I approve this commit" and need not worry about any metadata (the most typical use case I imagine).

We could however add at the end a key of a hash of all the metadata, which can be found at refs/notes/signatures/

We could end up with a format with only our highest level keys to anchor things.

Namely:

<commit-hash> <patch-id> <metadata-hash>

...where metadata hash and patch-id are optional

From there we can use content-type delimited blobs exactly as you propose in the "attached" note and any tool can include anything they feel valuable. Have cake and eat it too?

@oxij
Copy link
Member

oxij commented Oct 8, 2018

How are multiple lines are harder to parse than a single line? It's just a sequence of chars.

What is hard to parse is escaping (like JSON has), with escaping you need an actual parser, with git-object-like format you only need to match sequences of chars, which can be done with simple regular expressions (note that the grammar is design.org is regular, this is not an accident), if not simpler.

How would you do with in one-line format?

git-object-like format still allows extensions when combined with #10 (comment)

@lrvick
Copy link
Author

lrvick commented Oct 8, 2018

Correction: they are harder to merge. The setup I proposed would never have a merge conflict.

So far I have dodged this problem by just using one-line json blobs, but what I proposed above odges this by delegating all metadata to optional attachments in separate "files" in the git notes tree so there is never a clash between one signer and another.

@oxij
Copy link
Member

oxij commented Oct 8, 2018 via email

@Ekleog
Copy link
Contributor

Ekleog commented Oct 9, 2018

@oxij

Concerning versioning, I think version should be per-branch, not per-review.

Issue with this is, an attacker can cherry-pick a review from another version branch and re-use it.

Strike that, it's handled by having all commits on the refs/reviews branch being signed. Also, anyway the ideas on #10 leave the path open for a future version attribute.

I think git-wotr should be the minimal viable spec. E.g. reading discussions in the issues here I'm inclined to just drop every metadata field except "result" and "result-otherwise" from the top-level spec since only those, I think, should be used to make decisions in the general case.

Well, I guess we totally agree about the fact that this should be the minimal viable spec, but I'm not sure whether other fields should be used or not in determining commit trustworthiness. I guess it's a matter for #9, though :)

We can then allow chaining of formats

This sounds like a good idea iff most tools can be done without looking at anything else than a single layer. IOW, if concerns are actually decoupled.

Also, I fear a bit this direction: then, it's only a step away from multipart/mixed, and I feel like that's something we should try to avoid if we don't want the thing to become a behemoth of complexity. I'd be more comfortable saying people who really want to store additional information should do so in the commit message / x-* headers and have their tooling on the side.


@lrvick TBH I'm not sure about the issue of one-line vs. multiple-lines. The advantage of one-line JSON is that it can be parsed and verified as is with a dependency on a JSON parser.

But having multiple lines isn't that big a deal: we can just base64 it all and put it all on one line.

With that in mind, I don't really see a reason not to have the content in the same object as the signature: it'd just make things less nice to debug.

@Ekleog
Copy link
Contributor

Ekleog commented Oct 9, 2018

Whoops, comment went out too fast. First remark now reads:

Issue with this is, an attacker can cherry-pick a review from another version branch and re-use it.

Strike that, it's handled by having all commits on the refs/reviews branch being signed. Also, anyway the ideas on #10 leave the path open for a future version attribute.

@oxij
Copy link
Member

oxij commented Oct 12, 2018 via email

@Ekleog
Copy link
Contributor

Ekleog commented Oct 13, 2018

If we really want we can also skip the base64 step as OpenPGP packets have clear boundaries and those can be easily parsed with sane languages, but I'm all for keeping it as it makes a simple bash implementations viable too.

Same here, especially as it leaves the door open to a sorting strategy like cat_sort_uniq, should we eventually find it useful.

My main problem with the whole issue is that by adding each new field on the top level we add assumptions and restrictions, and I want "tools, not restrictions".

Well, that's true… but at the same time we do need to standardize on something if we want tools to be interoperable. So at least the result field is strictly needed.

E.g. I imagine some projects might want both diff- and context-understanding to be a single attribute.

For instance, this problem can be solved by just having the tool present a single option and writing equal values to diff- and context-understanding under the hood. Now, yes, existence of these attributes can be discussed.

E.g. I myself now think that maybe we should also have an inverse to ! to mark security fixes and ? to explicitly say "should be reviewed by someone else" (aka review request in git-appraise). I.e. something like

For security fixes, this can be found by listing the with clauses of result + result-otherwise ! commits. Also… well, I don't see why a security-fix review should be handled differently from a positive review, so IMO this should at most be a different key.

For ?, same as above: it doesn't bring any meaningful information to #9, so should not go into result IMO.

However, the 0 already present is actually required, eg. to handle my reply to 7c6f434c in NixOS/rfcs#34 (comment): handling would be with result 0 result-with + and with all the commits of the PR, if one only checked the outcome of the PR. (note: this paragraph may depend on the outcome of #14, if a commit-diff signs the diff to the first parent only then it isn't required)

and "FYI" reviews would then have no result* fields at all and would be completely ignored by verify operation but still shown by show. I.e. you could then use them as usual comments (storing them in a separate branch with relaxed constraints might be good then too).

That… would sound just like git-notes, actually. Not sure we actually have a need for this, IMO we're going too far in the “encompass all the possible use cases” here, and we'll end up rewriting git if we continue this direction :)

My vision of our need is a system for signing code review of diffs in a context in a way for which verification can be automated. In this context, security fix handling should be considered only insofar as it helps automated verification of code review. Additional metadata can be considered as additional metadata, that need not be in the standard -- and particularly not in the review field IMO.


tl;dr: I think we need at least !-0+ for review{,-otherwise}, and with for commits. I think all the rest could go into implementation-defined extensions indeed (which could be content-type, x-* fields, etc.)

@oxij
Copy link
Member

oxij commented Oct 13, 2018 via email

@Ekleog
Copy link
Contributor

Ekleog commented Oct 13, 2018

But maybe those extensions should then be processed via hooks that would classify reviews into categories which you would then process like you propose in #9 (comment). E.g. you would then assign "+" without high-enough thoroughness to a lowest category or something. Or you could discount "+"es by authors that seem to miss many issues in their reviews. How does that sound?

That makes sense :) overall, I think these can be tool-specific features, and I more and more think that if a way to define policy should be given by the RFC for projects to give their policies, tools should be able to add in tool-specific features that take in addition some additional data in the reviews.

  • The idea behind "?" still seems useful to me: a kind of review type for getting attention. I agree that maybe it shouldn't be a result, but if not a result, then what? How about something like

Your examples LGTM… except that I do fear content-type, among other reasons because then we'd need the RFC to actually standardize its possible contents (or leave that to MIME, which would be really meh).

@Ekleog Ekleog added the needs-feedback Needs more people to give their opinion label Oct 25, 2018
@dpc
Copy link

dpc commented Dec 11, 2018

Hi Lance, it's Dawid! @lrvick . I'm working on something different, but similar: cargo-crev - which is a tool for reviewing Rust/crates.io packages, and a part of broader ecosystem of code-trust ideas (crev). @Ekleog pinged me to come over.

So I've been independently creating own format for crev, and surprisingly came to something really similar. You can take a look: https://github.com/dpc/crev-proofs/blob/master/_xQgkbDAQx3nSV5SMfdEeQBSYiPwSI32wnMxnjExk24%3D/reviews/2018-12-packages.proof.crev

Regarding the cosmetic differences:

  • I'm using Yaml, where I've stripped leading ---\n. It's human readable, and simple enough that can be parsed even with a grep if need be, but most tools should be able to take a standard yaml library. Or just use Rust library that implements it.
  • I come from a point of view, that the more fields you have, the less chance people will bother filling them carefully (or at all). Because of that, I think one understanding is enough.
  • In crev result is a recommendation that can be one of strong, positive, neutral, negative and dangerous. I don't understand why would you want to use + and other cryptic values. Is being self-explanatory, and easy for the eye not a good thing?

I have very shallow understanding of scope and goals here, so please excuse if I get anything wrong, but some differences from my impression so far (the goals of projects might be different, so I'm just staying them, without much opinion):

  • "Proofs" (signed document in crev) are just text objects and are transport-independent. I envision they are going to be mostly circulated out-of-band, via syncing of personal git repositories, but that could change. Tying everything too much with git is IMO, a mistake. git is most popular revision system right now, but not everyone is using it right now, and in the future we might use something else entirely. (like pijul).

I have a big chunk of functionality ready for cargo-crev, so you might want to give it a try for inspiration, or just to give me feedback.

@oxij
Copy link
Member

oxij commented Dec 15, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-feedback Needs more people to give their opinion
Projects
None yet
Development

No branches or pull requests

4 participants