-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Signature Format #4
Comments
@oxij Just noticed that the current spec doesn't include the question of where the signature is to be put is not included in the current spec. Do you have an opinion about it? Git does @lrvick I think the argument in favor of keeping a format similar to what git does is, it'd make integration into git upstream much easier. If it's using JSON, I'd be willing to bet Junio Hamano would laugh at us. Now… I can only bet :) |
If something to address this problem was included in git-core upstream then the advantages of JSON for easy parsing by third party verification tools go out the window, as well as the need for versioning.... because git is the authoritative tool. There is also no reason it would use the git-notes interface at all as there could be a first class object for signatures. If such a thing were to happen it would deprecate all our current work as prior art that informed design. In the mean time I think we have to use the most practical tools we can reach for going with the assumption that we need easy third party code support for now. By the way I took a look at https://github.com/google/git-appraise and they look like they have landed on very similar directions (also pretty amazing project for decentralized code review where the central VCS interface is not trusted.). |
Well… if this problem is addressed in git upstream, then hopefully they'd reuse the same data model as us, so that previous signatures would still be valid with the git upstream verifier without having to re-sign everything. Which makes me think, ISTR at some point you said you had been with a member of the git development team who said you should pursue integration in git upstream for reviews… can you invite them to this discussion? Hopefully they'd have hints on how to do things so they will be more easily upstreamed :) |
Concerning versioning, I think version should be per-branch, not per-review. I think
? |
In particular, with the above you can pack https://github.com/google/git-appraise reviews into this format pretty much verbatim (they have no equivalent of "result" field in the comment schema, so just set it to "0").
(Well, at least we are not arguing whenever we pack XML into there. Good.)
|
@oxij So multiline gets much harder to parse, but this did just give me an idea of maybe a happy medium... What if the signature was just our most simple "git romans" format in one line. Tools could use this alone as an "I approve this commit" and need not worry about any metadata (the most typical use case I imagine). We could however add at the end a key of a hash of all the metadata, which can be found at refs/notes/signatures/ We could end up with a format with only our highest level keys to anchor things. Namely:
...where metadata hash and patch-id are optional From there we can use content-type delimited blobs exactly as you propose in the "attached" note and any tool can include anything they feel valuable. Have cake and eat it too? |
How are multiple lines are harder to parse than a single line? It's just a sequence of chars. What is hard to parse is escaping (like JSON has), with escaping you need an actual parser, with git-object-like format you only need to match sequences of chars, which can be done with simple regular expressions (note that the grammar is How would you do git-object-like format still allows extensions when combined with #10 (comment) |
Correction: they are harder to merge. The setup I proposed would never have a merge conflict. So far I have dodged this problem by just using one-line json blobs, but what I proposed above odges this by delegating all metadata to optional attachments in separate "files" in the git notes tree so there is never a clash between one signer and another. |
"<commit-hash>,<patch-id>,<metadata-hash>"
Again, the problem with just this much is that you can't `verify` without looking at "with" and "result" metadata fields.
However, I think, allowing "content" to be stored separately and referenced by hash is ok.
|
Strike that, it's handled by having all commits on the
Well, I guess we totally agree about the fact that this should be the minimal viable spec, but I'm not sure whether other fields should be used or not in determining commit trustworthiness. I guess it's a matter for #9, though :)
This sounds like a good idea iff most tools can be done without looking at anything else than a single layer. IOW, if concerns are actually decoupled. Also, I fear a bit this direction: then, it's only a step away from multipart/mixed, and I feel like that's something we should try to avoid if we don't want the thing to become a behemoth of complexity. I'd be more comfortable saying people who really want to store additional information should do so in the commit message / x-* headers and have their tooling on the side. @lrvick TBH I'm not sure about the issue of one-line vs. multiple-lines. The advantage of one-line JSON is that it can be parsed and verified as is with a dependency on a JSON parser. But having multiple lines isn't that big a deal: we can just base64 it all and put it all on one line. With that in mind, I don't really see a reason not to have the content in the same object as the signature: it'd just make things less nice to debug. |
Whoops, comment went out too fast. First remark now reads:
|
But having multiple lines isn't that big a deal: we can just base64 it all and put it all on one line.
I'm also for keeping the `git-signatures`'s current approach here. IMHO no need to invent a new signature format when we can just reuse OpenPGP and base64 encode its packets.
If we really want we can also skip the base64 step as OpenPGP packets have clear boundaries and those can be easily parsed with sane languages, but I'm all for keeping it as it makes a simple `bash` implementations viable too.
multipart MIME
I agree that "multipart" part is not the direction we should be going, but the `content-type` field itself seems pretty useful, IMHO.
My main problem with the whole issue is that by adding each new field on the top level we add assumptions and restrictions, and I want "tools, not restrictions".
E.g. I imagine some projects might want both `diff-` and `context-understanding` to be a single attribute. If `git-appraise` and `crev` can't agree on anything in their data models who are we to force them? I imagine some might even disagree with `!-0+` of the `result*` fields (which kinda depresses me).
E.g. I myself now think that maybe we should also have an inverse to `!` to mark security fixes and `?` to explicitly say "should be reviewed by someone else" (aka review request in `git-appraise`). I.e. something like
```
"result":
- "!" = "has a security issue!",
- "-" = "I disapprove",
- "0" = "FYI",
- "?" = "No idea/please review",
- "+" = "I approve",
- "=" = "has a fix for a security issue",
```
Thinking some more about this, we might also want to have "FYI" reviews to be their own separate thing. I.e.
```
"result":
- "!" = "has a security issue!",
- "-" = "I disapprove",
- "?" = "No idea/please review",
- "+" = "I approve",
- "=" = "has a fix for a security issue".
```
and "FYI" reviews would then have no `result*` fields at all and would be completely ignored by `verify` operation but still shown by `show`. I.e. you could then use them as usual comments (storing them in a separate branch with relaxed constraints might be good then too).
I have several more similar ideas.
So the question is, where do we stop here?
This explains my slowness towards approving #10. I'm not even convinced myself we should allow `x-` fields in the first place. Maybe we should set the very minimal format (including using only a single bit of information "vulnerable/maybe not" in `result` fields) in stone and put everything else into `content-type` extensions. Maybe not. I spent a week thinking about it I'm still not convinced either way.
|
Same here, especially as it leaves the door open to a sorting strategy like
Well, that's true… but at the same time we do need to standardize on something if we want tools to be interoperable. So at least the
For instance, this problem can be solved by just having the tool present a single option and writing equal values to
For security fixes, this can be found by listing the For However, the
That… would sound just like git-notes, actually. Not sure we actually have a need for this, IMO we're going too far in the “encompass all the possible use cases” here, and we'll end up rewriting git if we continue this direction :) My vision of our need is a system for signing code review of diffs in a context in a way for which verification can be automated. In this context, security fix handling should be considered only insofar as it helps automated verification of code review. Additional metadata can be considered as additional metadata, that need not be in the standard -- and particularly not in the tl;dr: I think we need at least |
tl;dr: I think we need at least `!-0+` for `review{,-otherwise}`, and `with` for commits. I think all the rest could go into implementation-defined extensions indeed (which could be `content-type`, `x-*` fields, etc.)
I see. I'm leaning towards `with`, `result*`, `content-type` fields, and all unknown fields being errors. This has an advantage of allowing structured formats like JSON in `content-type` if and when you need them.
However, this raises two issues:
- in #9 (comment) you write
Also, I'm not sure what the point of the `low` settings to `thoroughness`, `context-understanding` and `diff-understanding` is,
then, if not taking them into account for review checking: if I set the three to `low`, then I'd hope people trust (by default) my
review less than when I set all of them to `high`, because otherwise it means I'll just not sign anything with `low` understanding.
which also makes sense.
But maybe those extensions should then be processed via hooks that would classify reviews into categories which you would then process like you propose in #9 (comment). E.g. you would then assign "+" without high-enough `thoroughness` to a lowest category or something. Or you could discount "+"es by authors that seem to miss many issues in their reviews. How does that sound?
- The idea behind "?" still seems useful to me: a kind of review type for getting attention. I agree that maybe it shouldn't be a `result`, but if not a result, then what? How about something like
```
commit ... diff
...
result !
content-type: review
subsystem: nixos/ssh
understanding: high
thoroughness: low
review-request: @<ssh service maintainer>
This seems very bad to me in such and such use case. @<ssh service maintainer> should take a look ASAP.
```
```
commit ...
result +
content-type: review
subsystem: pkgs/<package>
understanding: medium
thoroughness: medium
review-request: @<package maintainer>
LGTM, evaluates ok, but this is not a trivial change and it would be nice if @<package maintainer> would also take a look.
```
```
commit ...
result +
content-type: review
subsystem: nixos/new-service
review-request: * # to anyone willing
Please review this new service!
```
?
|
That makes sense :) overall, I think these can be tool-specific features, and I more and more think that if a way to define policy should be given by the RFC for projects to give their policies, tools should be able to add in tool-specific features that take in addition some additional data in the reviews.
Your examples LGTM… except that I do fear |
Hi Lance, it's Dawid! @lrvick . I'm working on something different, but similar: cargo-crev - which is a tool for reviewing Rust/crates.io packages, and a part of broader ecosystem of code-trust ideas ( So I've been independently creating own format for Regarding the cosmetic differences:
I have very shallow understanding of scope and goals here, so please excuse if I get anything wrong, but some differences from my impression so far (the goals of projects might be different, so I'm just staying them, without much opinion):
I have a big chunk of functionality ready for |
This spec is not really specific to git. It does assume some kind of content-addressable storage (and a Merkle tree for sequences of commits if we are to ignore alternative anchoring modes of #3) but nothing beyond that.
From my point of view, the main issue here is making the spec robust against evil participants and servers without requiring too much review work from the honest participants.
|
I experimented with the following format today as a single json line incorperating ideas from @oxij with mine. Note this example also assumes potential use cases like outlined in #2
Will update the following example as we reach agreement on format (or any "Git romans" style alternative format we want to compare against)
In case it is not obvious, the "sig" value is valid against a one line representation of anything in the "body" object.
This example is also intended to represent all options we might want to support so we can decide on a sane future-proof format. Any options here are not implied to be included in a v1 spec.
The text was updated successfully, but these errors were encountered: