End-to-end integrity of crates in registries #4768

Open
withoutboats opened this issue Dec 1, 2017 · 50 comments
Labels
A-registries Area: registries A-security Area: security S-triage Status: This issue is waiting on initial triage.

Comments

@withoutboats
Contributor

withoutboats commented Dec 1, 2017

I have been working on a proposal which would add a feature to cargo to make registries trust free - that is, to verify the integrity and authenticity of crates downloaded from a registry like crates.io even if crates.io were compromised.

I don't have an RFC yet, but I have a sketch & I thought I'd post it here. In brief, we automatically sign crates on publication and verify the signature on download.

First, we need a concept of a source of truth about user identities to cargo. For crates.io, the current source of truth is GitHub (crates.io does not manage its own authentication). Such a source of truth needs to have a feature which allows users to publish a public key that can be used to verify their signature. GitHub today allows users to publish GPG keys to verify their commits (though we don't need to actually use GPG, just format our public keys according to the OpenPGP spec - I wrote a crate to do this already for ed25519 keys).

We adjust the cargo login flow to generate a new key pair and publish the public key to GitHub. This happens entirely client side; crates.io never gains permission to publish GPG keys. The private key is saved in the user's .cargo directory and used to sign crates published from this machine. Key revocation is done by deleting the key from GitHub.

In the registry, we additionally track the signature and the means to identify the public key (in this case, the info needed to request it from GitHub). When downloading crates, we verify the signature in the registry. We display to the user (published by GitHub user @withoutboats) or something similar, providing them the guarantee that this user did actually publish this data, and the data has not been modified since.

Critically, the registry plays no role in distributing public keys. Though I've so far discussed this in terms of distributing them through GitHub, the system can be flexible to allow additional sources of identity in the future, which would display different information.

Why not TUF?

  1. TUF was not designed with an external authority about identity in mind, and so it involves the registry managing the ultimate source of truth for identities (the master key). We already delegate responsibility for managing user identity to a third party (GitHub).
  2. TUF is opt-in, and has a high cost for opting in. Most users would probably not opt in.

In other words, adopting TUF would not provide much security for most users. All security would still depend on our security practices (in managing the master key), and only a small handful of crates would even see the gains.

The high opt-in cost has its own advantages: only users who are serious about key management would be likely to do it. However, users already have to trust the security practices of the authors of crates they depend on, and there's not really a way to avoid that.

Future extensions

Beyond the basic proposal I've sketched out, there are many ways the security of this system could be improved further:

  1. We could adopt threshold signatures from TUF for crates with multiple owners, providing a higher degree of security (this would require a significant UX revamp).
  2. We could adopt a trust-on-first-use system for ownership, to help catch e.g. withoutboots publishing a crate that actually belongs to withoutboats (there are some serious challenges I haven't worked out here about backwards compatibility and our github teams feature).
  3. We could add options to require signatures and refuse to build crates without them, influencing resolution.
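
As a rough illustration of the trust-on-first-use idea in item 2, here is a minimal Python sketch. The `pins` mapping and the `check_publisher` name are hypothetical, and the sketch ignores the open questions about GitHub teams and backwards compatibility noted above.

```python
def check_publisher(pins: dict, crate: str, publisher: str) -> bool:
    """Trust on first use: pin the first publisher seen, then flag any change."""
    # setdefault stores the publisher only if the crate has no pin yet.
    pinned = pins.setdefault(crate, publisher)
    return pinned == publisher
```

A client would persist `pins` between runs and warn whenever the function returns `False`, which is the case where `withoutboots` shows up for a crate first seen from `withoutboats`.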

cc @rust-lang/cargo.

@withoutboats
Contributor Author

Also re TUF integration: in theory, an implementation of TUF could become an alternative "source of identity" eventually, as an alternative to using your GitHub account.

@dperny

dperny commented Dec 1, 2017

One of the problems that TUF solves is the issue of a malicious attacker distributing valid but out-of-date packages. Even an out-of-date package has a valid signature, and an attacker can present an old, insecure package to you as if it's the latest version. If you're not cross-checking and have no external idea of what the latest package version is, you could end up using insecure software.

I guess cross-checking against GitHub solves the problem? But then you need a more complex integration with GitHub's API to determine what the latest stable version is. WDYT?

@withoutboats
Contributor Author

withoutboats commented Dec 1, 2017

@dperny cargo uses the registry index to track what the most recent version of a package is, though there's no cryptographic verification of the index right now.

@withoutboats
Contributor Author

though there's no cryptographic verification of the index right now.

It's occurred to me that we should also start signing commits to the index. Here's a sketch:

  • We add to the index format's config.json a list of committer keys.
  • Every time we pull, we verify that every commit was signed by one of the keys.

@est31
Member

est31 commented Dec 1, 2017

One of the problems that TUF solves is the issue of a malicious attacker distributing valid but out-of-date packages. Even an out-of-date package has a valid signature, and an attacker can present an old, insecure package to you as if it's the latest version.

It's occurred to me that we should also start signing commits to the index

An attacker who has gained access to the GitHub repo where the index is stored can just force push an older version of the index. This can be fixed, however, by looking at the date of the commit and requiring that it is at most X old.
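
The age check could look roughly like the following Python sketch. The seven-day window and the `index_is_fresh` name are assumptions, since the comment only says "X"; rejecting commits dated in the future is an extra precaution that the comment does not mention.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness window; the thread only says "at most X old".
MAX_INDEX_AGE = timedelta(days=7)

def index_is_fresh(head_commit_time: datetime, now: datetime,
                   max_age: timedelta = MAX_INDEX_AGE) -> bool:
    """Return True if the index head commit is recent enough to accept."""
    age = now - head_commit_time
    # Reject stale heads, and also heads dated in the future.
    return timedelta(0) <= age <= max_age
```

The check only means something if the timestamp is covered by the commit signature, and a registry with no recent publishes would need periodic signed commits to avoid looking stale.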

In general, getting some form of index signing underway is a very good idea, especially as it allows for third-party mirrors outside of GitHub. Your suggestion doesn't give much additional security for that though, as it puts the list of keys into the config.json. It is probably easier to handle than baking the keys into cargo and then wanting to change them in the future when old cargo versions are still around :). DNSSEC has done that and got that precise problem :).

To the main point of the actual signature of the .crate files. What about having the users sign the json files of the registry? This would have several advantages:

  • Suppose user x has gotten a new key and wants to re-sign all older versions of their crate. If the crate has some kind of an integrated signature, the user would have to download every single version of their crate, sign it, then re-upload it. That is against the immutable-by-design nature of crates.io. It would also break the sha256 hashes in all Cargo.lock files, and to viewers of the index it would look like some malicious edit.
  • Transparency. Every change would be public and accountably visible on the public git index. If you mirror crates.io you would immediately be able to do the change yourself.
  • It would verify the fact that some version of a crate has been yanked/not yanked with the key.

Also, this might be a good moment to suggest a change in how checksums are created for .crate files. One issue I have observed with the current approach is that you can't improve compression of the .crate files without changing the checksum, even though you don't touch any of the content. It is also hard to have storage backends that deduplicate in order to reduce size. Therefore I suggest that checksums are computed not on the bare .crate files but on a file inside each .crate that contains a hash of every file in the crate. This suggestion is a bit orthogonal to what you are probably targeting here, @withoutboats, but it would be cool if this could somehow be coordinated. I can also open a separate RFC if you want.
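
A minimal Python sketch of a checksum that depends only on file contents, not on how the archive is compressed or ordered. The manifest layout (length-prefixed path followed by the hex digest) and the `content_checksum` name are assumptions made for illustration, not a format proposed in the thread.

```python
import hashlib

def content_checksum(files: dict[str, bytes]) -> str:
    """Hash a manifest of per-file SHA-256 digests instead of the archive bytes."""
    manifest = hashlib.sha256()
    for path in sorted(files):  # sorted so archive ordering does not matter
        file_digest = hashlib.sha256(files[path]).hexdigest()
        # Length-prefix the path so distinct path/digest pairs cannot collide.
        encoded = path.encode("utf-8")
        manifest.update(len(encoded).to_bytes(4, "big"))
        manifest.update(encoded)
        manifest.update(file_digest.encode("ascii"))
    return manifest.hexdigest()
```

Recompressing the archive leaves this value unchanged, while editing any file changes it.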

@withoutboats
Contributor Author

Your suggestion doesn't give much additional security for that though as it puts the list of keys into the config.json.

But a commit that would edit the config.json would only be accepted by end user cargo if it were signed by a key already in the config.json before the edit.
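
The rule above (a commit, including one that edits the key list, is only accepted if signed by a key that was already trusted) can be sketched in Python as follows. The `IndexCommit` type is hypothetical, and the cryptographic check is assumed to have happened already: `signer` stands for the id of the key whose signature on the commit verified.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexCommit:
    signer: str                           # key id whose signature on this commit verified
    new_keys: Optional[frozenset] = None  # committer keys in config.json after this commit, if changed

def verify_index_history(trusted_keys, commits):
    """Walk commits oldest to newest; return the final trusted key set or raise."""
    trusted = frozenset(trusted_keys)
    for position, commit in enumerate(commits):
        # The signer must be trusted *before* this commit's own edits apply.
        if commit.signer not in trusted:
            raise ValueError(f"commit {position} signed by untrusted key {commit.signer!r}")
        if commit.new_keys is not None:
            trusted = commit.new_keys
    return trusted
```

Because the membership test runs before the key list is updated, a commit cannot authorize itself by adding its own key to config.json.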

To the main point of the actual signature of the .crate files. What about having the users sign the json files of the registry? This would have several advantages:

I'm a little uncertain what you mean here. My intent is that the user signs the .crate file but the signature is stored in the index - that is, we're talking about using a detached signature.

To be more concrete, in my sketch, the objects in the index would gain these additional fields:

{
    "authority": "github-pgp",
    "publishkey": "withoutboats/1CC70310BE3912D5",
    "signature": "/* a hex encoded ed25519 signature */"
}

But the data signed to generate the signature would be the contents of the .crate file.
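
For illustration, a client might read those fields roughly as in this Python sketch. The `parse_signature_fields` name is hypothetical, and the sketch assumes `publishkey` is always `user/key-id` and that `signature` is plain hex, as in the example entry.

```python
import json

def parse_signature_fields(index_line: str):
    """Split the sketched index fields into (authority, user, key_id, signature_bytes)."""
    entry = json.loads(index_line)
    # "withoutboats/1CC70310BE3912D5" -> ("withoutboats", "1CC70310BE3912D5")
    user, key_id = entry["publishkey"].split("/", 1)
    signature = bytes.fromhex(entry["signature"])
    return entry["authority"], user, key_id, signature
```

The returned signature would then be checked against the raw bytes of the .crate file using the public key fetched from the named authority; that step needs an Ed25519 library and is left out here.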

@est31
Member

est31 commented Dec 1, 2017

that is, we're talking about using a detached signature.

I see. Everything is fine then!

@matthieu-m

Could you please clarify how revocation is supposed to work?

At the moment, I have the impression that a famous user (such as @BurntSushi) revoking or rotating their keys would cause the same issue as NodeJS' infamous left-pad pull: if it becomes impossible to verify their crates, then it's exactly the same as their crate not existing any longer.

Or worse, users get used to ignoring "untrusted" warnings because revocations/rotations are so frequent that crates months old are always untrusted anyway.

@withoutboats
Contributor Author

withoutboats commented Dec 1, 2017

@matthieu-m the implication of what you're saying is quite right - once a key is revoked, we can't hard fail on those crates or the ecosystem will be in ruins.

However, I'll note there are also a huge number of crates which are unsigned already, and those have to keep building. We cannot have "require valid signatures" as the default setting.

If the public key is revoked, the crate just does not display that it was published by any particular users. End users will have to make a choice about what to do in that situation. Someday we will probably support a flag to fail in that situation.

However, revocations should not be particularly frequent - only when a user believes their private key has been compromised. You can rotate keys without revoking the old one by deleting your local private key without deleting the public key. Users may be identified by any number of public keys published to their GitHub accounts, not one particular key.

(However, if there is a valid public key and a signature, and the signature doesn't verify against that key, we should hard fail because that is clear evidence of tampering.)
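
The cases in this comment can be summarized as a small decision function. This Python sketch is only one reading of the comment: the `Outcome` names are hypothetical, and `verify` is a caller-supplied stand-in for the real signature check.

```python
from enum import Enum

class Outcome(Enum):
    UNSIGNED = "build, show no publisher"
    KEY_UNAVAILABLE = "build, show no publisher (key revoked or missing)"
    VERIFIED = "build, show publisher"
    TAMPERED = "hard fail"

def classify(signature, public_key, verify):
    """Map the cases described above to an outcome.

    `verify(public_key, signature)` is a caller-supplied check returning a bool.
    """
    if signature is None:
        return Outcome.UNSIGNED          # legacy crates must keep building
    if public_key is None:
        return Outcome.KEY_UNAVAILABLE   # revoked key: fail open by default
    return Outcome.VERIFIED if verify(public_key, signature) else Outcome.TAMPERED
```

Only the last case stops the build by default; a future opt-in flag could also turn the first two into failures.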

@raggi

raggi commented Dec 2, 2017

TUF was not designed with an external authority about identity in mind, and so it involves the registry managing the ultimate source of truth for identities (the master key). We already delegate responsibility for managing user identity to a third party (GitHub).

You could still do this with a TUF root. When a user publishes a new version of a crate, if the user's signing key changed, you would check against the user's GitHub key, and if the key is new (but the user auth is successful), then you would revoke the old key and publish with the new key. That gives a working revocation model with auth still delegated.

TUF is opt-in, and has a high cost for opting in. Most users would probably not opt in.

Again, this is a design choice for your implementation, and it needn't be. What if the process for setting up TUF when you publish was to push a TUF key to your github account somewhere. Done. The receive side, after this is setup, should not be opt-in at all. You can add a special marker in the targets that is explicit about the entire set of unsigned targets, and disallow any future unsigned targets.

In other words, adopting TUF would not provide much security for most users. All security would still depend on our security practices (in managing the master key), and only a small handful of crates would even see the gains.
The high opt-in cost has its own advantages: only users who are serious about key management would be likely to do it. However, users already have to trust the security practices of the authors of crates they depend on, and there's not really a way to avoid that.

You can make users' handling of keys easy. If you KDF and secretbox them and store them in a user gist or user repository, then they will be stored in the cloud and encrypted with a strong scheme. They may still be compromised; however, this is no worse than a user's laptop or GitHub account being compromised today. There would be a huge benefit though: if you have TUF, you have the ability to revoke the keys at a point in history and subsequently heal the user ecosystem by explicitly notifying folks that they've fetched potentially unauthorized blobs. This is exactly what TUF is for.

@matthieu-m

I've been thinking further about the issue of key revocation/rotation, and wondering whether instead of tying the verification to each and every author (a large multitude) it would not be simpler to place one's trust in a handful of trusted entities.

I imagine a lightweight cargo Trust Server whose only role is to interact with a GitHub (or other) repository in which each crate's manifest is signed by the Trust Server's own private key.

These Trust Servers would regularly scan existing registries (such as crates.io) and check that newly published crates have been signed by their author(s). Signed crate manifests are then signed by the Trust Server itself and committed to GitHub.

A simple cargo command lets one add a Trust Server to cargo (it could even come pre-bundled with a known list). When downloading a crate from a registry, cargo can then check the associated public repositories of those servers for the signed manifest of the crate, and require a certain quorum be met.
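
The quorum check could be sketched in Python like this. The `attestations` shape (server name mapped to the digest that server signed, or `None` if it has no entry) and the `quorum_met` name are assumptions; verifying each server's signature is taken as already done.

```python
def quorum_met(crate_digest: str, attestations: dict, threshold: int) -> bool:
    """`attestations` maps a Trust Server name to the digest it signed, or None."""
    # Count only servers that vouch for exactly the digest we downloaded.
    agreeing = sum(1 for digest in attestations.values() if digest == crate_digest)
    return agreeing >= threshold
```

With three configured servers and a threshold of two, one server disappearing or disagreeing does not block the download.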


This is slightly more complicated, of course, however it has substantial advantages:

  • far fewer dependencies: there are hundreds or thousands of crate authors, keys are bound to get revoked, and then what?
  • redundant dependencies: if one Trust Server disappears, it doesn't matter,
  • built-in key rotation: it's easy to add code to the Trust Server to periodically re-sign manifests with a new key, and drop the old ones.

Also, in case a rogue crate appears on registries, the smaller number of Trust Servers makes it feasible to contact their administrators and ask them to unlist the crate in question.

And one day, when we get reproducible builds, we could imagine Advanced Trust Servers rebuilding the crates' binaries and publishing the binaries' signatures on their own, providing independent verification that they match the sources.

@tarcieri

tarcieri commented Dec 2, 2017

Key revocation is done by deleting the key from GitHub.
When downloading crates, we verify the signature in the registry. We display to the user (published by GitHub user @withoutboats) or something similar, providing them the guarantee that this user did actually publish this data, and the data has not been modified since.

What happens when a user removes a key they used to publish past packages? Do these packages now fail to verify?

How do you handle all of the crates which were never signed in the first place?

This approach seemingly provides AuthN, but not AuthZ: namely, it allows us to check a package has been published by a particular user, but does nothing in terms of providing end-to-end policies for which users are allowed to publish which packages.

To me this is really the power of TUF: packages are claimed by a trusted set of publishers, and after being claimed have end-to-end security from then on.

It's occurred to me that we should also start signing commits to the index. Here's a sketch: We add to the index format's config.json a list of committer keys. Every time we pull, we verify that every commit was signed by one of the keys.

This is really starting to sound a lot like TUF.

(Edit: I do think the index could be potentially leveraged to provide a more lightweight alternative to TUF, but I think doing that well would involve leaning on git for cryptographic properties, which is probably not a wise thing to do until its hash function is upgraded from SHA-1)

@joshtriplett
Member

I don't think it makes sense to check developer signatures in multiple places. Instead, use a trust model similar to that of Debian: check the developer signature on upload, then sign the index commit with a crates.io key.

@withoutboats
Contributor Author

This feedback seems to misunderstand that the goal of this proposal is to give users assurances without requiring them to trust crates.io. Anything that requires us to trust the security of a private key controlled by crates.io does not meet the goal.

This approach seemingly provides AuthN, but not AuthZ: namely, it allows us to check a package has been published by a particular user, but does nothing in terms of providing end-to-end policies for which users are allowed to publish which packages.

This is accurate but there's no way to provide AuthZ without trusting crates.io services.

@tarcieri

tarcieri commented Dec 2, 2017

This is accurate but there's no way to provide AuthZ without trusting crates.io services.

A system which provides AuthN but does not provide AuthZ does not provide useful security properties. You wind up with integrity with no guarantee of authenticity.

@raggi

raggi commented Dec 2, 2017

@withoutboats in your proposal you say:

In the registry, we additionally track the signature and the means to identify the public key (in this case, the info needed to request it from GitHub). When downloading crates, we verify the signature in the registry. We display to the user (published by GitHub user @withoutboats) or something similar, providing them the guarantee that this user did actually publish this data, and the data has not been modified since.

This is inherently trusting crates.io. The ecosystem relies on trusting crates.io. You can add features that appear to not put any crypto on crates.io in an attempt to suggest that the trust isn't centralized there, but ultimately you can't actually escape it. What works better than this is to have at least two trusted sources, the original publisher, and the distribution host. This is another thing that's been well thought through in TUF, and it provides that. Much of what's been discussed here as alternatives, as they become more concrete, are just partial reimplementations of the concerns already in the paper. I'd encourage you to check the threat models again, and apply them to your proposals, and also consider that one of the more important use cases is "fetching code that's never been fetched before, from the central registry".

@withoutboats
Contributor Author

withoutboats commented Dec 3, 2017

You wind up with integrity with no guarantee of authenticity.

The authenticity is that this data was published by a particular GitHub user, which is displayed to the end user after they fetch the crate. Attacks in which someone successfully publishes a crate they are not the owner of are still possible, but become very visible. Attacks in which someone modifies data published by another user, or otherwise impersonates that user, are not possible.

I'd like it if we could adopt a TOFU model for ownership, but this conflicts with some existing features of the registry (most significantly that you can make a GitHub team an owner of a crate) and is not backwards compatible.

This is inherently trusting crates.io.

No it isn't. The public key is published to a GitHub user's account by that user. You do not need to trust crates.io to verify that the data was published by that GitHub user and has not been modified since they published it.

@raggi

raggi commented Dec 3, 2017

No it isn't. The public key is published to a GitHub user's account by that user. You do not need to trust crates.io to verify that the data was published by that GitHub user and has not been modified since they published it.

If crates.io gives me package "foo" made by "Dr. Evil" and I use it, it's because I trust that crates.io says "Dr. Evil" is authoritative about "foo". It doesn't matter where the storage is; the root of trust for installation operations (the primary use case) is at crates.io. Where the material lives or how authentication happens to update it won't alter that.

@withoutboats
Contributor Author

If crates.io gives me package "foo" made by "Dr. Evil" and I use it, it's because I trust that crates.io says "Dr. Evil" is authoritative about "foo".

Not in many cases. There are quite a few crates that I would be immediately suspicious of if they were not published by one of an enumerable set of GitHub accounts, based on what I know about those crates totally outside of crates.io.

The case is that today cargo does not tell you who published a crate. We could enable cargo to provide the users this information giving them integrity and authenticity without trusting crates.io, while still trusting crates.io to control authorization.

@matthieu-m

I think @raggi raises an important point indeed.

There is nothing preventing a rogue crates.io administrator from switching the regex crate and signing it with the BurntЅushi github account (1). cargo would see no issue, as said rogue administrator does control the account and has all the appropriate keys and signatures in place.

I don't see many ways to prevent a new crate from being hi-jacked (there's no prior information to rely on), but this is not as concerning as such a crate would likely have little impact on the community. On the other hand, it seems important to be able to ensure that widely used crates cannot be hi-jacked, given the impact this could have.

I think preventing someone from "rewriting the history" would be a good first step, as it enables warning the user when the publisher account changes, which is suspicious.

Maybe cargo should keep the "N" last commits to the index when it connects to the registry, so as to be able to check that nobody rewrote it since the last connection?
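
One way to picture this check, as a Python sketch: remember the last N commit ids and confirm that they still appear, in order, in the freshly fetched history. It assumes a linear index history with unique commit ids; `history_preserved` is a hypothetical name.

```python
def history_preserved(remembered: list, fetched: list) -> bool:
    """True if the remembered commit ids appear as a contiguous run in `fetched`."""
    if not remembered:
        return True  # nothing remembered yet, e.g. first connection
    try:
        start = fetched.index(remembered[0])
    except ValueError:
        return False  # oldest remembered commit is gone: history was rewritten
    return fetched[start:start + len(remembered)] == remembered
```

A force push that drops or replaces any remembered commit makes the function return `False`.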

(1) Look closer, that S comes from the Cyrillic script.
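
A look-alike name of this kind is easy to detect mechanically. The Python sketch below flags every non-ASCII character in a username; it assumes the substituted letter is U+0405, and `suspicious_characters` is a hypothetical helper, not something cargo provides.

```python
import unicodedata

def suspicious_characters(username: str):
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN"))
            for ch in username if ord(ch) > 127]

# The all-ASCII name yields []; the look-alike reports the Cyrillic letter.
print(suspicious_characters("BurntSushi"))
print(suspicious_characters("Burnt\u0405ushi"))
```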

@withoutboats
Contributor Author

withoutboats commented Dec 4, 2017

There is nothing preventing a rogue crates.io administrator from switching the regex crate and signing it with the BurntЅushi github account (1). cargo would see no issue, as said rogue administrator does control the account and has all the appropriate keys and signatures in place.

And nothing more would prevent a rogue administrator from adding BurntЅushi (Cyrillic) as an owner to the regex crate through whatever "more secure" means we establish - that administrator will have access to the service and the private keys that we use to sign over authority of ripgrep to BurntSushi (the real BurntSushi). Any authorization system within our current resources has the same problem.

However, there actually is something that would prevent someone from adding BurntЅushi (Cyrillic) as an owner to ripgrep - GitHub does not allow users to register accounts with names like BurntЅushi (Cyrillic). More generally, I am more comfortable trusting GitHub than crates.io, because GitHub has a paid security team and crates.io does not.

Maybe cargo should keep the "N" last commits to the index when it connects to the registry, so as to be able to check that nobody rewrote it since the last connection?

If cargo doesn't pull in a --ff-only mode already, that seems like a low-hanging bug.

@tarcieri

tarcieri commented Dec 4, 2017

@withoutboats the reason TUF is as complicated as it is is because it has to deal with hard problems. You were talking about some of them here:

@matthieu-m the implication of what you're saying is quite right - once a key is revoked, we can't hard fail on those crates or the ecosystem will be in ruins.

However, I'll note there are also a huge number of crates which are unsigned already, and those have to keep building. We cannot have "require valid signatures" as the default setting.

If the public key is revoked, the crate just does not display that it was published by any particular users. End users will have to make a choice about what to do in that situation. Someday we will probably support a flag to fail in that situation.

However, revocations should not be particularly frequent - only when a user believes their private key has been compromised. You can rotate keys without revoking the old one by deleting your local private key without deleting the public key. Users may be identified by any number of public keys published to their GitHub accounts, not one particular key.

(However, if there is a valid public key and a signature, and the signature doesn't verify against that key, we should hard fail because that is clear evidence of tampering.)

You also mentioned:

It's occurred to me that we should also start signing commits to the index. Here's a sketch: We add to the index format's config.json a list of committer keys. Every time we pull, we verify that every commit was signed by one of the keys.

I think the place to start in specifying a system like this is a set of requirements for what you want the system to do, prior to attempting to specify a design. So far, based on your statements, I think we have:

  • Need a solution for signing the backlog of already published crates
  • Historically published crates shouldn't fail to verify because a publisher routinely rotated a key
  • Crates MUST fail to verify if a publisher actively revokes a key due to a security incident
  • The package index should be signed

I'd like to note TUF was explicitly designed to handle the cases above, and more, specifically solving the package AuthZ problem using keys held by package publishers.

All that said, I would very strongly suggest putting together a requirements document for what you hope for a crates.io package security system to achieve, and would particularly suggest writing up a threat model that includes adversaries, failure modes, and specific requirements for what a secure system looks like in terms of defending against those failure modes.

@withoutboats
Contributor Author

withoutboats commented Dec 4, 2017

@tarcieri I think we're miscommunicating pretty seriously, since none of the requirements you enumerated are goals of this particular project. In fact, I think the third is actively harmful unless users explicitly and knowingly opt into that feature.

The comments you quote are largely side concerns. I am not really interested in improving the authorization system of crates.io at the moment.

@tarcieri

tarcieri commented Dec 4, 2017

Let me put that another way then: if the system doesn't meet any of those requirements, it provides no security value whatsoever.

@withoutboats
Contributor Author

@tarcieri You do not believe that knowing that the crate you are building was published by someone with access to a particular GitHub account would be of security value to you?

@tarcieri

tarcieri commented Dec 5, 2017

@withoutboats here's a hypothetical example of the sort of output I think your system is proposing:

$ cargo build
   Compiling cfg-if v0.1.2 (signed by @alexcrichton)
   Compiling bitflags v0.9.1 (signed by @alexcrichton)
   Compiling protobuf v1.4.1 (signed by @stephancheg)
   Compiling byteorder v1.0.0 (signed by @BurntSushi)
   Compiling untrusted v0.5.0 (signed by @briansmith)
   Compiling pkg-config v0.3.9 (signed by @alexcrichton)
   Compiling scopeguard v0.3.2 (signed by @bluss)
   Compiling vec_map v0.8.0 (signed by @Gankro)
   Compiling futures v0.1.14 (signed by @alexcrichton)
   Compiling rustc-demangle v0.1.4 (signed by @alexcrichton)
   Compiling strsim v0.6.0 (WARNING: Unsigned!)
   Compiling rayon-core v1.2.1 (signed by @cuviper)
   Compiling gcc v0.3.51 (signed by @alexcrichton)
   Compiling bitflags v0.7.0 (signed by @alexcrichton)
   Compiling unicode-segmentation v1.1.0 (signed by @alexcrichton)
   Compiling data-encoding v2.1.0 (signed by @ia0)
   Compiling data-encoding v1.2.0 (WARNING: Unsigned!)
   Compiling unicode-normalization v0.1.5 (signed by @alexcrichton)
   Compiling ansi_term v0.9.0 (signed by @ogham)
   Compiling libc v0.2.24 (signed by @alexcrichton)
   Compiling lazy_static v0.2.8 (signed by @Kimundi)
   Compiling unicode-width v0.1.4 (signed by @alexcrichton)
   Compiling num-traits v0.1.39 (signed by @cuviper)
   Compiling either v1.1.0 (signed by @bluss)
   Compiling backtrace v0.3.2 (signed by @alexcrichton)
   Compiling rand v0.3.15 (signed by @alexcrichton)
   Compiling time v0.1.37 (signed by @alexcrichton)
   Compiling term_size v0.3.0 (signed by @kbknap)
   Compiling atty v0.2.2 (signed by @softprops)
   Compiling termios v0.2.2 (signed by @dkuddebak)
   Compiling num_cpus v1.6.2 (signed by @seanmonstar)
   Compiling lmdb-sys v0.6.0 (signed by @danburkert)
   Compiling textwrap v0.6.0 (signed by @mgeister)
   Compiling coco v0.1.1 (signed by @stjepang)
   Compiling rpassword v0.4.0 (signed by @conradkdotcom)
   Compiling error-chain v0.10.0 (signed by @Yamakaky)
   Compiling num-integer v0.1.34 (signed by @cuviper)
   Compiling num-iter v0.1.33 (signed by @cuviper)
   Compiling clap v2.25.0 (signed by @kbknapp)
   Compiling num v0.1.39 (signed by @hauleth)
   Compiling chrono v0.4.0 (signed by @lifthrassir)
   Compiling lmdb v0.6.0 (signed by @dburkert)
   Compiling rayon v0.7.1 (signed by @cuviper)
   Compiling ring v0.11.0 (signed by @briansmith)
   Compiling objecthash v0.4.1 (signed by @tarcieri)
   Compiling ring-pwhash v0.11.0 (signed by @tarcieri)

I have intentionally (and perhaps unintentionally) put some incorrect GitHub usernames in here. Can you spot them all?

Can anyone? Especially every time they run the tool.

I don't see how this sort of thing provides any security value without an authorization system which maps authors to the packages they're authorized to sign.
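
To make the point concrete, here is a Python sketch of the kind of mapping being asked for: a table of which signers are expected for which crate, checked by a machine and not by eye. The `expected` table is a hypothetical local pin list, and the names in the usage note are taken from the sample output above purely as illustration.

```python
def unexpected_signers(observed: dict, expected: dict) -> dict:
    """Return crates whose signer is not in the pinned set for that crate."""
    problems = {}
    for crate, signer in observed.items():
        allowed = expected.get(crate)
        # Crates with no pinned signers are skipped, not flagged.
        if allowed is not None and signer not in allowed:
            problems[crate] = signer
    return problems
```

With `expected = {"clap": {"kbknapp"}, "term_size": {"kbknapp"}}`, an observed `term_size` signed by `kbknap` is flagged while `clap` signed by `kbknapp` is not. Where the expected table comes from, and who is trusted to maintain it, is exactly the authorization question under debate.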

@withoutboats
Contributor Author

withoutboats commented Dec 5, 2017

I don't see how this sort of thing provides any security value without an authorization system which maps authors to the packages they're authorized to sign.

But we do have this today. I don't know of a way to publish a crate that I don't own. I'd be interested in improving the security of our authorization system, but I have two rejoinders:

  • It's orthogonal to this proposal.
  • It's limited by our operational resources, which are quite small.

(FWIW I did recognize kbknap and dburkert in a once-over, though I suspect there are more that I missed. EDIT: and stephancheg).

@withoutboats
Contributor Author

I think a more constructive framing would be a question of how to allocate development priorities? For example, this project is possibly lower priority than signing the commits in the index.

@tarcieri

tarcieri commented Dec 5, 2017

But we do have this today. I don't know of a way to publish a crate that I don't own.

You are trusting that crates.io only allows authorized publishers to publish packages.

Providing the identity of signers only shows that a given crate was signed by any GitHub user, and not someone who was authorized to sign it.

If your goal is to "give users assurances without requiring them to trust crates.io", the best this proposal accomplishes is "signed by someone on GitHub, and perhaps given your knowledge of the Rust community you might be able to spot if it's someone suspicious". That's provided the package is even signed, and that the signature verifies.

Also, based on this:

We cannot have "require valid signatures" as the default setting.

...you intend for signature checking to fail open. So the output might realistically look a bit more like this:

   Compiling data-encoding v1.2.0 (WARNING: Unsigned!)
   Compiling unicode-normalization v0.1.5 (signed by @alexcrichton)
   Compiling ansi_term v0.9.0 (ERROR! Failing signature from @ogham)
   Compiling libc v0.2.24 (signed by @alexcrichton)
   Compiling lazy_static v0.2.8 (ERROR! Failing signature from @Kimundi)
   Compiling unicode-width v0.1.4 (signed by @alexcrichton)
   Compiling num-traits v0.1.39 (WARNING: Unsigned!)
   Compiling either v1.1.0 (ERROR! Failing signature from @bluss)
   Compiling backtrace v0.3.2 (signed by @alexcrichton)
   Compiling rand v0.3.15 (signed by @alexcrichton)
   Compiling time v0.1.37 (signed by @alexcrichton)
   Compiling term_size v0.3.0 (WARNING: Unsigned!)
   Compiling atty v0.2.2 (WARNING: Unsigned!)

I'm not sure a system like this is helpful. I think we need:

  • to ensure the correct users are signing packages
  • to ensure all packages are signed by something, even retroactively
  • the system to fail closed if there's a signature verification failure
  • mundane key rotation events not to cause signature failures
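
To make the "correct users" and "fails closed" requirements concrete, here is a minimal sketch of such a policy. It is not cargo's behavior or any proposed API: `SignatureStatus`, `Decision`, `Authorized` and `check` are hypothetical names, the authorization table is made-up sample data, and the actual cryptographic verification is assumed to have happened elsewhere.

```rust
use std::collections::{HashMap, HashSet};

/// Result a hypothetical signature verifier reports for one crate.
enum SignatureStatus {
    /// No signature was attached.
    Unsigned,
    /// A signature was attached but did not verify.
    Invalid { claimed_signer: String },
    /// A signature verified against the named user's key.
    Valid { signer: String },
}

#[derive(Debug, PartialEq)]
enum Decision {
    Accept,
    Reject(String),
}

/// Hypothetical authorization table: crate name -> users allowed to sign it.
type Authorized = HashMap<String, HashSet<String>>;

/// Fail-closed check: only a valid signature from an authorized signer is
/// accepted; unsigned crates, failing signatures and unknown signers are
/// all rejected instead of producing a warning.
fn check(krate: &str, status: &SignatureStatus, authorized: &Authorized) -> Decision {
    match status {
        SignatureStatus::Unsigned => Decision::Reject(format!("{}: unsigned", krate)),
        SignatureStatus::Invalid { claimed_signer } => Decision::Reject(format!(
            "{}: failing signature from @{}",
            krate, claimed_signer
        )),
        SignatureStatus::Valid { signer } => {
            let allowed = authorized
                .get(krate)
                .map_or(false, |signers| signers.contains(signer));
            if allowed {
                Decision::Accept
            } else {
                Decision::Reject(format!("{}: @{} is not authorized to sign", krate, signer))
            }
        }
    }
}

fn main() {
    // Made-up sample data for illustration only.
    let mut authorized = Authorized::new();
    authorized
        .entry("libc".to_string())
        .or_default()
        .insert("alexcrichton".to_string());

    let cases = [
        ("libc", SignatureStatus::Valid { signer: "alexcrichton".to_string() }),
        ("libc", SignatureStatus::Valid { signer: "someone-else".to_string() }),
        ("atty", SignatureStatus::Unsigned),
        ("either", SignatureStatus::Invalid { claimed_signer: "bluss".to_string() }),
    ];
    for (krate, status) in &cases {
        println!("{:?}", check(krate, status, &authorized));
    }
}
```

Under this sketch a valid signature from the wrong user is treated the same as no signature at all, which is the property a bare "signed by @user" display cannot give.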

@tarcieri

tarcieri commented Dec 5, 2017

I think a more constructive framing would be a question of how to allocate development priorities? For example, this project is possibly lower priority than signing the commits in the index.

I apologize if I've come off too negative. I think leveraging GitHub as an identity provider for digital signature keys is generally a neat idea. But there are a ton of details to be worked out, and I want to ensure all of those are addressed.

@Kixunil

Kixunil commented Dec 5, 2017

@matthieu-m I'd say enforce ASCII-only names.

preventing someone from "rewriting the history"

Sounds like blockchain... :)

@tarcieri

tarcieri commented Dec 5, 2017

@Kixunil the neat thing is since the crates.io index is a git repository, you can get the append-only properties of a blockchain out of it. Now if only the hashing weren't all SHA-1.

I think it'd still be neat to prototype security features on top of the crates.io index, and when GitHub at last supports git-with-secure-hashing (e.g. SHA-256), launch a parallel crates.io index which uses the new hash format, and does things like signing each commit to the index.

@ebkalderon

ebkalderon commented Jun 14, 2018

Just wondering, is there currently any kind of rough consensus on whether to adopt this or rust-lang/crates.io#75 as the security model? I see that both issues are open at the same time. Has there been any new discussion since then?

@tarcieri

tarcieri commented Jun 14, 2018

@ebkalderon I asked some of the developers of TUF (i.e. rust-lang/crates.io#75) about combining these two approaches: using TUF for AuthZ, but delegating a mapping of users/principals to public keys (i.e. AuthN) to, in this case, GitHub. Unfortunately TUF isn't designed to work that way, so moving forward with this approach would effectively take TUF off the table.

Personally I'm OK with that, although I think there are ideas around how to do AuthZ where we could borrow from TUF.

@tarcieri

tarcieri commented Jul 18, 2018

The more I've thought about this, the more I think the two approaches can be made compatible. I'm not quite at the point where I'm ready to write a full pre-RFC, but I think much of TUF can be leveraged in conjunction with this approach (or in other words, I think what I said in my last post was wrong).

TUF delegations work entirely in terms of cryptographic keys as the sole source of identity. A "targets" role decides the mapping between things like crates and the cryptographic keys used to sign them.

Where this "targets" role gets these keys from and how it makes decisions about which keys are authorized to sign which packages (i.e. "delegated targets" in TUF terminology) is largely left as an exercise to the system implementing TUF. In the case of a system like this, it could just fetch them from GitHub, but in doing so can provide point-in-time snapshots of the keys along with a historical record of what they were at the time various packages were published.

Where TUF overlaps with a system like the one proposed in this issue, I think we could just ignore the TUF-specific bits and lean on e.g. git commit signing. This means that once we have a basic package signing system in place, TUF can be layered on at a later date to provide end-to-end cryptographic authorization and package integrity.
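
As a rough illustration of the "point-in-time snapshots" idea, the sketch below keeps a history of which key IDs were authorized for each crate and answers whether a key was authorized at publication time. It is not TUF's metadata format or API: `TargetsHistory`, `record`, `was_authorized`, the integer timestamps and the key IDs are all hypothetical, and fetching keys from GitHub is left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical history kept by a "targets"-like role: for each crate, the
/// key IDs authorized to sign it, recorded whenever the set changes.
#[derive(Default)]
struct TargetsHistory {
    // crate name -> (timestamp of change -> key IDs authorized from then on)
    snapshots: BTreeMap<String, BTreeMap<u64, Vec<String>>>,
}

impl TargetsHistory {
    /// Record that, as of time `at`, exactly `key_ids` may sign `krate`.
    fn record(&mut self, krate: &str, at: u64, key_ids: &[&str]) {
        self.snapshots
            .entry(krate.to_string())
            .or_default()
            .insert(at, key_ids.iter().map(|k| k.to_string()).collect());
    }

    /// Was `key_id` authorized for `krate` when it was published?
    /// Uses the most recent snapshot at or before `published_at`.
    fn was_authorized(&self, krate: &str, key_id: &str, published_at: u64) -> bool {
        self.snapshots
            .get(krate)
            .and_then(|history| history.range(..=published_at).next_back())
            .map_or(false, |(_, keys)| keys.iter().any(|k| k == key_id))
    }
}

fn main() {
    let mut history = TargetsHistory::default();
    // Made-up data: the crate's key is rotated at time 200.
    history.record("ring", 100, &["key-a"]);
    history.record("ring", 200, &["key-b"]);

    // A release published at time 150 still checks out against the old key
    // after the rotation, because the snapshot from that time is retained.
    println!("{}", history.was_authorized("ring", "key-a", 150));
    // The old key is not accepted for a release published after rotation.
    println!("{}", history.was_authorized("ring", "key-a", 250));
}
```

Keeping the old snapshot is what would let a mundane key rotation leave earlier releases verifiable.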

@JustinCappos

We are happy to help in any way we can. Let us know if you'd like us to review the design or to participate in a session to flesh out the approach.

@gregokent

Sorry if this is out of place, but is this something that uptane.github.io would be more in line with than vanilla TUF? Uptane started as TUF but tailored for the automotive industry, where suppliers create the pieces that an OEM puts together in different ways for different vehicles, albeit with additional pieces for use with capability-limited electronic control units (ECUs).

The part that stuck out as possibly relevant is that a supplier may opt-in to sign their own images and provide all the associated metadata, or if the supplier does not sign its own images, the OEM may do it on behalf of the supplier and there is a director repository that manages that.

From section 8.2 of the Design Overview document

First, we add the director role to the repository (see Figure 8.2a). Adding this role can allow the OEM to completely control the choice of images downloaded and installed by vehicles. The director role functions much like a traditional package manager, except that it operates on the server side instead of the client. It can be used to instantly blacklist faulty versions of software

In the first example, the targets role delegates to role A all images produced by supplier A. Although supplier A builds these images and uploads them to the repository, it has opted not to sign these images. Instead, the OEM itself controls role A, and signs these images on behalf of the supplier. In the second example, the targets role delegates to role B all images produced by a supplier B. In this case, the supplier has opted to sign its own images. In the third example, the targets role delegates to role C all images produced by a supplier C. Instead of signing images itself, this role delegates one subset of images to the role D, and another subset to the role E. These roles could correspond to software developers who are currently employed by supplier C, and who are responsible for developing, building, and testing its ECU images. At any time, the role C can change the developers responsible for signing images, without requiring the OEM to update its own delegations.
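
The three delegation examples in the passage above can be sketched as a small lookup that walks from the top-level targets role down to whoever signs a given image. This is only an illustration: real TUF and Uptane delegations are expressed in signed metadata with key thresholds, and the `Role` type, the image-name prefixes and `resolve_signer` are hypothetical.

```rust
use std::collections::HashMap;

/// A role either signs images itself or delegates further.
enum Role {
    /// Images are signed with the key controlled by `signer`.
    Signs { signer: &'static str },
    /// Delegation rules: (image-name prefix, name of the delegated role).
    Delegates(Vec<(&'static str, &'static str)>),
}

/// The three examples from the quoted passage, with made-up prefixes.
fn example_roles() -> HashMap<&'static str, Role> {
    let mut roles = HashMap::new();
    roles.insert(
        "targets",
        Role::Delegates(vec![
            ("supplier-a/", "A"),
            ("supplier-b/", "B"),
            ("supplier-c/", "C"),
        ]),
    );
    // Role A: supplier A does not sign, so the OEM controls this role.
    roles.insert("A", Role::Signs { signer: "OEM" });
    // Role B: supplier B signs its own images.
    roles.insert("B", Role::Signs { signer: "supplier B" });
    // Role C: delegates subsets of its images to roles D and E.
    roles.insert(
        "C",
        Role::Delegates(vec![
            ("supplier-c/brakes/", "D"),
            ("supplier-c/engine/", "E"),
        ]),
    );
    roles.insert("D", Role::Signs { signer: "developer D" });
    roles.insert("E", Role::Signs { signer: "developer E" });
    roles
}

/// Follow delegations from "targets" until a signing role is reached.
/// Returns None if no rule matches, a role is missing, or the chain is
/// longer than the number of roles (which would indicate a cycle).
fn resolve_signer(roles: &HashMap<&'static str, Role>, image: &str) -> Option<&'static str> {
    let mut current = "targets";
    for _ in 0..=roles.len() {
        match roles.get(current)? {
            Role::Signs { signer } => return Some(*signer),
            Role::Delegates(rules) => {
                let (_, next) = rules
                    .iter()
                    .find(|(prefix, _)| image.starts_with(*prefix))?;
                current = *next;
            }
        }
    }
    None
}

fn main() {
    let roles = example_roles();
    for image in ["supplier-a/ecu.img", "supplier-c/engine/fw.img", "supplier-c/radio/fw.img"] {
        println!("{} -> {:?}", image, resolve_signer(&roles, image));
    }
}
```

In this sketch, changing which developers sign for supplier C only means editing role C's rules; the top-level targets entry stays the same.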

As an outsider with an admittedly limited view of the problem in whole, I wanted to throw this out there in case it's beneficial in any way. Thanks.

@tarcieri

tarcieri commented Feb 1, 2019

The part that stuck out as possibly relevant is that a supplier may opt-in to sign their own images and provide all the associated metadata, or if the supplier does not sign its own images, the OEM may do it on behalf of the supplier and there is a director repository that manages that.

How is this different from TUF delegated targets?

https://github.com/theupdateframework/tuf/blob/develop/docs/papers/protect-community-repositories-nsdi2016.pdf

@rugk

rugk commented Mar 13, 2019

Something like this is really the best way to make cargo future-proof. We should have learnt from incidents like those in npm and even hypothetical stories like this that this is very important.

And as the ecosystem of cargo crates is still growing, we can still make signatures and strong verification the default. It is not too late! (Just imagine the hurdles you'd face introducing such a thing in npm.)

@cavokz
Contributor

cavokz commented Nov 21, 2019

I'm enjoying Rust a lot but I'm frightened by the way the dependencies are brought in with cargo. This discussion is quite interesting but is it now halted or just continuing somewhere else?

@tarcieri

@cavokz the decision by the core team was to postpone, but a number of people (including myself) are still interested in seeing it through eventually:

rust-lang/rfcs#2474 (comment)

@rugk

rugk commented Jan 3, 2020

FYI, it's interesting to see exactly the same discussion for other package managers, e.g. composer has a big discussion in composer/composer#4022.

In the end, the most realistic solution is what @paragonie-security from Paragon Initiative suggests: using Chronicle, an append-only database (not really a blockchain, but with somewhat similar aims), together with a system called Gossamer for verifying the integrity of the updates/packages distributed.

@tarcieri

tarcieri commented Jan 3, 2020

As someone who's friends with @paragonie-security and has been a fan of CT/BT-like systems for a decade, and someone who isn't particularly fond of git cryptography, I think the crates.io index already gets the append-only benefits of those sorts of systems via git, and the main thing that's missing right now is what's addressed by this proposal: signing the logical "Merkle root" (or thereabouts) of the index.

If crates.io were to move to, say, a static file-based system though, that would change.

@Kixunil

Kixunil commented Jan 8, 2020

FYI, there's another interesting project called codechain, which is orthogonal to versioning/code distribution. I guess if there was a way for cargo to download all dependencies without touching them, then an external command could be developed to use codechain to verify the code, if dependencies use it.

@rugk

rugk commented Jan 15, 2020

Quite surprised to not see "cargo-crev" (blog post) mentioned in this issue here.
Basically, its aim is also to verify integrity through a web of trust (WOT), and it also uses signatures for that.

Though the aim may be a little different, it is certainly related.

@trishankatdatadog

FWIW, we're writing a document contrasting the differences between Transparent Logs (i.e., append-only logs) and TUF. Will share when ready!

@ehuss ehuss added the A-registries Area: registries label Apr 5, 2020
@ehuss ehuss added the A-security Area: security label Apr 22, 2020
@joshuagl

joshuagl commented Nov 2, 2020

FWIW, we're writing a document contrasting the differences between Transparent Logs (i.e., append-only logs) and TUF. Will share when ready!

That document was posted to the Secure Systems Lab blog here
