
Version number scalability #69

Closed
mnm678 opened this issue Nov 14, 2019 · 16 comments

mnm678 (Collaborator) commented Nov 14, 2019

Quoting @dstufft:

Does this have to be an incrementing integer? My concern here is simply that an ever-increasing integer can run into scalability problems, particularly if people don't take into account its ever-increasing nature (for example, needing a larger integer size in a database).

My second question is, do we have to guarantee these are never reused, or can they be reused after some time has passed?

My gut tells me we'd be better off using a UUID for a version number, or generating a random string of characters. The biggest downside I see to that is the extremely remote chance of generating the same version number twice, but that can be solved by recording all versions ever used (which might be good to do anyway, to return a 410 instead of a 404).

mnm678 self-assigned this Nov 14, 2019
trishankatdatadog added this to the Round 5 milestone Nov 14, 2019
lukpueh (Member) commented Nov 15, 2019

Some thoughts:

I think for root metadata there is not much choice. It must have a deterministically incrementing version number so that clients can re-trace the chain of root updates (see spec#61). The good news is that the root version is unlikely to increase quickly: it only bumps on root expiration (we recommend once a year) and on key compromise.
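To illustrate why the increment must be deterministic (and by exactly one), here is a rough sketch of the root-update walk; `fetch` and `verify_signatures` are hypothetical helpers, not the actual python-tuf API:

```python
# Sketch only: fetch(name) returns parsed metadata or None on 404;
# verify_signatures(metadata, root) raises if the signature threshold isn't met.
def update_root(fetch, verify_signatures, trusted_root):
    version = trusted_root["signed"]["version"]
    while True:
        candidate = fetch(f"{version + 1}.root.json")  # request exactly N+1
        if candidate is None:
            break  # no newer root published
        # A new root must verify against the keys of both the old and new root.
        verify_signatures(candidate, trusted_root)
        verify_signatures(candidate, candidate)
        if candidate["signed"]["version"] != version + 1:
            raise ValueError("root version did not increment by exactly 1")
        trusted_root = candidate
        version += 1
    return trusted_root
```

Because the client asks for exactly version N+1 at each step, any gap or reset in root version numbers would break the chain.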

For all other metadata, version numbers technically are not required. Snapshot can (collisions aside) unambiguously refer to targets or delegated targets by their hashes, and the same goes for timestamp when referring to snapshot. Timestamp actually already does this, and so does snapshot, albeit optionally (see spec:L718-L719 and spec:L996-L998). The hashes can also be used as metadata filename prefixes in consistent snapshots. And freshness is provided by timestamp's expiration date, not by version numbers.
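For reference, this is roughly the shape of the relevant part of timestamp metadata (field values here are placeholders): snapshot.json is pinned by its hash, which is why the hash alone suffices as an unambiguous reference.

```python
# Abbreviated timestamp metadata: the "meta" entry pins snapshot.json by
# hash (and length); the version field is also there, but the hash alone
# identifies the referenced file unambiguously.
timestamp_meta = {
    "signed": {
        "_type": "timestamp",
        "version": 1,
        "expires": "2019-12-15T00:00:00Z",
        "meta": {
            "snapshot.json": {
                "hashes": {"sha256": "<sha256 of snapshot.json>"},
                "length": 1007,
                "version": 1,
            }
        },
    }
}
```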

What was the original reasoning behind version numbers instead of hashes (or UUIDs)? To eliminate the chance of collisions? And/or because version numbers are less costly in terms of computation and size? If we generate hashes of the versioned metadata anyway, the cost argument does not hold. And are collisions really a concern here?

It might work to re-use version numbers for non-root metadata, because we also allow retiring old non-root metadata. But we'd have to make sure that we only re-use version numbers from metadata that has been retired, which seems risky. And even then, we should cross-check against the metadata hash.

Related issues: theupdateframework/specification#28, theupdateframework/specification#40.

trishankatdatadog (Collaborator) commented:

I talked about this with @dstufft, and he said using strictly increasing version numbers is OK, as long as we have a plan for (1) integer overflow, when necessary, and (2) letting super-old clients update without running into accidental DoS "attacks".

mnm678 (Collaborator, Author) commented Nov 18, 2019

As for the issues @trishankatdatadog mentioned, I think the current system will work as long as non-root metadata is retired before an integer overflow occurs.

For root metadata, as long as the integer is sufficiently large and only bumped about once a year, an overflow will not occur for a very long time.

For non-root metadata, we can recommend that metadata be retired before integer reuse becomes a problem. As long as metadata is retired fairly regularly, the version counter looping back around should not be a problem. To ensure this, we can specify that fewer than 1/2 MAX_INT (for example) versions may exist at any time.
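A minimal sketch of that rule on the repository side, with a hypothetical next_version helper (MAX_INT stands in for whatever integer width is chosen): keeping fewer than half the version space live guarantees that wrap-around can always find a free number.

```python
# Hypothetical repository-side helper: issue the next version number while
# guaranteeing that fewer than half of the version space is ever live.
MAX_INT = 2**31 - 1  # stand-in for whatever integer width is chosen

def next_version(active_versions):
    """Return the next free version number, wrapping MAX_INT -> 1."""
    if len(active_versions) >= MAX_INT // 2:
        raise RuntimeError("retire old metadata before issuing new versions")
    v = max(active_versions, default=0) % MAX_INT + 1  # wraps MAX_INT to 1
    while v in active_versions:  # skip numbers still held by live metadata
        v = v % MAX_INT + 1
    return v

assert next_version({1, 2, 3}) == 4
assert next_version({MAX_INT}) == 1  # wrap-around once old versions are retired
```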

lukpueh (Member) commented Nov 21, 2019

As pointed out in #72 (comment), re-using version numbers requires us to go against the spec, which doesn't seem desirable.

But do we really need to define an integer overflow strategy? Can't we just recommend a common 8-byte BigInt, as available in Postgres, which IIUC is what warehouse uses?

With an upper bound of 2^63-1, we could create a new snapshot (i.e. bump the version) once per millisecond and still last for ~300 million years.
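A quick back-of-the-envelope check of that figure:

```python
# Signed 8-byte integer bound (Postgres BIGINT): 2**63 - 1 versions.
max_versions = 2**63 - 1
ms_per_year = 1000 * 60 * 60 * 24 * 365.25  # milliseconds in a year
years = max_versions / ms_per_year
print(f"~{years / 1e6:.0f} million years")  # ~292 million years at one bump/ms
```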

cc @trishankatdatadog, @dstufft, @mnm678, @JustinCappos

JustinCappos commented Nov 21, 2019 via email

lukpueh (Member) commented Nov 21, 2019

What about all the parts of the spec that say a client must not update to a metadata file with a version number less than the one currently trusted (as I pointed out in #72 (comment))?

If we strictly adhere to that, we won't be able to update to a reset version number, e.g. after revoking a compromised key.

lukpueh (Member) commented Nov 21, 2019

Oh, I just saw that the specification has a solution for exactly that: it prescribes un-trusting the previous snapshot and/or timestamp after their keys are rotated.

This means that if an attacker bumps from, say, version 42 to MAXINT, and the key is then rotated and new metadata published with version 43, the client will still be able to update: the metadata with version MAXINT no longer counts as previously trusted, so the update is legitimately from 42 to 43.
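A sketch of that recovery behavior, using hypothetical client state rather than a real API: the anti-rollback check only applies while the previously trusted metadata is still trusted.

```python
# Hypothetical client-side check, not a real API: after root rotates the
# snapshot key, the previously trusted snapshot (possibly fast-forwarded to
# MAXINT by an attacker) is discarded, so the anti-rollback check no longer
# blocks the legitimate update.
def accept_snapshot(new_version, trusted_version, snapshot_key_rotated):
    if snapshot_key_rotated:
        trusted_version = None  # un-trust metadata signed by the old key
    if trusted_version is not None and new_version < trusted_version:
        raise RuntimeError("possible rollback attack")
    return new_version

MAXINT = 2**63 - 1
accept_snapshot(43, MAXINT, snapshot_key_rotated=True)   # accepted after rotation
# accept_snapshot(43, MAXINT, snapshot_key_rotated=False) would raise
```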

Is that what you meant by resetting, @JustinCappos?

JustinCappos commented Nov 21, 2019 via email

mnm678 (Collaborator, Author) commented Nov 21, 2019

The solution in the specification linked by @lukpueh solves this issue for snapshot and timestamp. Can the same method be used for targets (bins and bin-n roles)?

lukpueh (Member) commented Nov 22, 2019

If the version-number increment of targets and delegated targets is only checked via snapshot (see 3.3.3 of the client workflow), then un-trusting snapshot will be enough to recover from a fast-forward attack on (delegated) targets.

However, 4.3 of the client workflow prescribes that new targets metadata must have a higher version number than the previously trusted targets. In that case, recovery from fast-forward is only possible if old targets are also un-trusted after a key rotation, as timestamp and snapshot are (see 1.9 of the client workflow). A sketch of this conflict follows below.

I think we should update either 4.3 or 1.9 in the spec.
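To make the conflict concrete, here is a hypothetical sketch of the 4.3 check with and without extending the 1.9-style un-trusting to targets:

```python
# Hypothetical illustration of the conflict: the 4.3 check alone blocks
# recovery from a targets fast-forward; extending step 1.9's un-trusting
# to targets (on key rotation) fixes it.
def accept_targets(new_version, trusted_version,
                   key_rotated, untrust_on_rotation):
    if key_rotated and untrust_on_rotation:
        trusted_version = None  # proposed: treat targets like snapshot/timestamp
    if trusted_version is not None and new_version <= trusted_version:
        raise RuntimeError("step 4.3: version must exceed the trusted version")
    return new_version

MAXINT = 2**63 - 1
accept_targets(43, MAXINT, key_rotated=True, untrust_on_rotation=True)   # recovers
# accept_targets(43, MAXINT, key_rotated=True, untrust_on_rotation=False) raises
```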

lukpueh (Member) commented Nov 22, 2019

At any rate, given that versions are unlikely to ever reach MAXINT without a compromise, and that even after a compromise TUF can update from MAXINT by discarding the compromised metadata, I don't think we need to re-use version numbers as described in #72. Does everyone agree?

JustinCappos commented Nov 22, 2019 via email

lukpueh (Member) commented Nov 22, 2019

Good. I can submit a patch to the spec. @mnm678, can you update #72 with the gist of what we discussed here?

mnm678 (Collaborator, Author) commented Nov 22, 2019

Yes, I will update the PR later today.

lukpueh (Member) commented Nov 25, 2019

Thanks for the update, @mnm678. I left a few comments; I hope they are fair :)

Here's a PR to fix the issue I pointed out above:
--> theupdateframework/specification#65

Feel free to throw rocks at it.

lukpueh (Member) commented Nov 26, 2019

Closing this issue via merged #72. Related discussion continues in theupdateframework/specification#65.
