Include the design doc in repo
* Also add some new diagrams in the design doc
* Fix some issues in ADR

Signed-off-by: Jussi Kukkonen <[email protected]>
Jussi Kukkonen committed Dec 1, 2021
1 parent 845f307 commit 79ae764
Showing 4 changed files with 216 additions and 14 deletions.
36 changes: 22 additions & 14 deletions docs/adr/0010-repository-library-design.md
@@ -28,9 +28,9 @@ some issues (as do many other implementations):
* Metadata API makes modifying metadata far easier than legacy code base: this
makes significantly different designs possible
* Not providing a "repository library" (and leaving implementers on their own)
may be a short term solution because of the previous point, but it does seem
like the project would benefit from some shared repository code and shared
repository design
may be a short term solution because of the previous point, but to make
adoption easier and to help adopters create safe implementations the project
would benefit from some shared repository code and a shared repository design
* Maintainability of new library code must be a top concern
* Allowing a wide range of repository implementations (from CLI tools to
minimal in-memory implementations to large scale applications like Warehouse)
@@ -64,9 +64,7 @@ forward: decision could be re-evaluated in a few months if not in weeks.
### No repository packages

Metadata API makes editing the repository content vastly simpler. There are
already repository implementations built with it (RepositorySimulator in
python-tuf tests is an in-memory implementation, while
repository-editor-for-tuf is an external CLI tool) so clearly a repository
already repository implementations built with it[^1] so clearly a repository
library is not an absolute requirement.

Not providing repository packages in python-tuf does mean that external
@@ -76,9 +74,10 @@ different designs and hopefully find good ones in the end.

That said, there are some tricky parts of repository maintenance (e.g.
initialization, snapshot update, hashed bin management) that would benefit from
having a canonical implementation. Likewise, a well designed library could make
some repeated actions (e.g. version bumps, expiry updates, signing) much easier
to manage.
having a canonical implementation, both for easier adoption of python-tuf and
as a reference for other implementations. Likewise, a well designed library
could make some repeated actions (e.g. version bumps, expiry updates, signing)
much easier to manage.

### repository_tool -like API

@@ -97,8 +96,7 @@ being a substantial amount of code that is only a good fit for one application.

python-tuf could define a tiny repository API that
* provides carefully selected core functionality (like core snapshot update)
but...
* does not implement all repository actions itself, instead i makes it easy
* does not implement all repository actions itself, instead it makes it easy
for the application code to do them
* leaves application details to specific implementations (examples of decisions
a library should not always decide: "are targets stored with the repo?",
@@ -107,7 +105,7 @@ python-tuf could define a tiny repository API that
date?", "which targets versions should be part of new snapshot?")

python-tuf could also provide one or more implementations of this abstraction
as examples -- this could include a repo.py- or repository_tool-like
as examples -- this could include a _repo.py_- or _repository_tool_-like
implementation.

This could be a compromise that allows:
@@ -123,6 +121,16 @@ The approach does have some downsides:
* A prototype has been implemented (see Links below) but the concept is still
unproven

More details in [Design document](../repository-library-design.md).

## Links
[Design document for minimal repository abstraction](https://docs.google.com/document/d/1YY83J4ihztsi1Qv0dJ22EcqND8dT80AGTduwgh0trpY)
[Prototype implementation of minimal repository abstraction](https://github.com/vmware-labs/repository-editor-for-tuf/)
* [Design document for minimal repository abstraction](../repository-library-design.md)
* [Prototype implementation of minimal repository abstraction](https://github.com/vmware-labs/repository-editor-for-tuf/)


[^1]:
[RepositorySimulator](https://github.com/theupdateframework/python-tuf/blob/develop/tests/repository_simulator.py)
in python-tuf tests is an in-memory implementation, while
[repository-editor-for-tuf](https://github.com/vmware-labs/repository-editor-for-tuf)
is an external command line repository maintenance tool.

Binary file added docs/repository-library-design-ownership.jpg
Binary file added docs/repository-library-design-usage.jpg
194 changes: 194 additions & 0 deletions docs/repository-library-design.md
@@ -0,0 +1,194 @@
# Python-tuf repository API proposal: _minimal repository abstraction_

This is an attachment to ADR 10: _Repository library design built on top of
Metadata API_, and documents the design proposal in Dec 2020.

## Design principles

Primary goals of this repository library design are
1. Support the full range of repository implementations: from command line
“repository editing” tools to production repositories like PyPI
2. Provide canonical solutions for the difficult repository problems but avoid
making implementation decisions
3. Keep python-tuf maintenance burden in mind: less is more

Why does this design look so different from both legacy python-tuf code and
other implementations?
* Most existing implementations are focused on a specific use case (typically a
command line application): this is a valid design choice but severely limits
goal #1
* The problem space contains many application decisions. Many implementations
solve this by creating functions with 15 arguments: this design tries to find
another way (#2)
* The Metadata API makes modifying individual pieces of metadata simpler. This,
combined with good repository API design, should enable more variance in
where things are implemented: The repository library does not have to
implement every little detail as we can safely let specific implementations
handle things, see goal #3
* This variance means we can start by implementing a minimal design: as
experience from implementations is collected, we can then move implementation
details into the library (goals #2, #3)

## Design

![Design: Application and library components](repository-library-design-ownership.jpg)

The design expects a fully functional repository application to contain code at
three levels:
* Repository library (abstract classes that are part of python-tuf)
  * The Repository abstract class provides an ergonomic metadata editing API
    for all code levels to use. It also implements some core edit actions like
    snapshot update
  * A small amount of related functionality is also provided (private key
    management API, maybe repository validation)
  * The library is very small: possibly a few hundred lines of code
* Concrete Repository implementation (typically part of application code,
  implements interfaces provided by the repository API in python-tuf)
  * Contains the “application level” decisions that the Repository abstraction
    requires to operate: examples of application decisions include
    * _When should “targets” metadata next expire when it is edited?_
    * _What is the current “targets” metadata version? Where do we load it
      from?_
    * _Where to store current “targets” after editing? Should the previous
      version be deleted from storage?_
* Actual application
  * Uses the Repository API to do the repository actions it needs to do

For context here’s a trivial example showing what “ergonomic editing” means --
this key-adding code could be in the application or in the python-tuf library:

```python
with repository.edit("targets") as targets:
    # adds a key for role1 (as an example, arbitrary edits are allowed)
    targets.add_key("role1", key)
```

This code loads current targets metadata for editing, adds the key to a role,
and handles version and expiry bumps before persisting the new targets version.
The reason for the context manager style is that it manages two things
simultaneously:
* Hides the complexity of loading and persisting metadata, and updating expiry
and versions from the editing code (by putting it in the repository
implementation – which may still be provided by the application)
* Still allows completely arbitrary edits on the metadata in question: the
library does not need to anticipate what the application wants to do, and
the library can still provide e.g. snapshot functionality without knowing
about the application decisions mentioned in the previous point.

Other designs do not seem to manage both of these.
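
To make the context manager idea concrete, here is a minimal sketch of how `edit()` could sit on top of `_load()` and `_save()`. It is not the prototype's code: `FakeMetadata` and `InMemoryRepository` are hypothetical stand-ins invented for illustration, and the real design would operate on Metadata API objects and also handle expiry and signing.

```python
import copy
from abc import ABC, abstractmethod
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for a metadata payload (not the Metadata API)."""
    version: int = 0
    keys: Dict[str, List[str]] = field(default_factory=dict)

    def add_key(self, role: str, key: str) -> None:
        self.keys.setdefault(role, []).append(key)


class Repository(ABC):
    """Sketch of the abstraction: implementations decide how to load and save."""

    @abstractmethod
    def _load(self, role: str) -> FakeMetadata: ...

    @abstractmethod
    def _save(self, role: str, md: FakeMetadata) -> None: ...

    @contextmanager
    def edit(self, role: str) -> Iterator[FakeMetadata]:
        md = self._load(role)
        yield md              # the caller makes arbitrary edits here
        self._save(role, md)  # only reached if the edits did not raise


class InMemoryRepository(Repository):
    """Toy implementation whose only application decision is the version bump."""

    def __init__(self) -> None:
        self.store: Dict[str, FakeMetadata] = {}

    def _load(self, role: str) -> FakeMetadata:
        # Hand out a copy so that a failed edit never touches stored state
        return copy.deepcopy(self.store.get(role, FakeMetadata()))

    def _save(self, role: str, md: FakeMetadata) -> None:
        md.version += 1
        self.store[role] = md


repo = InMemoryRepository()
with repo.edit("targets") as targets:
    targets.add_key("role1", "key-abc")
```

In this sketch the editing code never mentions versions or storage: those decisions live entirely in `_load()` and `_save()`.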

![Design: How components are used](repository-library-design-usage.jpg)

The core idea here is that because editing is ergonomic enough, when new
functionality (like “developer uploads new targets”) is added, _it can be added
at any level_: the application might add a `handle_new_target_files()` method
that adds a bunch of targets into the metadata, but one of the previous layers
could offer that as a helper function as well: code in both cases would look
similar as it would use the common editing interface.

There are a few additional items worth mentioning:
* Private key management: the Repository API should come with a “keyring
abstraction” -- a way for the application to provide roles’ private keys for
the Repository to use. Some implementations could be provided as well.
* Validating repository state: the design is very much focused on enabling
efficient editing of individual metadata. Implementations are also likely to
be interested in validating (after some edits) that the repository is correct
according to client workflow and that it contains the expected changes. The
Repository API should provide some validation, but we should recognise that
validation may be implementation specific.
* Improved metadata editing: There are a small number of improvements that
could be made to metadata editing. These do not necessarily need to be part
of the repository API: they could be part of Metadata API as well

It would make sense for python-tuf to ship with at least one concrete
Repository implementation: possibly a repo.py look-alike. This implementation
should not be part of the library but an example.

## Details

This section includes links to a Proof of Concept implementation in
[repository-editor-for-tuf](https://github.com/vmware-labs/repository-editor-for-tuf/):
it should not be seen as the exact proposed API but a prototype of the ideas.

The ideas in this document map to POC components like this:

| Concept | repository-editor-for-tuf implementation |
|-|-|
| Repository API | [librepo/repo.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/librepo/repo.py), [librepo/keys.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/librepo/repo.py) |
| Example of repository implementation | [git_repo.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/git_repo.py) |
| Application code | [cli.py (command line app)](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/cli.py), [keys_impl.py (keyring implementation)](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/keys_impl.py) |
| Repository validation | [verifier.py (very rough, not intended for python-tuf)](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/verifier.py) |
| Improved Metadata editing | [helpers.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/helpers.py) |


### Repository API

Repository itself is a minimal abstract class: The value of this class is in
defining the abstract method signatures (most importantly `_load()`, `_save()`,
`edit()`) that enable ergonomic metadata editing. The Repository class in this
proposal includes concrete implementations only for the following:
* `sign()` -- signing without editing metadata payload
* `snapshot()` -- updates snapshot and timestamp metadata based on given input.
Note that a concrete Repository implementation could provide an easier to use
snapshot that does not require input (see example in git_repo.py)

More concrete implementations (see cli.py for examples) could be added to
Repository itself but none seem essential at this point.
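
As a rough illustration of what "updates snapshot and timestamp metadata based on given input" could mean, the sketch below uses hypothetical `Snapshot` and `Timestamp` dataclasses instead of the Metadata API. The function signature is an assumption made for this example, and signing, expiry, hashes and lengths are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Snapshot:
    """Hypothetical stand-in: maps metadata filenames to versions."""
    version: int = 1
    meta: Dict[str, int] = field(default_factory=dict)


@dataclass
class Timestamp:
    """Hypothetical stand-in: records which snapshot version is current."""
    version: int = 1
    snapshot_version: int = 1


def snapshot(snap: Snapshot, ts: Timestamp, current_versions: Dict[str, int]) -> bool:
    """Update snapshot and timestamp if targets versions changed.

    current_versions is the caller-provided input: role name -> version.
    Returns True if new snapshot and timestamp versions were created.
    """
    new_meta = {f"{role}.json": ver for role, ver in current_versions.items()}
    if new_meta == snap.meta:
        return False  # nothing changed, so no new versions are needed

    snap.meta = new_meta
    snap.version += 1
    ts.snapshot_version = snap.version
    ts.version += 1
    return True


snap, ts = Snapshot(), Timestamp()
changed = snapshot(snap, ts, {"targets": 2, "role1": 1})
```

The caller decides where `current_versions` comes from, which is exactly the kind of application decision the library avoids making.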

The API requires a “Keyring” abstraction that the repository code can use to
look up a set of signers for a specific role. Specific implementations of
Keyring could include a file-based keyring for testing, env-var keyring for CI
use, etc. Some implementations should be provided in the python-tuf code base
and more could be implemented in applications.
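
A keyring along these lines might look like the following sketch. The method name `signers_for`, the `TUF_KEY_` variable prefix and the use of plain strings as "signers" are all assumptions made for illustration: a real keyring would return objects capable of signing.

```python
import os
from abc import ABC, abstractmethod
from typing import List, Mapping


class Keyring(ABC):
    """Sketch of the abstraction: maps a role name to its signers."""

    @abstractmethod
    def signers_for(self, role: str) -> List[str]: ...


class EnvVarKeyring(Keyring):
    """Hypothetical env-var keyring, e.g. for CI use.

    Reads a comma-separated list of keys from <PREFIX><ROLE>, where the
    role name is upper-cased (this naming scheme is an assumption).
    """

    def __init__(self, environ: Mapping[str, str] = os.environ,
                 prefix: str = "TUF_KEY_") -> None:
        self._environ = environ
        self._prefix = prefix

    def signers_for(self, role: str) -> List[str]:
        value = self._environ.get(f"{self._prefix}{role.upper()}", "")
        return [key for key in value.split(",") if key]


# A plain dict stands in for the process environment in this example
keyring = EnvVarKeyring({"TUF_KEY_TARGETS": "key1,key2"})
targets_signers = keyring.signers_for("targets")
```

Because the repository code only depends on the abstract `Keyring`, a file-based implementation for testing could be swapped in without touching the editing code.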

_Prototype status: Prototype Repository and Keyring abstractions exist in
librepo/repo.py._

### Example of Repository implementation

The design decisions that the included example `GitRepository` makes are not
important but provide an example of what is possible:
* Metadata versions are stored in files in git, with filenames that allow
serving the metadata directory as is over HTTP
* Version bumps are made based on git status (so edits in staging area only
bump version once)
* “Current version” when loading metadata is decided based on filenames on disk
* Files are removed once they are no longer part of the snapshot (to keep
directory uncluttered)
* Expiry times are decided based on an application specific metadata field
* Private keys can be stored in a file or in environment variables (for CI use)

Note that the GitRepository implementation is significantly larger than the
Repository interface -- but all of the complexity in GitRepository is really
related to the design decisions made there.

_Prototype status: The GitRepository example exists in git_repo.py._
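
One of the decisions above, choosing the “current version” from filenames on disk, can be sketched as follows. This is not the GitRepository code: the helper name `current_version` and the `<version>.<role>.json` filename layout are assumptions made for this example.

```python
import re
from pathlib import Path
from typing import Optional


def current_version(metadata_dir: Path, role: str) -> Optional[int]:
    """Return the highest version found for role, or None if there is none.

    Assumes files are named "<version>.<role>.json" (an assumption for
    this sketch).
    """
    pattern = re.compile(rf"^(\d+)\.{re.escape(role)}\.json$")
    versions = [
        int(match.group(1))
        for path in metadata_dir.iterdir()
        if (match := pattern.match(path.name))
    ]
    return max(versions, default=None)
```

Versions are compared as integers rather than strings, so `10.targets.json` correctly counts as newer than `9.targets.json`.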

### Validating repository state

This is mostly undesigned but something built on top of TrustedMetadataSet
(currently ngclient component) might work as a way to easily check specific
aspects like:
* Is top-level metadata valid according to client workflow?
* Is a role included in the snapshot and the delegation tree?

It’s likely that different implementations will have different needs though: a
command line app for small repos might want to validate by loading all metadata
into memory, but a server application hosting tens of thousands of pieces of
metadata is unlikely to do so.
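
As a very rough sketch of the second check (is a role included in the snapshot and the delegation tree), the function below works on plain dictionaries rather than on TrustedMetadataSet. The function name and both input shapes are assumptions made for illustration, and no signatures or thresholds are verified.

```python
from typing import Dict, List


def role_is_included(role: str,
                     delegations: Dict[str, List[str]],
                     snapshot_meta: Dict[str, int]) -> bool:
    """Check that role is listed in snapshot and reachable from "targets".

    delegations maps a delegating role to the roles it delegates to.
    snapshot_meta maps "<role>.json" to a version.
    """
    if f"{role}.json" not in snapshot_meta:
        return False

    # Walk the delegation tree depth-first, starting from top-level targets
    seen = set()
    stack = ["targets"]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against delegation cycles
        seen.add(current)
        stack.extend(delegations.get(current, []))
    return role in seen


delegations = {"targets": ["role1"], "role1": ["role2"]}
snapshot_meta = {"targets.json": 3, "role1.json": 1, "role2.json": 1, "orphan.json": 1}
```

With this example data, `role2` passes the check because it is reachable through `role1`, while `orphan` fails: it is listed in the snapshot but nothing delegates to it.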

_Prototype status: A very rough implementation exists in verifier.py: this is
unlikely to be very useful._

### Improved metadata editing

Currently the identified improvement areas are:
* Metadata initialization: this could potentially be improved by adding
default argument values to Metadata API constructors
* Modifying and looking up data about roles in delegating metadata
(root/targets): root and targets do similar things but do not have an
identical API. This may be a very specific use case and not interesting
for some applications

_Prototype status: Some potential improvements have been collected in
helpers.py_
