Skip to content

Commit

Permalink
ADR: Add New repository library design
Browse files Browse the repository at this point in the history
Document the decision to build a repository library on top of Metadata
API.

Signed-off-by: Jussi Kukkonen <[email protected]>
  • Loading branch information
Jussi Kukkonen committed Nov 24, 2021
1 parent acb201d commit 845f307
Show file tree
Hide file tree
Showing 2 changed files with 129 additions and 0 deletions.
128 changes: 128 additions & 0 deletions docs/adr/0010-repository-library-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
# Repository library design built on top of Metadata API


## Context and Problem Statement

The Metadata API provides a modern Python API for accessing individual pieces
of metadata. It does not provide any wider context help to someone looking to
implement a TUF repository.

The legacy python-tuf implementation offers tools for this but suffers from
some issues (as do many other implementations):
* There is a _very_ large amount of code to maintain: repo.py,
repository_tool.py and repository_lib.py alone are almost 7000 lines of code.
* The "library like" parts of the implementation do not form a good coherent
API: methods routinely have a large number of arguments, code still depends
on globals in a major way and application (repo.py) still implements a lot of
"repository code" itself
* The "library like" parts of the implementation make decisions that look like
application decisions. As an example, repository_tool loads _every_ metadata
file in the repository: this is fine for CLI that operates on a small
repository but is unlikely to be a good choice for PyPI.


## Decision Drivers

* There is a consensus on removing the legacy code from python-tuf due to
maintainability issues
* Metadata API makes modifying metadata far easier than legacy code base: this
makes significantly different designs possible
* Not providing a "repository library" (and leaving implementers on their own)
may be a short term solution because of the previous point, but it does seem
like the project would benefit from some shared repository code and shared
repository design
* Maintainability of new library code must be a top concern
* Allowing a wide range of repository implementations (from CLI tools to
minimal in-memory implementations to large scale applications like Warehouse)
would be good: unfortunately these can have wildly differing requirements


## Considered Options

1. No repository packages
2. repository_tool -like API
3. Minimal repository abstraction


## Decision Outcome

Option 3: Minimal repository abstraction

While option 1 might be used temporarily, the goal should be to implement a
minimal repository abstraction as soon as possible: this should give the
project a path forward where the maintenance burden is reasonable and results
should be usable very soon. The python-tuf repository functionality can be
later extended as ideas are experimented with in upstream projects and in
python-tuf example code.

The concept is still unproven but validating the design should be straight
forward: decision could be re-evaluated in a few months if not in weeks.


## Pros and Cons of the Options

### No repository packages

Metadata API makes editing the repository content vastly simpler. There are
already repository implementations built with it (RepositorySimulator in
python-tuf tests is an in-memory implementation, while
repository-editor-for-tuf is an external CLI tool) so clearly a repository
library is not an absolute requirement.

Not providing repository packages in python-tuf does mean that external
projects could experiment and create implementations without adding to the
maintenance burden of python-tuf. This would be the easiest way to iterate many
different designs and hopefully find good ones in the end.

That said, there are some tricky parts of repository maintenance (e.g.
initialization, snapshot update, hashed bin management) that would benefit from
having a canonical implementation. Likewise, a well designed library could make
some repeated actions (e.g. version bumps, expiry updates, signing) much easier
to manage.

### repository_tool -like API

It won't be possible to support the repository_tool API as it is but a similar
one would certainly be an option.

This would likely be the easiest upgrade path for any repository_tool users out
there. The implementation would not be a huge amount of work as Metadata API
makes many things easier.

However, repository_tool (and parts of repo.py) are not a great API. It is
likely that a similar API suffers from some of the same issues: it might end up
being a substantial amount of code that is only a good fit for one application.

### Minimal repository abstraction

python-tuf could define a tiny repository API that
* provides carefully selected core functionality (like core snapshot update)
but...
* does not implement all repository actions itself, instead i makes it easy
for the application code to do them
* leaves application details to specific implementations (examples of decisions
a library should not always decide: "are targets stored with the repo?",
"which versions of metadata are stored?", "when to load metadata?", "when to
unload metadata?", "when to bump metadata version?", "what is the new expiry
date?", "which targets versions should be part of new snapshot?")

python-tuf could also provide one or more implementations of this abstraction
as examples -- this could include a repo.py- or repository_tool-like
implementation.

This could be a compromise that allows:
* low maintenance burden on python-tuf: initial library could be tiny
* sharing the important, canonical parts of a TUF repository implementation
* ergonomic repository modification, meaning most actions do not have to be in
the core code
* very different repository implementations using the same core code and the
same abstract API

The approach does have some downsides:
* it's not a drop in replacement for repository_tool or repo.py
* A prototype has been implemented (see Links below) but the concept is still
unproven

## Links
[Design document for minimal repository abstraction](https://docs.google.com/document/d/1YY83J4ihztsi1Qv0dJ22EcqND8dT80AGTduwgh0trpY)
[Prototype implementation of minimal repository abstraction](https://github.com/vmware-labs/repository-editor-for-tuf/)
1 change: 1 addition & 0 deletions docs/adr/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ This log lists the architectural decisions for tuf.

- [ADR-0008](0008-accept-unrecognised-fields.md) - Accept metadata that includes unrecognized fields
- [ADR-0009](0009-what-is-a-reference-implementation.md) - Primary purpose of the reference implementation
- [ADR-0010](0010-repository-library-design.md) - Repository library design built on top of Metadata API

<!-- adrlogstop -->

Expand Down

0 comments on commit 845f307

Please sign in to comment.