ADR: Add New repository library design
Document the decision to build a repository library on top of Metadata API. Signed-off-by: Jussi Kukkonen <[email protected]>
Jussi Kukkonen committed Nov 24, 2021
1 parent acb201d, commit 845f307
Showing 2 changed files with 129 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,128 @@ | ||
# Repository library design built on top of Metadata API | ||
|
||
|
||
## Context and Problem Statement | ||
|
||
The Metadata API provides a modern Python API for accessing individual pieces | ||
of metadata. It does not provide wider context or guidance for someone looking | |
to implement a TUF repository. | |
|
||
The legacy python-tuf implementation offers tools for this but suffers from | ||
some issues (as do many other implementations): | ||
* There is a _very_ large amount of code to maintain: repo.py, | ||
repository_tool.py and repository_lib.py alone are almost 7000 lines of code. | ||
* The "library like" parts of the implementation do not form a good coherent | ||
API: methods routinely have a large number of arguments, code still depends | ||
on globals in a major way and the application (repo.py) still implements a | |
lot of "repository code" itself. | |
* The "library like" parts of the implementation make decisions that look like | ||
application decisions. As an example, repository_tool loads _every_ metadata | ||
file in the repository: this is fine for a CLI that operates on a small | |
repository but is unlikely to be a good choice for PyPI. | ||
|
||
|
||
## Decision Drivers | ||
|
||
* There is a consensus on removing the legacy code from python-tuf due to | ||
maintainability issues | ||
* Metadata API makes modifying metadata far easier than the legacy code base: | |
this makes significantly different designs possible | |
* Not providing a "repository library" (and leaving implementers on their own) | ||
may be a short term solution because of the previous point, but it does seem | ||
like the project would benefit from some shared repository code and shared | ||
repository design | ||
* Maintainability of new library code must be a top concern | ||
* Allowing a wide range of repository implementations (from CLI tools to | ||
minimal in-memory implementations to large scale applications like Warehouse) | ||
would be good: unfortunately these can have wildly differing requirements | ||
|
||
|
||
## Considered Options | ||
|
||
1. No repository packages | ||
2. repository_tool-like API | |
3. Minimal repository abstraction | ||
|
||
|
||
## Decision Outcome | ||
|
||
Option 3: Minimal repository abstraction | ||
|
||
While option 1 might be used temporarily, the goal should be to implement a | ||
minimal repository abstraction as soon as possible: this should give the | ||
project a path forward where the maintenance burden is reasonable and results | ||
should be usable very soon. The python-tuf repository functionality can be | ||
later extended as ideas are experimented with in upstream projects and in | ||
python-tuf example code. | ||
|
||
The concept is still unproven but validating the design should be | |
straightforward: the decision could be re-evaluated in a few months, if not weeks. | |
|
||
|
||
## Pros and Cons of the Options | ||
|
||
### No repository packages | ||
|
||
Metadata API makes editing the repository content vastly simpler. There are | ||
already repository implementations built with it (RepositorySimulator in | ||
python-tuf tests is an in-memory implementation, while | ||
repository-editor-for-tuf is an external CLI tool) so clearly a repository | ||
library is not an absolute requirement. | ||
|
||
Not providing repository packages in python-tuf does mean that external | ||
projects could experiment and create implementations without adding to the | ||
maintenance burden of python-tuf. This would be the easiest way to iterate on many | |
different designs and hopefully find good ones in the end. | ||
|
||
That said, there are some tricky parts of repository maintenance (e.g. | ||
initialization, snapshot update, hashed bin management) that would benefit from | ||
having a canonical implementation. Likewise, a well designed library could make | ||
some repeated actions (e.g. version bumps, expiry updates, signing) much easier | ||
to manage. | ||
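
As a rough illustration of the kind of repeated action mentioned above, the sketch below bumps the version and refreshes the expiry of a metadata payload. It is not python-tuf code: the function name `bump` and the plain-dict payload are assumptions made for this example, and signing is left out entirely. Only the `version` and `expires` fields and the timestamp format follow the TUF specification.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def bump(
    signed: Dict[str, Any],
    valid_for: timedelta,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Return a copy of a (hypothetical) signed payload with version + 1
    and a new expiry date. The input dict is not modified."""
    now = now or datetime.now(timezone.utc)
    updated = dict(signed)
    updated["version"] = signed["version"] + 1
    # TUF metadata stores expiry as a UTC timestamp with second precision.
    updated["expires"] = (now + valid_for).strftime("%Y-%m-%dT%H:%M:%SZ")
    return updated


# Usage: a fixed "now" keeps the example deterministic.
fixed_now = datetime(2021, 11, 24, tzinfo=timezone.utc)
old = {"version": 1, "expires": "2021-01-01T00:00:00Z"}
new = bump(old, timedelta(days=7), now=fixed_now)
```

A shared library could wrap a step like this so that applications only decide the policy (how long metadata stays valid) and not the mechanics.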
|
||
### repository_tool-like API | |
|
||
It won't be possible to support the repository_tool API as it is but a similar | ||
one would certainly be an option. | ||
|
||
This would likely be the easiest upgrade path for any repository_tool users out | ||
there. The implementation would not be a huge amount of work as Metadata API | ||
makes many things easier. | ||
|
||
However, repository_tool (and parts of repo.py) are not a great API. It is | |
likely that a similar API would suffer from some of the same issues: it might end up | |
being a substantial amount of code that is only a good fit for one application. | ||
|
||
### Minimal repository abstraction | ||
|
||
python-tuf could define a tiny repository API that | ||
* provides carefully selected core functionality (like core snapshot update) | ||
but... | ||
* does not implement all repository actions itself, instead it makes it easy | |
for the application code to do them | ||
* leaves application details to specific implementations (examples of decisions | |
a library should not always make: "are targets stored with the repo?", | |
"which versions of metadata are stored?", "when to load metadata?", "when to | |
unload metadata?", "when to bump metadata version?", "what is the new expiry | |
date?", "which targets versions should be part of the new snapshot?") | |
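
The list above can be sketched in code. This is a hedged illustration only, not the prototype's API or the Metadata API: every name here (`RoleMetadata`, `Repository`, `open`, `close`, `edit`, `targets_versions`, `snapshot`, `InMemoryRepository`) is hypothetical. The base class owns one piece of core functionality (the snapshot update) and an edit helper, while loading, storing and version bumping are left to the implementation.

```python
import copy
from abc import ABC, abstractmethod
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class RoleMetadata:
    """Hypothetical stand-in for one piece of metadata."""
    version: int = 0
    meta: Dict[str, int] = field(default_factory=dict)  # used by snapshot only


class Repository(ABC):
    """Hypothetical minimal abstraction."""

    @abstractmethod
    def open(self, role: str) -> RoleMetadata:
        """Load metadata: when and from where is an application decision."""

    @abstractmethod
    def close(self, role: str, md: RoleMetadata) -> None:
        """Store metadata: bumping, expiry, signing are application decisions."""

    @abstractmethod
    def targets_versions(self) -> Dict[str, int]:
        """Decide which targets versions belong in the next snapshot."""

    @contextmanager
    def edit(self, role: str) -> Iterator[RoleMetadata]:
        """Make edits ergonomic; close() is skipped if the body raises."""
        md = self.open(role)
        yield md
        self.close(role, md)

    def snapshot(self) -> bool:
        """Core functionality: update snapshot only if something changed."""
        snap = self.open("snapshot")
        wanted = self.targets_versions()
        if snap.meta == wanted:
            return False
        snap.meta = dict(wanted)
        self.close("snapshot", snap)
        return True


class InMemoryRepository(Repository):
    """One possible implementation: keeps everything in a dict."""

    def __init__(self) -> None:
        self._store: Dict[str, RoleMetadata] = {}

    def open(self, role: str) -> RoleMetadata:
        return copy.deepcopy(self._store.get(role, RoleMetadata()))

    def close(self, role: str, md: RoleMetadata) -> None:
        md.version += 1  # this implementation bumps on every close
        self._store[role] = md

    def targets_versions(self) -> Dict[str, int]:
        skip = ("root", "snapshot", "timestamp")
        return {f"{r}.json": m.version for r, m in self._store.items() if r not in skip}


repo = InMemoryRepository()
with repo.edit("targets"):
    pass  # an application would add or remove target files here
changed = repo.snapshot()
```

A different application (for example one backed by a database) would subclass the same base class and answer the storage and policy questions differently, without touching the snapshot logic.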
|
||
python-tuf could also provide one or more implementations of this abstraction | ||
as examples -- this could include a repo.py- or repository_tool-like | ||
implementation. | ||
|
||
This could be a compromise that allows: | ||
* low maintenance burden on python-tuf: initial library could be tiny | ||
* sharing the important, canonical parts of a TUF repository implementation | ||
* ergonomic repository modification, meaning most actions do not have to be in | ||
the core code | ||
* very different repository implementations using the same core code and the | ||
same abstract API | ||
|
||
The approach does have some downsides: | ||
* it's not a drop-in replacement for repository_tool or repo.py | |
* a prototype has been implemented (see Links below) but the concept is still | |
unproven | |
|
||
## Links | ||
* [Design document for minimal repository abstraction](https://docs.google.com/document/d/1YY83J4ihztsi1Qv0dJ22EcqND8dT80AGTduwgh0trpY) | |
* [Prototype implementation of minimal repository abstraction](https://github.com/vmware-labs/repository-editor-for-tuf/) |