
Write ADR for repository library design #1679

Closed
Tracked by #1694
jku opened this issue Nov 16, 2021 · 3 comments · Fixed by #1693
Comments

@jku
Member

jku commented Nov 16, 2021

I'm trying to provide the following in the coming days:

  • PR with Architectural Decision Record on repository library design based on Metadata API. I see three main options:
    • No repository library, rely on Metadata API to be easy enough for apps to solve the issues themselves
    • provide a repository_tool-like API
    • provide a minimal repository abstraction, show how to implement repositories with it
  • A design document for Minimal Repository Abstraction proposal to back that up

This is part of #1136. My current writing is in https://docs.google.com/document/d/1ZsXWMP_JgsI6RLhyiDn1Fr6rSCMFVoYSBwBakMzspjs but it's really not a proper design document yet

@jku jku self-assigned this Nov 16, 2021
@jku
Member Author

jku commented Nov 16, 2021

copy-pasting initial ADR writeup for reference


Repository library design

Context and Problem Statement

The Metadata API provides a modern Python API for accessing individual pieces of metadata. It does not, however, offer any higher-level help to someone looking to implement a TUF repository.

The legacy python-tuf implementation offers tools for this but suffers from some issues (as do many other implementations):

  • There is a very large amount of code to maintain: repo.py, repository_tool.py and repository_lib.py alone are almost 7000 lines of code.
  • The "library like" parts of the implementation do not form a coherent API: methods routinely take a large number of arguments, the code still depends heavily on globals, and the application (repo.py) still implements a lot of "repository code" itself
  • The "library like" parts of the implementation make decisions that look like application decisions. As an example, repository_tool loads every metadata file in the repository: this is fine for a CLI that operates on a small repository but is unlikely to be a good choice for PyPI.

Decision Drivers

  • There is a consensus on removing the legacy code from python-tuf because of maintainability issues
  • Metadata API makes modifying metadata far easier than legacy code base: this makes significantly different designs possible
  • Not providing a "repository library" (and leaving implementers on their own) may be a viable short-term solution because of the previous point, but it does seem like the project would benefit from some shared repository code and shared repository design
  • Maintainability must be a top concern
  • Allowing a wide range of repository implementations (from CLI tools to minimal in-memory implementations to large scale applications like Warehouse) would be good: unfortunately these can have wildly differing requirements

Considered Options

  1. No repository packages
  2. repository_tool-like API
  3. Minimal repository abstraction

Decision Outcome

Option 3: Minimal repository abstraction

While option 1 might be used temporarily, the goal should be to implement a minimal repository abstraction as soon as possible: this should give the project a path forward where the maintenance burden is reasonable and results should be usable very soon. The python-tuf repository functionality can be later extended as ideas are experimented with in upstream projects and in python-tuf example code.

The concept is still unproven, but validating the design should be straightforward: the decision could be re-evaluated in a few months, if not weeks.

Pros and Cons of the Options

No repository packages

Metadata API makes editing the repository content vastly simpler. There are already repository implementations built with it (RepositorySimulator in python-tuf tests is an in-memory implementation, while repository-editor-for-tuf is an external CLI tool) so clearly a repository library is not an absolute requirement.

Not providing repository packages in python-tuf does mean that external projects could experiment and create implementations without adding to the maintenance burden of python-tuf. This would be the easiest way to iterate on many different designs and hopefully find good ones in the end.

That said, there are some tricky parts of repository maintenance (e.g. initialization, snapshot update, hashed bin management) that would benefit from having a canonical implementation. Likewise, a well designed library could make some repeated actions (e.g. version bumps, expiry updates, signing) much easier to manage.
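As an illustration of why a canonical implementation helps, the hashed bin management mentioned above boils down to mapping each target path to a delegated "bin" role by hashing the path. The sketch below shows one common scheme using only the standard library. The constants, the bin naming format and the function names are assumptions made for this example, not an API provided by python-tuf.

```python
import hashlib

# Assumed parameters for this sketch (not mandated by python-tuf)
PREFIX_LEN = 2  # number of hex digits of the path hash used for binning
NUMBER_OF_BINS = 32  # must evenly divide 16 ** PREFIX_LEN
NUMBER_OF_PREFIXES = 16 ** PREFIX_LEN
BIN_SIZE = NUMBER_OF_PREFIXES // NUMBER_OF_BINS


def bin_name(low: int, high: int) -> str:
    """Name a bin after the range of hash prefixes it serves, e.g. '00-07'."""
    if low == high:
        return f"{low:0{PREFIX_LEN}x}"
    return f"{low:0{PREFIX_LEN}x}-{high:0{PREFIX_LEN}x}"


def find_bin(target_path: str) -> str:
    """Return the name of the bin responsible for `target_path`."""
    digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
    prefix = int(digest[:PREFIX_LEN], 16)
    # Round the prefix down to the start of the bin that contains it
    low = prefix - (prefix % BIN_SIZE)
    return bin_name(low, low + BIN_SIZE - 1)


# Every bin name the repository would need to delegate to
all_bins = [
    bin_name(low, low + BIN_SIZE - 1)
    for low in range(0, NUMBER_OF_PREFIXES, BIN_SIZE)
]
```

Every repository that uses hashed bins needs this same mapping, and small mistakes (off-by-one ranges, inconsistent naming) are easy to make, which is the argument for sharing one implementation.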

repository_tool-like API

It won't be possible to support the repository_tool API as it is, but a similar one would certainly be an option.

This would likely be the easiest upgrade path for any repository_tool users out there. The implementation would not be a huge amount of work as Metadata API makes many things easier.

However, repository_tool (and parts of repo.py) do not form a great API. A similar API would likely suffer from some of the same issues: it might end up being a substantial amount of code that is only a good fit for one application.

Minimal repository abstraction

python-tuf could define a tiny repository API that

  • provides carefully selected core functionality (like core snapshot update),
  • does not implement every repository action itself but makes it easy for application code to do so, and
  • leaves application details to specific implementations (examples of decisions a library should not always make: "are targets stored with the repo?", "which versions of metadata are stored?", "when to load metadata?", "when to unload metadata?", "when to bump metadata version?", "what is the new expiry date?", "which versions should be part of new snapshot?")

python-tuf could also provide one or more implementations of this abstraction as examples -- this could include a repo.py- or repository_tool-like implementation.

This could be a compromise that allows:

  • low maintenance burden on python-tuf: initial library could be tiny
  • sharing the important, canonical parts of a TUF repository implementation
  • ergonomic repository modification, meaning most actions do not have to be in the core code
  • very different repository implementations using the same core code and the same abstract API

The approach does have some downsides:

  • it's not a drop-in replacement for repository_tool or repo.py
  • A prototype has been implemented (see Links below) but the concept is still unproven
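To make the idea more concrete, here is a rough sketch of what such a minimal abstraction might look like. It is not the prototype referenced in Links and it does not use the Metadata API: `RoleMetadata`, `Repository`, `InMemoryRepository` and all method names are hypothetical stand-ins chosen for this example. The abstract class contains only the shared core (an edit helper and a snapshot update), while loading, storing and version policy are left to the implementation.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class RoleMetadata:
    """Hypothetical stand-in for one piece of TUF metadata."""
    role: str
    version: int = 0
    meta: Dict[str, int] = field(default_factory=dict)  # used by snapshot only


class Repository(ABC):
    """Hypothetical minimal abstraction: storage and policy live in subclasses."""

    @abstractmethod
    def open(self, role: str) -> RoleMetadata:
        """Load metadata for `role`; the application decides from where and when."""

    @abstractmethod
    def close(self, role: str, md: RoleMetadata) -> None:
        """Persist `md`; the application decides version, expiry and signing."""

    @contextmanager
    def edit(self, role: str) -> Iterator[RoleMetadata]:
        """Core helper: open metadata, let the caller modify it, then close it."""
        md = self.open(role)
        yield md
        self.close(role, md)  # skipped if the caller's block raised

    def snapshot(self, current_versions: Dict[str, int]) -> bool:
        """Core helper: write a new snapshot only if some version changed."""
        if self.open("snapshot").meta == current_versions:
            return False
        with self.edit("snapshot") as snapshot:
            snapshot.meta = dict(current_versions)
        return True


class InMemoryRepository(Repository):
    """Example implementation: keeps every version of every role in memory."""

    def __init__(self) -> None:
        self.store: Dict[str, List[RoleMetadata]] = {}

    def open(self, role: str) -> RoleMetadata:
        versions = self.store.get(role)
        if not versions:
            return RoleMetadata(role)
        latest = versions[-1]
        # Return a copy so that edits do not alter the stored history
        return RoleMetadata(latest.role, latest.version, dict(latest.meta))

    def close(self, role: str, md: RoleMetadata) -> None:
        md.version += 1  # policy of this implementation: bump on every close
        self.store.setdefault(role, []).append(md)


repo = InMemoryRepository()
with repo.edit("targets") as targets:
    pass  # application-specific changes would go here
changed = repo.snapshot({"targets.json": 1})
unchanged = repo.snapshot({"targets.json": 1})
```

In this sketch a CLI tool and a large service could share `edit()` and `snapshot()` while differing completely in how `open()` and `close()` are implemented, which is the kind of split the option describes.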

Links

Design document for minimal repository abstraction
Prototype implementation of minimal repository abstraction

@lukpueh
Member

lukpueh commented Nov 17, 2021

Great write-up, @jku! I'd love to have a comprehensive list of the specific functionality a minimal repository abstraction includes and the functionality it leaves to application implementations, along with the reasoning for each.

Maybe this is more relevant for the design document than for the ADR. What do you think?

@jku
Member Author

jku commented Nov 17, 2021

Oh agreed, I think the ADR is pretty pointless without some kind of design document: this is why the content is just a comment for now...

The point about including reasoning for those decisions is good, I'll have to think where and how to do that

@sechkova sechkova added the backlog Issues to address with priority for current development goals label Nov 24, 2021
@sechkova sechkova added this to the Sprint 13 milestone Nov 24, 2021