RFC: Decouple Dependencies from the Buildpacks #287

Draft · wants to merge 3 commits into base: main

Conversation

@ForestEckhardt (Contributor) requested a review from a team as a code owner, April 25, 2023 19:48

For builders, buildpacks, or platforms that would like to inject dependency
assets directly into the build container, perhaps to support offline builds, we
propose defining `BP_DEPENDENCY_BINARIES` which defaults to
Member:

I think this environment variable name should more clearly indicate that it is referring to a filepath prefix. Maybe something like `BP_DEPENDENCY_BIN_PATH`?

Contributor:

Maybe BP_DEPENDENCY_ASSET_ROOT or _PATH?

Member:

Yeah, both of those work as well. Do we have examples of prior art for naming env vars for directories - both in Paketo and in the upstream Buildpacks projects?

@ForestEckhardt (author):

I am trying out BP_DEPENDENCY_ASSET_ROOT for now.

other locations, eventually proposing an RFC with Cloud Native Buildpacks to
standardize the location).

In this way, one could have dependency metadata that uses `file://` URLs to
@robdimsdale (Member), Apr 26, 2023:

I'm confused by this statement. I would have thought that the benefit of using this configurable path prefix is so that the dependency metadata doesn't have to include the full path.

I was thinking it would work like:

```toml
[[versions]]
# ...
uri = "file://com/paketo/python/pip-1.2.3.tgz"
```

which gets mapped to:

```
$BP_DEPENDENCY_BINARIES/com/paketo/python/pip-1.2.3.tgz
```

i.e.

```
/platform/deps/assets/com/paketo/python/pip-1.2.3.tgz
```

As it currently reads, the proposal sounds like the metadata toml file has to include the full path, which makes me question where the environment variable (BP_DEPENDENCY_BINARIES) is being used.

I'm sure I'm missing something, so maybe an example here would help?
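
A minimal sketch of the mapping described above, assuming the relative `file://` URI is resolved against the configurable prefix. This is illustrative only (later comments clarify the proposal does not currently do any such resolution, and the variable is subsequently renamed):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveAsset joins a metadata URI such as
// "file://com/paketo/python/pip-1.2.3.tgz" with the prefix taken from the
// environment, falling back to the default path used in the example above.
func resolveAsset(uri string) string {
	rel := strings.TrimPrefix(uri, "file://")
	root := os.Getenv("BP_DEPENDENCY_BINARIES")
	if root == "" {
		root = "/platform/deps/assets" // assumed default, per the example above
	}
	return filepath.Join(root, rel)
}

func main() {
	fmt.Println(resolveAsset("file://com/paketo/python/pip-1.2.3.tgz"))
	// Output: /platform/deps/assets/com/paketo/python/pip-1.2.3.tgz
}
```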

Contributor:

At present, there is no magic with the root.

If you are injecting metadata and assets then you need to coordinate. If you are putting assets at `/foo` then the metadata must be set with paths to that location.

It is very basic. Easy to implement.

@ForestEckhardt and I were talking about this. It's hard to tell how this will play out, so we're thinking we keep things simple for now.

If we need to add in a "base" for metadata URLs we can do that without breaking things later. This could also be something that tooling manages, so it's less of an issue than it might appear.

Obviously if someone feels strongly about needing more in this part of the RFC we can do that. Just wanted to start minimally and work up.

@loewenstein:

I'd say that we should rather exclude the second environment variable, i.e. BP_DEPENDENCY_BINARIES, in that case.

@robdimsdale (Member), Apr 27, 2023:

I think I agree with @loewenstein - if the metadata has to be full paths then I'm missing the value of BP_DEPENDENCY_BINARIES. How would that environment variable be consumed?

Contributor:

I see what you're saying. That seems unnecessary in this context. I'll likely remove this, but let me think about this a little more just to make sure I'm not forgetting something.

- The actual dependencies are accessed via the metadata and that can happen
over any protocol (HTTPS/SFTP/FILE) and be distributed in any format
(archive/image).
- Dependency metadata will be removed from `buildpack.toml`
Member:

I think this RFC should not propose removing dependency metadata from buildpack.toml. I think it should describe the ordering of looking for metadata in the new place, followed by the existing buildpack.toml. I get that this is an implementation concern of the buildpack, but I think it's worth calling out in the RFC.

I say this because I think it's valuable to provide the ability to fallback to the current mechanism for some period while we incrementally build out the new system and uncover any issues, and I think we should introduce the idea of removing buildpack.toml metadata in a separate RFC once all buildpacks have an established model for providing dependencies via the new mechanism.

Member:

I see that we call out the idea of not removing the current way of doing things in multiple places in the RFC, which is good, so I think we probably just need to update this summary bullet point to match the rest of the RFC.

Contributor:

+1, removal of the metadata for dependencies from the buildpacks is not part of this RFC. Probably a separate RFC.

Reviewer:

Suggested change, replacing:

- Dependency metadata will be removed from `buildpack.toml`

with:

- Dependency metadata will **not** be removed from `buildpack.toml`; this will be a matter of a separate RFC once all buildpacks have adopted this RFC.

FWIW I am not sure if fallback is the best approach. We should at least consider if it's best to error out if, for example, both the buildpack and the user provide metadata - this might well be a conscious choice of the buildpack team to not (yet) support this RFC.

Member:

> FWIW I am not sure if fallback is the best approach. We should at least consider if it's best to error out if, for example, both the buildpack and the user provide metadata - this might well be a conscious choice of the buildpack team to not (yet) support this RFC.

That's a fair point. I think the situation I'm worried about is that without fallback we have to make a big-bang switchover from one system to another. One example of this is coordinating the removal of a dependency's metadata from a buildpack (e.g. cpython) with simultaneously adding it to the "dependency" buildpack in the language family. Without fallback these changes have to literally be simultaneous, otherwise you break things either way - no metadata is a failure mode, as is two copies of metadata. I think having a fallback provides a smooth transition.

Maybe I'm over-indexing on the buildpack-author's experience, but it seems unnecessary to make our lives harder when there is no cost to the end user.
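
A minimal sketch of the fallback order argued for here, assuming "external metadata present" simply means the metadata directory exists; the parser functions are hypothetical placeholders, not an actual packit or libpak API:

```go
package main

import (
	"fmt"
	"os"
)

// Dependency is a stand-in for a parsed metadata entry.
type Dependency struct {
	ID      string
	Version string
}

// loadMetadata prefers the external metadata directory when it exists and
// falls back to buildpack.toml otherwise, so that neither "no metadata yet"
// nor "metadata in both places" breaks a build mid-migration.
func loadMetadata(externalDir string) ([]Dependency, error) {
	if _, err := os.Stat(externalDir); err == nil {
		return parseExternalDir(externalDir)
	}
	return parseBuildpackTOML("buildpack.toml")
}

// Placeholder parsers; a library like packit or libpak would provide the real ones.
func parseExternalDir(dir string) ([]Dependency, error)    { return nil, fmt.Errorf("not implemented") }
func parseBuildpackTOML(path string) ([]Dependency, error) { return nil, fmt.Errorf("not implemented") }

func main() {
	_, err := loadMetadata("/platform/deps/metadata")
	fmt.Println(err)
}
```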

Member:

No problem at all with the concept of meta/top-level RFC. It's what we did with the dependencies rewrite and it worked well.

@ForestEckhardt (author):

I am a little confused about the direction we want here. Do we want to add a trigger mechanism as part of THIS RFC, or do we want to hold out for an RFC that is more specific on the implementation? It is sounding like we want to add a mechanism to opt in, and then in a subsequent RFC it will be changed to be an opt-out trigger. Is that correct?

Member:

I think it would be helpful to think about this RFC as a "specification" or "API" for how dependency-providing buildpacks (like go-dist, cpython, etc) will be able to locate metadata and dependencies. I think there's a separate RFC to be created to discuss the default implementation of this specification (spoiler alert: I'd advocate for a buildpack in each language family that provides this metadata).

So, I think interface constructs like BP_EXTERNAL_DEPENDENCIES_DISABLED should probably be defined in this RFC. If they're not, I think we run the risk of muddling up the specification and the chosen default implementation.

@ForestEckhardt (author):

OK, while double-checking the RFC I think that this use case is covered by `BP_EXTERNAL_METADATA_ENABLED`. Please check out the definition of this environment variable in the implementation section and see if that is what you are looking for.

Member:

Ah, right. I forgot about that. I think that covers my concern.

merged into the official Paketo documentation.

### Buildpack Migration Process
Once a language family has added support for the new metadata format and
Member:

What does it mean for a language family to have support for this new metadata? Language-family buildpacks don't have dependencies, so is the intention here to say: "all buildpacks in a language family that have dependencies have added support for the new mechanism"?

@dmikusa (Contributor), Apr 27, 2023:

This section is to just say that language families have the final say on deprecation. The RFC provides guidance but doesn't mandate anything.

We wanted to cover this point, but we realize that it will likely vary from team to team so we wanted to let teams have the final say on things.

Member:

I see. Would it be fair to say something like: "once a language-family maintainer group is comfortable that all buildpacks have migrated..."

@ForestEckhardt (author):

Updated some of the language around here, please feel free to take a look!

Comment on lines +108 to +110
```
com
└── example
    └── dep-a.toml
```

Reviewer:

```
com
└── example
    └── dep-a
        └── metadata.toml
```

Is there a particular reason to treat namespace and dependency name differently? We could just add one more node to the tree representing the dependency and have a fixed name for the file, e.g. metadata.toml or dependency.toml.
It doesn't matter much, but would leave room to even extend metadata with additional toml files if we'd ever see a reason to do so.

Alternatively, if we go for a fixed layout with dep-a.toml as the leaves, one potential benefit could be to allow the file system structure to be optional. I.e. we could define

```toml
[[com]]
[[com.example]]
[[com.example."dep-a"]]
[[com.example."dep-a".versions]]
name="dep-a"

[[com.example."dep-b"]]
[[com.example."dep-b".versions]]
name="dep-b"
```

to keep things simple in case of a few dependencies and allow the filesystem structure as convenience (nodes in the filesystem leading to TOML table array prefixes) in a parser for dependencies.

Member:

I like this extensibility. I don't think it costs anything to adopt this over what's proposed in the RFC currently. But I could be missing something.

@ForestEckhardt (author):

Your first proposal is interesting because it would allow us to have N-length domain names with very simple backend logic. If you want com.example.dep-a or com.example.group.subgroup.dep-a, you just need to follow the directory path and then grab the metadata file. @dmikusa is there any loss in security by doing this?

As for the second proposal, I am not sure that I entirely understand the benefit that you are trying to lay out. What would be the advantage of allowing the filesystem structure to be optional? I feel like it would make things more difficult if you wanted to combine dependency metadata packages together.
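
For illustration, a small sketch of the lookup described above, assuming the layout from the RFC (one directory per dot-separated segment, with the final segment as the TOML file name); under the alternative proposal the last segment would instead become a directory containing metadata.toml. The root path is the default floated elsewhere in this RFC:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// metadataPath maps a namespaced dependency id onto the proposed on-disk
// layout, following one directory per segment and using the final segment
// as the metadata file name. Ids are case-insensitive.
func metadataPath(root, id string) string {
	segments := strings.Split(strings.ToLower(id), ".")
	file := segments[len(segments)-1] + ".toml"
	dirs := append([]string{root}, segments[:len(segments)-1]...)
	return filepath.Join(append(dirs, file)...)
}

func main() {
	fmt.Println(metadataPath("/platform/deps/metadata", "com.example.group.subgroup.dep-a"))
	// Output: /platform/deps/metadata/com/example/group/subgroup/dep-a.toml
}
```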

Reviewer:

I think it would make small dependency packages easier to author; think of just a small file if there are only one or two dependencies.

Imagine a user wants to consume all dependencies from Paketo, but one of the updates breaks their own code and they need to temporarily substitute metadata (and assets) with an older version.

Reviewer:

FWIW in the end the dependencies are just a tree with namespace nodes and a named leaf with metadata.

Merging should be the same, no matter how the tree is represented - even a mixed representation shouldn't make much of a difference.

Contributor:

> Is there a particular reason to treat namespace and dependency name differently? We could just add one more node to the tree representing the dependency and have a fixed name for the file, e.g. metadata.toml or dependency.toml.

My thought was that it would keep the directory structure a little flatter, but I don't have a problem with doing it this way. I get what you're saying in terms of flexibility to add more files. I was thinking additional metadata would go in the dependency.toml file, but perhaps there is something else best represented outside of the file. I'm fine with that change, if others agree.

Contributor:

> Alternatively, if we go for a fixed layout with dep-a.toml as the leaves, one potential benefit could be to allow the file system structure to be optional.

I can see where you're going, but I'd say in the interest of keeping things MVP let's try things out without this addition and see how it goes first. I suspect there will be tooling to help manage the metadata, so hopefully it's not too much of a burden. If it turns out to be, we can come back and look at ways to address it.

Comment on lines +113 to +121
As mentioned previously, each individual metadata file has a file name
consisting of the dependency name with an extension of `toml`. The internal
format consists of a single table called `versions` which is an array of tables
containing all of the versions for that particular dependency. Each version
entry requires a `uri`, `version`, `checksum`, `arch`, `os`, and `license`. It
may also have `name`, `purl`, `strip-components`, and `cpes` although these are
optional. The `cpes` entry is an array of strings identifying all of the CPEs
for that dependency. The `license` is itself an array of tables containing
`type` and `uri` of the license for the dependency.

Reviewer:

Are these the same mandatory and/or optional values as they currently are in `buildpack.toml`? If they differ, for example if this RFC introduces additional ones like `arch` and `os`, or if there are fields that changed from mandatory to optional or vice versa, this would be worth pointing out explicitly.

Member:

Agree that it's worth being explicit. FWIW I was assuming this RFC wouldn't change the syntax/semantics of the existing metadata - it just allows it to be located at a new location on the filesystem.

@ForestEckhardt (author):

Yeah, I don't think that any default values will change. I think the introduction of things like `arch` and `os` is future-proofing us for the removal of stacks and the introduction of ARM-capable buildpacks.

Contributor:

Yes, 100% @ForestEckhardt. Should be the same metadata as now, but just trying to get ahead of the stack removal stuff.
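
A rough sketch of how the fields listed in the quoted section might map onto Go types, assuming a BurntSushi-style TOML decoder and the `[[versions.licenses]]` key used in the RFC's example; this is illustrative, not a confirmed packit or libpak API:

```go
// Package metadata sketches types for the proposed dependency metadata file.
package metadata

// DependencyMetadata is the root of a single dependency's TOML file.
type DependencyMetadata struct {
	Versions []Version `toml:"versions"`
}

// Version mirrors one [[versions]] entry.
type Version struct {
	// Required per the proposed format.
	URI      string    `toml:"uri"`
	Version  string    `toml:"version"`
	Checksum string    `toml:"checksum"`
	Arch     string    `toml:"arch"`
	OS       string    `toml:"os"`
	Licenses []License `toml:"licenses"`

	// Optional per the proposed format.
	Name            string   `toml:"name"`
	PURL            string   `toml:"purl"`
	StripComponents int      `toml:"strip-components"`
	CPEs            []string `toml:"cpes"`
}

// License mirrors one [[versions.licenses]] entry.
type License struct {
	Type string `toml:"type"`
	URI  string `toml:"uri"`
}
```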

Comment on lines +160 to +165
A buildpack does not need to include this section; it is optional. If included, the buildpack and libraries like `libpak` and `packit` may use the information to fail if dependency versions are requested by a user that might cause problems for the buildpack. It is the buildpack's responsibility to process the validations and react to them, whether that be warning the user or even failing.

Reviewer:

Is that to say libpak and packit could provide a validate function, but buildpacks are responsible for calling it, iterating over the result, and deciding on log output and success vs. failure?

Member:

That seems like a pragmatic way to facilitate buildpacks providing a consistent, helpful error message when the metadata/dependencies are incompatible with the buildpack.

@ForestEckhardt (author):

Because this validation section is staying in the metadata section of `buildpack.toml`, it will continue to be Paketo-specific. Therefore Paketo libraries would be responsible for respecting this metadata, but it will not become an explicit part of the outward-facing interface of the decoupled dependency metadata (although users creating their own buildpacks may want to leverage that Paketo-specific API).

Contributor:

+1 to @ForestEckhardt and @robdimsdale - I don't think every buildpack will need to do this, but if buildpack authors want more control they can. This was really added to allow authors some control over what dependencies are acceptable, and to avoid questions like: why doesn't your buildpack work with super.old.version of dependency ABC?
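
A sketch of the validate-and-react flow discussed in this thread: a library function checks a requested version against the buildpack's `supported` semver ranges, and the buildpack decides whether to warn or fail. This uses github.com/Masterminds/semver for range matching; the function name is illustrative, not an actual libpak/packit API:

```go
package main

import (
	"fmt"

	"github.com/Masterminds/semver/v3"
)

// Validate returns an error if version matches none of the supported ranges.
func Validate(version string, supported []string) error {
	v, err := semver.NewVersion(version)
	if err != nil {
		return err
	}
	for _, rng := range supported {
		c, err := semver.NewConstraint(rng)
		if err != nil {
			return err
		}
		if c.Check(v) {
			return nil
		}
	}
	return fmt.Errorf("version %s is outside the supported ranges %v", version, supported)
}

func main() {
	// The buildpack chooses how to react: log a warning or fail the build.
	if err := Validate("8.5.0", []string{"11.*", "16.*"}); err != nil {
		fmt.Println("warning:", err)
	}
}
```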

Comment on lines +170 to +172
Format](#metadata-format) section. In addition, it includes an array of strings
called `supported` which contains a list of [semver](https://semver.org/)
ranges that indicate what is supported by that buildpack. Optionally it can

@loewenstein:

So, for now, validations are only about versions? Probably an edge case, but the Java language family would benefit from having all Java vendors under a single namespace, wouldn't it? Has it been considered to add different validations in the future? If so, should this RFC state this somehow?

Member:

I'm not as familiar with the Java ecosystem, but it seems to me that if you have something specific in mind you could propose it as an addition to this RFC; otherwise it seems reasonable to defer that for a potential future RFC.

@ForestEckhardt (author), May 1, 2023:

So there is a discussion of a sort of sub-team group domain, io.paketo.java.(dep) for example, that would allow us to group similar dependencies together and create useful dependency metadata subset packages. I think this might solve the problem that you are talking about but I am not 100% sure.

@loewenstein:

My question was whether io.paketo.java.* would be about names for dependencies or a separate packaging mechanism for dependencies...

I.e. I would expect the Paketo apache-tomcat buildpack to require a dependency named org.apache.tomcat rather than io.paketo.java.tomcat.

@dmikusa (Contributor), May 2, 2023:

@loewenstein I think what you're talking about, how a buildpack locates a dependency, is not specified in this RFC. I think that is something we want to start simple with and build out more advanced usage as necessary.

My initial thought was that a buildpack would probably look for a particular namespace and dependency id. We could perhaps get more sophisticated in the future, looking for dependency ids across namespaces or introducing tags or labels or something else for search and location of dependencies. Initially, I think simpler is better though.

If it helps, we can add something about how a buildpack could potentially locate dependencies.

Comment on lines +230 to +233
Further, this proposal suggests subdividing metadata images by project
sub-team. Each sub-team will be given a unique reverse domain name like
`io.paketo.java` and `io.paketo.utilities`. In this way, the project’s metadata
can be easily combined without having conflicts.

Reviewer:

Is this reverse domain name about the metadata image names or the file system structure for dependencies?

Member:

I read it as filesystem layout - so that multiple metadata sources are guaranteed not to clash on the filesystem. I don't think the image name needs to be specified here, as this RFC doesn't require images to be the distribution mechanism of choice for dependencies.

Reviewer:

I was wondering about io.paketo in the context of Java in particular. IMHO the Java buildpacks should not look up Java VM dependencies in a Paketo namespace, but rather something like org.openjdk. Similarly, it should be org.apache.tomcat, not io.paketo.java.tomcat, or at least I would like to think so.

Member:

I think that seems reasonable. I guess it comes down to how you want to ensure that dependencies don't step on each other's toes. Language maintainers were called out in the RFC and seem like an obvious choice. But I guess a pre-existing reverse domain name layout (which is common in the Java ecosystem) seems reasonable too.

@ForestEckhardt (author):

My understanding (and @dmikusa please correct me if I have the wrong idea here) was something to the effect of io.paketo.java.tomcat; this would allow us to group similar dependencies together in one package more easily.

Reviewer:

Why wouldn't we use the reverse domain names fitting the upstream dependencies? At least where we don't compile them ourselves, there's nothing Paketo-specific about the dependencies, or is there?

Regarding the packaging, if we allow the tree to be represented in a single toml file, grouping similar dependencies should be easy as well.

Member:

+1, I was also wondering why we would have a Paketo-scoped domain for metadata for dependencies that we don't compile.

Contributor:

The intent is to group by dependency provider. If an upstream project published buildpack dependency metadata, then they could use their own namespace. We publish this metadata, so it goes under io.paketo....

The only place this falls down a bit is when you want to override dependencies for Paketo buildpacks; then you need to republish something under the io.paketo namespace, at least initially (see my other comments on how buildpacks can find dependencies). I think that's OK for MVP though, and that there's room here to build and do more sophisticated things so that the com.sap namespace can publish and override dependencies published by io.paketo without having to change the buildpacks.

Comment on lines +101 to +102
The directory structure will contain a folder for each dot-separated segment of
the dependency’s organization name and in the lowest level directory there will

@loewenstein:

Having read further, if we plan to support multiple dependency providers - which we should - wouldn't this get quite difficult to accomplish? Like having two different dependency providers and two different corresponding metadata images, both providing dependencies in the TLD com - how would the metadata from both be mounted?

Member:

I would expect that if you provided metadata via multiple images that the container runtime would handle the overlaying of filesystems. I expect that the same would be true when using platform volume mounting, but I'm not familiar enough with the platform spec to know if that's a reasonable assumption.

@loewenstein:

I don't think that overlaying volume mounts is a given, but I definitely share the lack of knowledge to be sure.

This RFC seems to explicitly rely on someone producing a single source - hence the mention of tooling that can take multiple sources and merge them into one. I would think though that this is less dynamic, similar to how one could produce custom stacks and builders - there's just quite some effort and automation involved in it.

Member:

I see that. I'm assuming that for the default case, there will be a single source per Paketo language family. I.e. the Java buildpack maintainers would take the existing dependencies out of the existing buildpacks and move them to a single source that they create and maintain.

Flipping this around, what other system would you propose? We could use GUIDs for dependency layouts, which (effectively) guarantees we won't have filesystem clashes but comes at the cost of readability on the filesystem.

@loewenstein:

One idea could be to introduce a fs node for the dependency sources, like /platform/dependencies/metadata/io.paketo/org/apache/tomcat... But that's just a rough thought - will look into this tomorrow.

Member:

Ok, let us know if you encounter any issues with that exploration.

Regardless, I think this RFC doesn't have to worry about the mechanism by which the filesystem is created and populated - I think it's out of scope. I think we just have to agree on the filesystem layout and who gets ownership of what level in the filesystem hierarchy.

@loewenstein:

@ForestEckhardt this could as well form a compromise for the above. The Paketo Java subteam could deliver an io.paketo.java dependency bundle that then contains metadata for - amongst others - org.apache.tomcat. WDYT?

Although this might make it more difficult to define precedence and, for example, allow a user to override the Tomcat dependency metadata provided by Paketo.

Contributor:

@loewenstein I think we're roughly on the same page, but I think the RFC could probably explain this better.

> Like it does now, each dependency will have an id but unlike the present
> situation, the id is namespaced. An id is composed of an organization name and
> a dependency name.

The namespacing I'd thought about was by buildpack or possibly by buildpack team, not by the dependency project's name, so a more realistic example would be:

```
/platform/dependencies/metadata/io
  |
  --- paketo
          |
          --- java
                  |
                  --- apache-tomcat.toml
....
```

The reason I went with top-level directories is that most of the container solutions for injecting files, like volume mounts, assume that there isn't directory overlap (i.e. you can't volume mount over top of something; layer overlays work similarly - in this case you can overlay stuff, but then you mask the files in lower layers, which is confusing, so it's best to not do that). Anyway, having the namespace should allow us to easily split up metadata for different buildpack teams or buildpacks and just mount it into different namespaces.

I'm sure there's more we can do with this, but we should try to think MVP here and just ensure that the building blocks are in place so we can do more on top of this structure.

Comment on lines +241 to +242
at the location specified by `BP_DEPENDENCY_METADATA`, which defaults to
`/platform/deps/metadata` (the intent is to pilot and try out this or possibly

Reviewer:

Should we leave out defaulting at this stage, i.e. make BP_DEPENDENCY_METADATA a mandatory field? The four examples below should work without a default, shouldn't they?

Member:

I'd rather keep it as a default because then the user doesn't have to provide it at build time. I might be wrong, but I don't think you could set the env var in an upstream buildpack because the env vars do not propagate during the Detect phase, and generally the downstream buildpack would have to know where the metadata is during Detect as well as Build.

Reviewer:

Would it need to know about metadata during detect though? Do you have something specific in mind?

Member:

I was thinking that if there are metadata/dependency incompatibilities that it would be better to fail during Detect (instead of Build) as that is a much faster feedback loop.

Imagine that I spent 10-20 minutes compiling a dependency, followed by 5-10 minutes for my application source code, only to discover that another dependency later on in the Build order was incompatible with the metadata.

Comment on lines +342 to +346
We propose a flag of `BP_EXTERNAL_METADATA_ENABLED` which defaults to `false`
for use as buildpacks are being converted. In the default state, this flag
tells a buildpack to use the metadata included with buildpack.toml. When set to
`true`, a buildpack should use the new metadata. This can provide a way for
users to test the new functionality without impacting existing users.

Reviewer:

Could we make this smoother by piggybacking on the `BP_DEPENDENCY_METADATA` environment variable, i.e. when it is set, use external metadata; if not, continue with buildpack.toml-based metadata?

Member:

This would work, except I think it makes for a poor UX because you have to opt-in to the new system by knowing where the files would be on the filesystem.

I think it's a better UX to ask the user to provide BP_EXTERNAL_METADATA_ENABLED than BP_DEPENDENCY_METADATA=/platform/deps/metadata

Reviewer:

I guess my comment was made in the context of not defaulting the path, i.e. if used in a context with external metadata, the variable would be set without the user needing to do anything.

Member:

Yeah, I think this conversation is fairly coupled to the other conversation above about who sets the environment variables - whether they are required during Detect or not.
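
To tie the two variables in this thread together, a minimal sketch of how a buildpack might read them with the defaults the RFC proposes; purely illustrative:

```go
package main

import (
	"fmt"
	"os"
)

// externalMetadataEnabled reflects BP_EXTERNAL_METADATA_ENABLED, which the
// RFC proposes defaults to false while buildpacks are being converted.
func externalMetadataEnabled() bool {
	return os.Getenv("BP_EXTERNAL_METADATA_ENABLED") == "true"
}

// metadataDir reflects BP_DEPENDENCY_METADATA, with the proposed default.
func metadataDir() string {
	if dir := os.Getenv("BP_DEPENDENCY_METADATA"); dir != "" {
		return dir
	}
	return "/platform/deps/metadata"
}

func main() {
	if externalMetadataEnabled() {
		fmt.Println("reading external metadata from", metadataDir())
	} else {
		fmt.Println("using metadata embedded in buildpack.toml")
	}
}
```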


The recommendation of this proposal is to announce the change in the release
notes and on Slack, providing links to documentation of the new feature. The
goal of this migration is that there is no loss of functionality for buildpack
Member:

At the language-family level, I agree this is a non-breaking change and so doesn't have to be a major version bump.

At the individual buildpack level, it is a breaking change though. Sometimes consumers rely on component buildpacks directly outside of the language family, and for these users the removal of a dependency is a breaking change and hence I think the component buildpacks that stop incorporating dependencies should have a major bump.

Contributor:

> At the individual buildpack level, it is a breaking change though. Sometimes consumers rely on component buildpacks directly outside of the language family, and for these users the removal of a dependency is a breaking change and hence I think the component buildpacks that stop incorporating dependencies should have a major bump.

...but it doesn't remove dependencies, it removes dependency metadata, which is different. IMHO.

Dependencies are part of the public interface/contract with users, but I don't think that where and how we store metadata is part of that contract. So if what we do causes dependencies to be removed, then I 100% agree (I think this is even specified in another RFC) that it triggers a major version update, but I see the actual contents of buildpack.toml as internal to the buildpack and subject to change at our discretion.

I'm not advocating that we just remove it all, but I think the next paragraph strikes a reasonable balance on how teams can deal with that.

Also, this whole section is just "recommendations" and each language family can handle as they see fit.

Member:

I guess my point is that if a consumer previously relied on a dependency-providing buildpack (e.g. cpython), after we implement this RFC this consumer would also have to modify their infrastructure to pull in the location of the new metadata (new buildpack, builder, etc). That seems like a breaking change at the individual buildpack level, even if we can avoid making breaking changes at the language-family level.

I agree that this doesn't have to be spelled out in the RFC and we can just leave it to language family maintainers.

@loewenstein left a review comment:

Alright, this is kind of a huge change suggestion. I would say the gist of it is that I feel we do not know enough to get into this level of detail yet, and should concentrate on what explorations we need to build the confidence to define the right contract between dependency-installing buildpacks and the various potential sources of dependencies.

Sorry if this comes across like a rant; that is definitely not intended. I very much like the idea of this RFC and the possibilities it will unlock. I am certain some of the concrete proposals are already going in the right direction, but I really think we need to take a step back, align the general idea and direction amongst the Paketo maintainers, discover the different use cases and problems we want to solve, and only then define the concrete solutions.

Comment on lines +6 to +9
This proposal suggests that we should add a new way for buildpacks to manage,
package, and ship dependencies. At this point in time, we are not advocating
for the deprecation or removal of the present way of managing dependencies,
however, we hope over time that will be the natural evolution of things.

@loewenstein:

Suggested change, replacing:
This proposal suggests that we should add a new way for buildpacks to manage,
package, and ship dependencies. At this point in time, we are not advocating
for the deprecation or removal of the present way of managing dependencies,
however, we hope over time that will be the natural evolution of things.

with:

When it comes to binary dependencies, there is currently a strong coupling of the different aspects. This proposal suggests decoupling how buildpacks manage,
package, and ship dependencies from how they install and configure them.
Note that at this point in time, we are not advocating
for the deprecation or removal of the present way of managing dependencies,
however, we hope over time that will be the natural evolution of things.

Comment on lines +12 to +65
Presently, a Paketo buildpack that has binary dependencies will list metadata
about these dependencies within its `buildpack.toml` file. This includes a URL
from which the dependency can be downloaded, and also a checksum hash and other
metadata like PURL and CPEs.

Libraries like `libpak` and `packit` then provide convenience methods for
fetching dependencies using this metadata, verifying the download, and caching
the download result. In addition, they provide tooling to download and store
these dependencies within buildpack images for distribution in offline
environments.

There are also tools published by the project to manage the entries within
`buildpack.toml` through CI pipelines so that dependencies metadata is kept
up-to-date with upstream sources. Unfortunately, this represents a large amount
of toil for the buildpacks team.

As an example of the toil mentioned in a language family like Java, there are
daily project dependencies that need to be updated. This requires reviewing and
merging PRs into the buildpacks to adjust `buildpack.toml` dependency metadata.
Once PRs are merged, a component buildpack needs to be released, followed by a
composite buildpack and then a builder release. This is because most users
don’t consume buildpacks directly, they consume builders which include
buildpacks.

This all has to be done as aggressively as possible so that we are shipping
dependencies, in particular those with security fixes, quickly. This is because
with metadata in `buildpack.toml`, even if an upstream project releases a bug
or security fix, buildpack users cannot get that fix until we update and
release component and composite buildpacks as well as the builder.

There is also toil associated with the tools and pipelines used for this
process. The tools have bugs and need to be updated. At present, the tools we
use to manage all of these updates do not scale well either. In particular
with GitHub Actions, we have had a number of issues hitting rate limits and usage
caps. This gets worse when there are a lot of dependencies to watch, for
example, if your buildpack has multiple version lines or different sets of
packages for a dependency.

The whole process puts an additional maintenance burden on the project
maintainers and project resources. This is not the type of work that a casual
contributor to Paketo will do and as we add more dependencies the burden only
increases on the maintainer teams.

The motivation of this proposal is to…

- Reduce the burden and toil for Paketo buildpack maintainer teams
- Continue publishing dependency updates in a timely and secure manner
- Decouple installing dependencies from configuring them
- Separate metadata and the actual dependencies so they can be provided to
buildpacks in a number of different and flexible ways
- Establish a reasonable release schedule for buildpacks that’s based around
development, not dependencies and thus enabling buildpack vendors to support
version lines, although version line support is not planned for Paketo.
- Make it easier to package buildpacks for offline environments.

@loewenstein:

Suggested change, replacing:
Presently, a Paketo buildpack that has binary dependencies will list metadata
about these dependencies within its `buildpack.toml` file. This includes a URL
from which the dependency can be downloaded, and also a checksum hash and other
metadata like PURL and CPEs.
Libraries like `libpak` and `packit` then provide convenience methods for
fetching dependencies using this metadata, verifying the download, and caching
the download result. In addition, they provide tooling to download and store
these dependencies within buildpack images for distribution in offline
environments.
There are also tools published by the project to manage the entries within
`buildpack.toml` through CI pipelines so that dependencies metadata is kept
up-to-date with upstream sources. Unfortunately, this represents a large amount
of toil for the buildpacks team.
As an example of the toil mentioned in a language family like Java, there are
daily project dependencies that need to be updated. This requires reviewing and
merging PRs into the buildpacks to adjust `buildpack.toml` dependency metadata.
Once PRs are merged, a component buildpack needs to be released, followed by a
composite buildpack and then a builder release. This is because most users
don’t consume buildpacks directly, they consume builders which include
buildpacks.
This all has to be done as aggressively as possible so that we are shipping
dependencies, in particular those with security fixes, quickly. This is because
with metadata in `buildpack.toml`, even if an upstream project releases a bug
or security fix, buildpack users cannot get that fix until we update and
release component and composite buildpacks as well as the builder.
There is also toil associated with the tools and pipelines used for this
process. The tools have bugs and need to be updated. At present, the tools we
use to manage all of these updates do not scale well either. In particular
with GitHub Actions, we have had a number of issues hitting rate limits and usage
caps. This gets worse when there are a lot of dependencies to watch, for
example, if your buildpack has multiple version lines or different sets of
packages for a dependency.
The whole process puts an additional maintenance burden on the project
maintainers and project resources. This is not the type of work that a casual
contributor to Paketo will do and as we add more dependencies the burden only
increases on the maintainer teams.
The motivation of this proposal is to…
- Reduce the burden and toil for Paketo buildpack maintainer teams
- Continue publishing dependency updates in a timely and secure manner
- Decouple installing dependencies from configuring them
- Separate metadata and the actual dependencies so they can be provided to
buildpacks in a number of different and flexible ways
- Establish a reasonable release schedule for buildpacks that’s based around
development, not dependencies and thus enabling buildpack vendors to support
version lines, although version line support is not planned for Paketo.
- Make it easier to package buildpacks for offline environments.

with:

The key drawback of the current state, with dependencies being packaged with the buildpacks, is the rapid release cycle this forces onto individual buildpacks. What makes matters worse is that the release of a buildpack needs to trigger a cascade of releases to include the buildpack in the language-family composite buildpack and the builders Paketo offers.
The main motivation for this proposal is hence that we can establish a reasonable release schedule for buildpacks while keeping the speed of delivering updates of dependencies.
This will significantly reduce the toil for both maintainers and infrastructure and additionally open new possibilities for buildpack providers, platform providers and user organizations to customize the delivery of dependencies.
- buildpack providers might want to provide offline capabilities (for airgapped environments)
- platform providers might offer mirrors for dependencies
- users might control the pace of adopting dependency updates

Comment on lines +68 to +92
At a high level:

- We will define a metadata format that includes the metadata fields currently
present in `buildpack.toml`.
- This metadata is provided at build-time in a known location on the build-time
filesystem, configurable via environment variables.
- If present, this newly-defined metadata format supersedes the existing
dependency metadata in `buildpack.toml`.
- We will add a dependency version validation section to `buildpack.toml`
metadata, this can be used to state that a buildpack version only supports
certain ranges of a given tool, such as Java `11.*` or Node.js `16.*`.
- Metadata can be provided by anyone. A user can add custom metadata, or source
from a third party project. Paketo will provide an official set of metadata
against which we will test the Paketo buildpacks. It will be distributed via
images in an image registry (Docker Hub).
- Buildpacks do not care how dependency metadata is distributed, that is a
separate concern, instead, they just read metadata from a specified location.
Meta-data could be provided by another buildpack, the builder, the platform
or even the user.
- The actual dependencies are accessed via the metadata and that can happen
over any protocol (HTTPS/SFTP/FILE) and be distributed in any format
(archive/image).
- Dependency metadata will **not** be removed from `buildpack.toml`, this will
be a matter of a separate RFC once all buildpacks have adopted this RFC.

@loewenstein:

Suggested change, replacing:
At a high level:
- We will define a metadata format that includes the metadata fields currently
present in `buildpack.toml`.
- This metadata is provided at build-time in a known location on the build-time
filesystem, configurable via environment variables.
- If present, this newly-defined metadata format supersedes the existing
dependency metadata in `buildpack.toml`.
- We will add a dependency version validation section to `buildpack.toml`
metadata, this can be used to state that a buildpack version only supports
certain ranges of a given tool, such as Java `11.*` or Node.js `16.*`.
- Metadata can be provided by anyone. A user can add custom metadata, or source
from a third party project. Paketo will provide an official set of metadata
against which we will test the Paketo buildpacks. It will be distributed via
images in an image registry (Docker Hub).
- Buildpacks do not care how dependency metadata is distributed, that is a
separate concern, instead, they just read metadata from a specified location.
Meta-data could be provided by another buildpack, the builder, the platform
or even the user.
- The actual dependencies are accessed via the metadata and that can happen
over any protocol (HTTPS/SFTP/FILE) and be distributed in any format
(archive/image).
- Dependency metadata will **not** be removed from `buildpack.toml`, this will
be a matter of a separate RFC once all buildpacks have adopted this RFC.

with:

At a high level:
- We will define `buildpack.toml` metadata that allows buildpacks to express dependencies by name and version
E.g. "We need a Java Virtual Machine in version 11.*" or "We need a Node.js runtime in version 16.*"
- We will define a metadata format that includes the metadata fields currently
present in `buildpack.toml` and that allows specifying dependencies by name and version.
- We will define a way to discover metadata at build time that allows anyone to provide it
- We will adapt our tools and processes to provide dependency metadata according to the new formats

Comment on lines +94 to +105
## Metadata Format
The metadata presented to a buildpack will be structured as a directory of
dependencies.

Like it does now, each dependency will have an id but unlike the present
situation, the id is namespaced. An id is composed of an organization name and
a dependency name. It follows the [reverse domain name notation](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) and the
dependency name is defined as the final item, so `com.example.dep-a` would have
an organization of `com.example` and a dependency of `dep-a`. It is case
insensitive, so `dep-a` is no different than `Dep-A`. This is to reduce the
possibility of [typosquatting](https://en.wikipedia.org/wiki/Typosquatting).
Allowed characters are [the same as for a valid hostname](https://en.wikipedia.org/wiki/Hostname#Syntax).

@loewenstein:

Suggested change, replacing:
## Metadata Format
The metadata presented to a buildpack will be structured as a directory of
dependencies.
Like it does now, each dependency will have an id but unlike the present
situation, the id is namespaced. An id is composed of an organization name and
a dependency name. It follows the [reverse domain name notation](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) and the
dependency name is defined as the final item, so `com.example.dep-a` would have
an organization of `com.example` and a dependency of `dep-a`. It is case
insensitive, so `dep-a` is no different than `Dep-A`. This is to reduce the
possibility of [typosquatting](https://en.wikipedia.org/wiki/Typosquatting).
Allowed characters are [the same as for a valid hostname](https://en.wikipedia.org/wiki/Hostname#Syntax).

with:

How this will be done should be based on the outcome of explorations and will need further RFCs to pin down once we know more.
As dependencies escape the local scope of individual buildpacks, we will need to make sure to disambiguate dependency names. One possibility that has precedent in our industry is the use of [reverse domain name notation](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) for namespacing and restricting the character set to [valid hostname](https://en.wikipedia.org/wiki/Hostname#Syntax) syntax to prevent [typosquatting](https://en.wikipedia.org/wiki/Typosquatting).
There is a similar precedent in the industry for version matching following [semver](https://semver.org/). However, we know already that some dependencies do not follow semver, so a potential fallback to regex or some kind of free-form version matching is very likely.
## Metadata Format
The metadata presented to a buildpack will be structured as a directory of
dependencies.

Comment on lines +94 to +160
## Metadata Format
The metadata presented to a buildpack will be structured as a directory of
dependencies.

Like it does now, each dependency will have an id but unlike the present
situation, the id is namespaced. An id is composed of an organization name and
a dependency name. It follows the [reverse domain name notation](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) and the
dependency name is defined as the final item, so `com.example.dep-a` would have
an organization of `com.example` and a dependency of `dep-a`. It is case
insensitive, so `dep-a` is no different than `Dep-A`. This is to reduce the
possibility of [typosquatting](https://en.wikipedia.org/wiki/Typosquatting).
Allowed characters are [the same as for a valid hostname](https://en.wikipedia.org/wiki/Hostname#Syntax).

The directory structure will contain a folder for each dot-separated segment of
the dependency’s organization name and in the lowest level directory there will
be a file named after the dependency name with the extension `toml`, this is
because the metadata file will be TOML format. For example, with
`com.example.dep-a`, there would be the following folder structure:

```
com
└── example
└── dep-a.toml
```

As mentioned previously, each individual metadata file has a file name
consisting of the dependency name with an extension of `toml`. The internal
format consists of a single table called `versions` which is an array of tables
containing all of the versions for that particular dependency. Each version
entry requires a `uri`, `version`, `checksum`, `arch`, `os`, and `license`. It
may also have `name`, `purl`, `strip-components`, and `cpes` although these are
optional. The `cpes` entry is an array of strings identifying all of the CPEs
for that dependency. The `license` is itself an array of tables containing
`type` and `uri` of the license for the dependency.

For example:
```toml
[[versions]]
cpes = [ "cpe:2.3:a:apache:maven:3.8.6:*:*:*:*:*:*:*" ]
name = "Apache Maven"
purl = "pkg:generic/[email protected]"
checksum = "sha256:c7047a48deb626abf26f71ab3643d296db9b1e67f1faa7d988637deac876b5a9"
arch = "x86_64"
os = "linux"
distro = "ubuntu-18.04"
uri = "https://repo1.maven.org/maven2/org/apache/maven/apache-maven/3.8.6/apache-maven-3.8.6-bin.tar.gz"
version = "3.8.6"
strip-components = 1

[[versions.licenses]]
type = "Apache-2.0"
uri = "https://www.apache.org/licenses/"

[[versions]]
cpes = [ "cpe:2.3:a:apache:mvnd:0.7.1:*:*:*:*:*:*:*" ]
name = "Apache Maven Daemon"
purl = "pkg:generic/[email protected]"
checksum = "sha256:ac0b276d4d7472d042ddaf3ad46170e5fcb9350981af91af6c5c13e602a07393"
arch = "x86_64"
os = "linux"
uri = "https://github.com/apache/maven-mvnd/releases/download/0.7.1/mvnd-0.7.1-linux-amd64.zip"
version = "0.7.1"

[[versions.licenses]]
type = "Apache-2.0"
uri = "https://www.apache.org/licenses/"
```

@loewenstein:

I don't think that we should go that far into the details here. We need unambiguous dependencies and, with potentially many providers (at least Paketo providing batteries-included builders, but users taking control of specific dependencies themselves), unambiguous dependency providers.

Any details of this would imho need exploration and should hence not be defined in this initial RFC.

Suggested change: remove this section entirely.

@ForestEckhardt (Contributor Author), Jun 27, 2023:

I think that I am OK with some pushback on the format; however, for the most part this mirrors the existing format in the buildpack.toml as it stands today. So I am curious what the starting point of an investigation for a format would look like and what you see as the problem with this format. I think that we need some base standard to be defined in order for this to be used externally, so I am curious what your thoughts are.


> for the most part this mirrors the existing format in the buildpack.toml as it stands today, so I am curious what the starting point of an investigation for a format would look like

I guess I've just taken the existing format for granted. I.e. we surely need most - if not all - of the properties currently maintained in the buildpack.toml and would just move them somewhere else. What I am sceptical about is the concrete format proposed, i.e. one file per dependency and a reverse domain name folder structure.

I could see use cases that would benefit from being able to ship a few dependencies' metadata in a single file, for example. Whenever the end user specifies something, enforcing structure and separation could hinder ease of use.

Comment on lines +161 to +213
## Dependency Validation
In `buildpack.toml` we will add a section to metadata for specifying dependency
validation parameters. This is a way that buildpacks can state that they do or
do not support certain versions of dependencies.

A buildpack does not need to include this section; it is optional. If included,
the buildpack and libraries like `libpak` and `packit` may use the information
to fail if a user requests dependency versions that might cause problems for
the buildpack. It is the buildpack's responsibility to process the validations
and react to them, whether that means warning the user or even failing.

The format for this metadata is such that you have an array of tables called
`validations`. Each table in the array contains the dependency id, which
follows the format of a dependency id as outlined in the [Metadata
Format](#metadata-format) section. In addition, it includes an array of strings
called `supported` which contains a list of [semver](https://semver.org/)
ranges that indicate what is supported by that buildpack. Optionally it can
contain a `type` which defaults to `semver`, but can be set to `regex`.

It is recommended that buildpack authors use semver as the type because
matching is generally simpler that way; however, if a dependency does not
follow semver, you may use regular expressions to match the versions that are
compatible with the buildpack.

If any semver range or regular expression matches then it can be assumed that
the buildpack is compatible. If no range matches then it can be assumed that
the version is not compatible.

For example:
```toml
[[metadata.validations]]
dependency-id = "com.example.jre"
supported = [ "8.0.*", "11.0.*", "17.0.*" ]

[[metadata.validations]]
dependency-id = "com.example.nodejs"
supported = [ "^16.0", "^17.0", "^18.0" ]

[[metadata.validations]]
dependency-id = "com.example.tomcat"
supported = [ "8\.5\.\d+", "9\.0\.\d+", "10\.0\.\d+" ]
type = "regex"
```
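
As a rough sketch of how a buildpack library might evaluate these validations, the following Go code checks a version against the `supported` list, using the Masterminds/semver library for semver ranges; the library choice and type names are assumptions of this example, not part of the proposal:

```go
package deps

import (
	"regexp"

	"github.com/Masterminds/semver/v3" // a common Go semver library; an assumption here
)

// Validation mirrors one [[metadata.validations]] table from buildpack.toml.
type Validation struct {
	DependencyID string   `toml:"dependency-id"`
	Supported    []string `toml:"supported"`
	Type         string   `toml:"type"` // "semver" (default) or "regex"
}

// Supports reports whether the given version matches any declared range;
// one match is enough, and no match means the version is not compatible.
func (v Validation) Supports(version string) (bool, error) {
	for _, expr := range v.Supported {
		switch v.Type {
		case "", "semver":
			c, err := semver.NewConstraint(expr)
			if err != nil {
				return false, err
			}
			ver, err := semver.NewVersion(version)
			if err != nil {
				return false, err
			}
			if c.Check(ver) {
				return true, nil
			}
		case "regex":
			re, err := regexp.Compile(expr)
			if err != nil {
				return false, err
			}
			if re.MatchString(version) {
				return true, nil
			}
		}
	}
	return false, nil
}
```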

A buildpack is encouraged to be as permissive as possible. This ensures that a
buildpack author won’t have to frequently update this metadata. Buildpack
authors should balance this against providing compatibility guarantees for the
tools required to run the buildpack and for the software it is installing to
run. Generally, we believe that most buildpacks will be compatible with the
major versions of software they presently support, and that new major versions
of dependencies should be tested, with validations expanded after testing.


I don't think that we should go that far into the details here. We will need a way for buildpacks to unambiguously require dependencies and assert some control over details like versions.

Any details of this would imho need exploration and should hence not be defined in this initial RFC.

Suggested change: remove this section entirely.

@ForestEckhardt (Contributor Author):

I am fine with removing this from the overall RFC, implementing it as an ad hoc structure in Paketo, and codifying it later if we see fit. I would be curious if anyone feels this has to be boilerplate for the RFC as it stands today?

Comment on lines +215 to +242
## Metadata Distribution
There is no prescribed method for distributing metadata. It could be done in a
variety of ways, including HTTPS/SFTP distributed archives, `rsync` of remote
directories, or even distributed as an image through an image registry.

For the Paketo project, this proposal suggests distributing metadata through
images in an image registry. This allows the project to use the existing image
registry to distribute the metadata. Image registries also have inherent
properties that help with security: an image cannot be modified without
creating a new hash, and images can be signed (signing is out of scope for
this proposal). In addition, images are easily versioned, so users can hold
back updates to dependencies if desired, and they are easily cached.

The image will contain the directory structure defined in
[Metadata Format](#metadata-format). There should not be an added top-level
directory, so the root of the image should contain all of the directories
created as top-level organization names, like `com` or `org`. All of the
metadata is to be included in a single layer. Updates to metadata will require
downloading the entire layer again; however, it is a single layer whose size
is expected to be small, so updates should be very fast.

Further, this proposal suggests subdividing metadata images by project
sub-team. Each sub-team will be given a unique reverse domain name like
`io.paketo.java` and `io.paketo.utilities`. In this way, the project’s metadata
can be easily combined without having conflicts.

This allows users to pick and choose the metadata that’s relevant to their
needs.
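
As an illustration of the consumption side, a tool could pull such a metadata image and flatten its filesystem onto disk. The sketch below uses go-containerregistry's `crane` helpers; the library choice, the image name, and the output path are purely hypothetical:

```go
package main

import (
	"log"
	"os"

	"github.com/google/go-containerregistry/pkg/crane" // assumed registry client
)

func main() {
	// Hypothetical image name; Paketo's real metadata images are not yet defined.
	img, err := crane.Pull("docker.io/paketobuildpacks/example-dependency-metadata:latest")
	if err != nil {
		log.Fatal(err)
	}

	out, err := os.Create("metadata.tar")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// crane.Export writes the image's flattened filesystem as a tar stream;
	// unpacking it under /platform/deps/metadata would yield the `com`/`org`
	// directories at the expected root.
	if err := crane.Export(img, out); err != nil {
		log.Fatal(err)
	}
}
```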


I don't think that we should go that far into the details here. We will need to define how we plan to distribute metadata.
We might want to make a point that the language family maintainers should keep the responsibility and freedom to decide on dependency updates within their language family.

Any details of this would imho need exploration and should hence not be defined in this initial RFC.

Suggested change: remove this section entirely.

@ForestEckhardt (Contributor Author):

I agree with this sentiment. As long as you are able to conform to the larger API then I think that users should be able to deliver the metadata as they see fit. We might want to either record or codify our methodology at some point in the future. Is there anyone that feels a distribution method should be chosen as part of this RFC?

Comment on lines +244 to +284
## Buildpack Dependency Metadata Interface
The interface between dependency metadata and a buildpack is simple. A single
directory of metadata will be presented to the buildpack. It will be presented
at the location specified by `BP_DEPENDENCY_METADATA`, which defaults to
`/platform/deps/metadata` (the intent is to pilot and try out this or possibly
other locations, eventually proposing an RFC with Cloud-Native buildpacks to
standardize the location).

Buildpacks do not care whether there are multiple sources of metadata
information; however, the metadata needs to be merged and presented to the
buildpack as one single directory. How that information is merged is outside
the scope of this document, but the directory structure defined in
[Metadata Format](#metadata-format) guarantees that there will not be any
duplicate dependency ids.
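
To show what the buildpack-facing side of this interface could look like, here is a hedged Go sketch that resolves the metadata root from `BP_DEPENDENCY_METADATA` and enumerates dependency ids from the merged directory; the helper names are illustrative only:

```go
package deps

import (
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// MetadataRoot honors the proposed BP_DEPENDENCY_METADATA variable and falls
// back to the default pilot location.
func MetadataRoot() string {
	if root := os.Getenv("BP_DEPENDENCY_METADATA"); root != "" {
		return root
	}
	return "/platform/deps/metadata"
}

// ListIDs walks the merged metadata directory and reconstructs namespaced ids
// from the folder layout, so "<root>/com/example/dep-a.toml" becomes
// "com.example.dep-a".
func ListIDs(root string) ([]string, error) {
	var ids []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.IsDir() || !strings.HasSuffix(d.Name(), ".toml") {
			return nil
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		id := strings.TrimSuffix(rel, ".toml")
		ids = append(ids, strings.ReplaceAll(id, string(filepath.Separator), "."))
		return nil
	})
	return ids, err
}
```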

This document does not specify how the dependency metadata folder should be
provided to buildpacks but here are some possibilities:

1. External. The metadata information can be managed outside of the buildpack
lifecycle. This allows for users to manually pull in the metadata they would
like when they would like it. The metadata can then be mapped into the build
container at the specified location using a volume mount to `pack build`.
1. Via buildpack. We could add a Paketo buildpack to the beginning of buildpack
order groups that is responsible for pulling metadata and overrides
`BP_DEPENDENCY_METADATA` to point to its layer. The buildpack can check for
dependency metadata updates when it runs.
1. Via builder. The builder could come with metadata included under
`/platform/deps/metadata`. This would likely require a builder with many
buildpacks to update very frequently though, so may not be the best idea for
the average case, however, it could be very useful as a way to distribute
dependencies in offline environments.
1. Via the platform. Platforms may choose to offer enhanced functionality to
more easily distribute dependencies. In the end, the platform just needs to
ensure that the dependency metadata is available to the buildpack in the
required location.

In addition to the flexibility of where the dependencies originate, this
proposal also provides flexibility in how those dependencies are managed, such
as floating them so the latest versions are always available, pinning to a
specific set of dependencies, or even pinning and including the dependencies
with the metadata.


I don't think that we should go that far into the details here. We will need to define how dependency metadata gets injected into buildpacks and how any kind of disambiguation or precedence gets handled. I'm inclined to think that it is too early to guarantee that the buildpack can rely on the resolution having been taken care of, though; maybe packit and libpak are good candidates to do the work.

Any details of this would imho need exploration and should hence not be defined in this initial RFC.

Suggested change: remove this section entirely.

@ForestEckhardt (Contributor Author):

I think at a bare minimum in this RFC I would like to define some interface that would allow for the development of the decoupled dependency metadata. I am not sure what exploration we would launch that would not produce a competing interface to the one proposed. If you are looking for a competing interface or have one in mind, then I think that we should have competing RFCs. I think that it would be fine to remove the potential delivery methods from this section, but the overall idea should remain.


You might be right about the competing interface. I feel like it would make sense to go for BP_DEPENDENCIES and /platform/deps to allow dependency metadata and assets to be distributed together (if assets get delivered together with metadata, i.e. the offline/air-gapped case).

Comment on lines +286 to +323
## Dependency Assets
This proposal does not impact how dependencies are distributed; it supports
the current methods of referring to remote locations, like upstream projects,
or using the Deps Server. The dependency metadata just needs to point to the
location from which the dependency should be fetched, as is currently done
with the `buildpack.toml` dependency metadata.

For builders, buildpacks, or platforms that would like to inject dependency
assets directly into the build container, perhaps to support offline builds, we
propose defining `BP_DEPENDENCY_ASSET_ROOT` which defaults to
`/platform/deps/assets` as the location where the actual dependency assets
should be located (again, the intent is to pilot and try out this or possibly
other locations, eventually proposing an RFC with Cloud-Native buildpacks to
standardize the location).

In this way, one could have dependency metadata that uses `file://` URLs to
refer to a stable location, i.e. `/platform/deps/assets/…`. These dependencies
could then be accessed by the buildpack directly, and the buildpack doesn’t
need to care how they were added to the container.
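
As a sketch of how a buildpack might consume pre-placed assets under the proposed variable, the following Go code resolves a `file://` URI against the asset root; the function names and checks are illustrative only:

```go
package deps

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// AssetRoot returns the directory that dependency assets are expected under,
// honoring the proposed BP_DEPENDENCY_ASSET_ROOT variable and its default.
func AssetRoot() string {
	if root := os.Getenv("BP_DEPENDENCY_ASSET_ROOT"); root != "" {
		return root
	}
	return "/platform/deps/assets"
}

// ResolveLocal turns a file:// dependency URI into a local path and verifies
// that it lives under the asset root and actually exists, so a buildpack can
// consume pre-placed assets without caring how they got into the container.
func ResolveLocal(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	if u.Scheme != "file" {
		return "", fmt.Errorf("uri %q is not a file:// uri", uri)
	}
	path := u.Path
	if !strings.HasPrefix(path, AssetRoot()+"/") {
		return "", fmt.Errorf("asset %q is outside %s", path, AssetRoot())
	}
	if _, err := os.Stat(path); err != nil {
		return "", err
	}
	return path, nil
}
```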

This document does not specify how the dependency asset folder should be
provided to buildpacks but here are some possibilities:

1. External. The dependency assets can be managed outside of the buildpack
lifecycle. This allows for users to manually pull in the dependency assets
they would like when they would like them. It also outlives the buildpack
lifecycle so assets could be shared across builds for greater caching
benefit. The assets can then be mapped into the build container at the
specified location using a volume mount to `pack build`.
1. Via builder. The builder could come with assets included under
`/platform/deps/assets`. This would likely require a builder with many
buildpacks to update very frequently though, so may not be the best idea for
the average case, however, it could be very useful as a way to distribute
dependencies in offline environments.
1. Via the platform. Platforms may choose to offer enhanced functionality to
more easily distribute dependencies. In the end, the platform just needs to
ensure that the dependency metadata is available to the buildpack in the
required location.
@loewenstein, Jun 14, 2023:

I am inclined to think that dependency assets - if they are included - should be easily allowed to be provided together with the metadata. More broadly speaking, though, I'd defer any definitions or decisions to after an exploration of e.g. dependency proxies or offline dependencies.

Suggested change: replace the section above with the following.
## Dependency Assets
This proposal does not impact how dependencies are being distributed and it
supports the current methods referring to remote locations, like upstream
projects, or using the Deps Server. The dependency metadata just needs to point
to the location from which the dependency should be fetched as is currently
being done with the buildpack.toml dependency metadata.
We foresee that builders, buildpacks, or platforms, for example, may want to inject dependency
assets directly into the build container, perhaps to support offline builds.
We expect this to be possible with dependency decoupling in place, but it is out of scope for this RFC.


## Implementation


We should actually start with a couple of explorations of different use cases and clarify the details that I suggest we leave out of here.
Maybe we should have a list of explorations right here and invite all Paketo maintainers to have a look at their dependency handling and add explorations for their use cases.

Suggested change: expand the section as follows.

## Implementation
### Exploration
We propose to start with a larger exploration phase in order to work out the details of the contracts between dependency-installing buildpacks and dependency providers.
E.g.
- Could we have a single JVM buildpack and defer vendor variants to externally provided dependency metadata for e.g. Bellsoft, SapMachine, Microsoft, etc.?
- How can we keep multiple version lines of dependencies (Java 8, 11, 17, 20, and similar for Node)?
- What dependency versioning schemes exist and is there a common pattern besides semver (e.g. regex) that fits all other cases?
- How do we handle precedence if a builder comes with batteries included, the platform adds dependencies and the user tries to override some of the provided ones?
- ...

@loewenstein:

I was no longer able to follow the current PR with all the change suggestions, and maybe you are right, @ForestEckhardt: some of my comments go beyond review and towards an alternative proposal. So I started sketching that in https://github.com/paketo-buildpacks/rfcs/blob/decouple-dependencies-alternative/text/0000-decouple-dependencies.md based on the branch used in here.

Happy to hear what you think of it.

cc @dmikusa

@dmikusa (Contributor) commented Jul 25, 2023:

The proposal is intentionally abstract. How metadata and actual binaries become available to the buildpacks running in the container is meant to be open-ended so that we can support multiple platforms and use cases. I fully realize that makes it harder to follow.

I agree with @ForestEckhardt in that what we really need to specify is the metadata format, both the directory/file structure as well as the structure inside of the metadata files. The nebulous processes that run outside of the container to provide the metadata need to meet and coordinate with the buildpack code through this interface.

I'm simplifying a lot here, but I think most other things can be worked out later or through follow-up RFCs. Things like where the binaries live. That's really a concern between a particular buildpack and a particular metadata provider (or possibly between language family teams). If patterns eventually emerge, we can turn them into RFCs so we have consistent behavior.

For example. We could consider pack and possibly some manual work (or a tool to be defined later) to be the external metadata provider. It has created the prescribed directory structure and metadata files. It then volume mounts that into the container at the prescribed location. The buildpack knows where to find those files because of this RFC and how to read them.

I think what happens next is out of scope for this RFC, because otherwise, it's going to make this RFC really, really big and hard to implement and it's already pretty large and complicated.

There are a number of questions though. Carrying on from the example: what happens next? Presumably, the buildpack will install some dependencies, but what dependencies does it install? What version/how does it select a version? What dependency formats does it support (tar/rpm/deb/etc.)? What transports does it support? Etc.

I'd like to leave this type of stuff out of this RFC, so that we can get the interface together first. Then we can experiment with things and see what ends up being useful and common across Paketo buildpacks. For example, maybe down the road we decide that we want a generic Paketo buildpack that can install any requested dependency by an id or name from local file systems/HTTP and understands tar archives, and uses semver. To me, that's hard to know right now and is big enough to make its own RFC.

@loewenstein:

We do not have any nebulous tools creating dependency metadata yet, though; we will rather develop them as part of this endeavor, won't we?

Hence, I believe this RFC is indeed too specific. I would prefer to first agree on "yes, we see the potential benefits of the decoupling and should pursue this" and then actually figure out the details while working on it.

I'd start with a minimal set, one buildpack per buildpack library in use in the Paketo project and probably a human as the first nebulous tool to provide metadata and assets to the buildpack.

@dmikusa (Contributor) commented Jul 25, 2023:

> We do not have any nebulous tools creating dependency metadata yet, though; we will rather develop them as part of this endeavor, won't we?

Nothing yet. From the Paketo standpoint, I'd expect to have tooling in this area for CI, similar to jam and other tools we have now that manage dependencies.

For other use cases, I don't really expect Paketo to provide those. Maybe we look at providing a tool for local dev, since that's a common use case? Maybe we let projects like Spring that implement buildpacks integration provide that?

> Hence, I believe this RFC is indeed too specific. I would prefer to first agree on "yes, we see the potential benefits of the decoupling and should pursue this" and then actually figure out the details while working on it.

I definitely don't want this to be overly specific, but at the same time, I don't think we need an RFC just to agree on pursuing this effort. I'd like to think that by now someone would have voiced a counter-opinion if it is not a thing we should pursue. So I'm hoping we can have an RFC that gives us some minimal foundation on which we can start building things.

> I'd start with a minimal set, one buildpack per buildpack library in use in the Paketo project and probably a human as the first nebulous tool to provide metadata and assets to the buildpack.

Not sure I follow you here. Sorry.

@loewenstein:

Alright, I do believe that the lifecycle of metadata and assets, if assets are to be provided, should be coupled, not separated.

I do believe that dependency bundles (the mix of metadata and potentially assets of some set of dependencies) should scale from small to large. I.e. a user should easily be able to provide a single dependency, and language family maintainers should be able to provide larger bundles. The former would benefit from a single metadata file, while the latter could be achieved by e.g. allowing a file system structure to easily add and group namespaces.

If the same dependencies come from different dependency bundles, we need a way to figure out precedence.

I would however like to avoid putting these rough thoughts into formal specification just yet, but rather align on goals to reach.

@loewenstein:

> > I'd start with a minimal set, one buildpack per buildpack library in use in the Paketo project and probably a human as the first nebulous tool to provide metadata and assets to the buildpack.
>
> Not sure I follow you here. Sorry.

What I meant was that we have all things under control, so there is little need to specify details upfront instead of just diving in and discovering the specifics we need as part of the journey.

@dmikusa (Contributor) commented Jul 25, 2023:

@loewenstein @ForestEckhardt - Let's move this discussion to Slack and get a little more consensus. I think we're expecting too much out of GH Issues here to keep everything in order :) I've invited you both to a channel. It's public, so others interested are welcome; I just didn't want to spam everyone with invites, so I only started with us.

@ForestEckhardt (Contributor Author):

This RFC has been converted to a draft to indicate that a component of it is being worked on in another RFC and that this is not my main focus. This RFC will be undrafted when it is the main focus again.

- Metadata can be provided by anyone. A user can add custom metadata, or source
from a third party project. Paketo will provide an official set of metadata
against which we will test the Paketo buildpacks. It will be distributed via
images in an image registry (Docker Hub).


It is crucial that Paketo continues to offer tooling that allows offline buildpack packaging, as it does right now, but with this official set of metadata instead.
