RFC: Decouple Dependencies from the Buildpacks #287

Draft · wants to merge 3 commits into base: main

Conversation

@ForestEckhardt (Contributor) requested a review from a team as a code owner, April 25, 2023 19:48

For builders, buildpacks, or platforms that would like to inject dependency
assets directly into the build container, perhaps to support offline builds, we
propose defining `BP_DEPENDENCY_BINARIES` which defaults to
Member:

I think this environment variable name should more clearly indicate that it is referring to a filepath prefix. Maybe something like `BP_DEPENDENCY_BIN_PATH`?

Contributor:

Maybe BP_DEPENDENCY_ASSET_ROOT or _PATH?

Member:

Yeah, both of those work as well. Do we have examples of prior art for naming env vars for directories - both in Paketo and in the upstream Buildpacks projects?

@ForestEckhardt (author):

I am trying out BP_DEPENDENCY_ASSET_ROOT for now.

other locations, eventually proposing an RFC with Cloud Native Buildpacks to
standardize the location).

In this way, one could have dependency metadata that uses `file://` URLs to
@robdimsdale (Member), Apr 26, 2023:

I'm confused by this statement. I would have thought that the benefit of using this configurable path prefix is so that the dependency metadata doesn't have to include the full path.

I was thinking it would work like:

```toml
[[versions]]
# ...
uri = "file://com/paketo/python/pip-1.2.3.tgz"
```

which gets mapped to:

```
$BP_DEPENDENCY_BINARIES/com/paketo/python/pip-1.2.3.tgz
```

i.e.

```
/platform/deps/assets/com/paketo/python/pip-1.2.3.tgz
```

As it currently reads, the proposal sounds like the metadata toml file has to include the full path, which makes me question where the environment variable (BP_DEPENDENCY_BINARIES) is being used.

I'm sure I'm missing something, so maybe an example here would help?
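
A minimal sketch of the mapping described above, assuming the relative `file://` URI is resolved against the configurable prefix. This is illustrative only (later comments clarify the proposal does not currently do any such resolution, and the variable is subsequently renamed):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveAsset joins a metadata URI such as
// "file://com/paketo/python/pip-1.2.3.tgz" with the prefix taken from the
// environment, falling back to the default path used in the example above.
func resolveAsset(uri string) string {
	rel := strings.TrimPrefix(uri, "file://")
	root := os.Getenv("BP_DEPENDENCY_BINARIES")
	if root == "" {
		root = "/platform/deps/assets" // assumed default, per the example above
	}
	return filepath.Join(root, rel)
}

func main() {
	fmt.Println(resolveAsset("file://com/paketo/python/pip-1.2.3.tgz"))
	// Output: /platform/deps/assets/com/paketo/python/pip-1.2.3.tgz
}
```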

Contributor:

At present, there is no magic with the root.

If you are injecting metadata and assets then you need to coordinate. If you are putting assets at `/foo` then the metadata must be set with paths to that location.

It is very basic. Easy to implement.

@ForestEckhardt and I were talking about this. It's hard to tell how this will play out, so we're thinking we keep things simple for now.

If we need to add in a "base" for metadata URLs we can do that without breaking things later. This could also be something that tooling manages, so it's less of an issue than it might appear.

Obviously if someone feels strongly about needing more in this part of the RFC we can do that. Just wanted to start minimally and work up.

@loewenstein:

I'd say that we should rather exclude the second environment variable, i.e. BP_DEPENDENCY_BINARIES, in that case.

@robdimsdale (Member), Apr 27, 2023:

I think I agree with @loewenstein - if the metadata has to be full paths then I'm missing the value of BP_DEPENDENCY_BINARIES. How would that environment variable be consumed?

Contributor:

I see what you're saying. That seems unnecessary in this context. I'll likely remove this, but let me think about this a little more just to make sure I'm not forgetting something.

- The actual dependencies are accessed via the metadata and that can happen
over any protocol (HTTPS/SFTP/FILE) and be distributed in any format
(archive/image).
- Dependency metadata will be removed from `buildpack.toml`
Member:

I think this RFC should not propose removing dependency metadata from buildpack.toml. I think it should describe the ordering of looking for metadata in the new place, followed by the existing buildpack.toml. I get that this is an implementation concern of the buildpack, but I think it's worth calling out in the RFC.

I say this because I think it's valuable to provide the ability to fallback to the current mechanism for some period while we incrementally build out the new system and uncover any issues, and I think we should introduce the idea of removing buildpack.toml metadata in a separate RFC once all buildpacks have an established model for providing dependencies via the new mechanism.

Member:

I see that we call out the idea of not removing the current way of doing things in multiple places in the RFC, which is good, so I think we probably just need to update this summary bullet point to match the rest of the RFC.

Contributor:

+1, removal of the metadata for dependencies from the buildpacks is not part of this RFC. Probably a separate RFC.

Reviewer:

Suggested change, replacing:

- Dependency metadata will be removed from `buildpack.toml`

with:

- Dependency metadata will **not** be removed from `buildpack.toml`; this will be a matter of a separate RFC once all buildpacks have adopted this RFC.

FWIW I am not sure if fallback is the best approach. We should at least consider if it's best to error out if, for example, both the buildpack and the user provide metadata - this might well be a conscious choice of the buildpack team to not (yet) support this RFC.

Member:

> FWIW I am not sure if fallback is the best approach. We should at least consider if it's best to error out if, for example, both the buildpack and the user provide metadata - this might well be a conscious choice of the buildpack team to not (yet) support this RFC.

That's a fair point. I think the situation I'm worried about is that without fallback we have to make a big-bang switchover from one system to another. One example of this is coordinating the removal of a dependency's metadata from a buildpack (e.g. cpython) with simultaneously adding it to the "dependency" buildpack in the language family. Without fallback these changes have to literally be simultaneous, otherwise you break things either way - no metadata is a failure mode, as is two copies of metadata. I think having a fallback provides a smooth transition.

Maybe I'm over-indexing on the buildpack-author's experience, but it seems unnecessary to make our lives harder when there is no cost to the end user.
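
A minimal sketch of the fallback order argued for here, assuming "external metadata present" simply means the metadata directory exists; the parser functions are hypothetical placeholders, not an actual packit or libpak API:

```go
package main

import (
	"fmt"
	"os"
)

// Dependency is a stand-in for a parsed metadata entry.
type Dependency struct {
	ID      string
	Version string
}

// loadMetadata prefers the external metadata directory when it exists and
// falls back to buildpack.toml otherwise, so that neither "no metadata yet"
// nor "metadata in both places" breaks a build mid-migration.
func loadMetadata(externalDir string) ([]Dependency, error) {
	if _, err := os.Stat(externalDir); err == nil {
		return parseExternalDir(externalDir)
	}
	return parseBuildpackTOML("buildpack.toml")
}

// Placeholder parsers; a library like packit or libpak would provide the real ones.
func parseExternalDir(dir string) ([]Dependency, error)    { return nil, fmt.Errorf("not implemented") }
func parseBuildpackTOML(path string) ([]Dependency, error) { return nil, fmt.Errorf("not implemented") }

func main() {
	_, err := loadMetadata("/platform/deps/metadata")
	fmt.Println(err)
}
```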

Member:

No problem at all with the concept of meta/top-level RFC. It's what we did with the dependencies rewrite and it worked well.

@ForestEckhardt (author):

I am a little confused about the direction we want here. Do we want to add a trigger mechanism as part of THIS RFC, or do we want to hold out for an RFC that is more specific on the implementation? It is sounding like we want to add a mechanism to opt in, and then in a subsequent RFC it will be changed to be an opt-out trigger. Is that correct?

Member:

I think it would be helpful to think about this RFC as a "specification" or "API" for how dependency-providing buildpacks (like go-dist, cpython, etc) will be able to locate metadata and dependencies. I think there's a separate RFC to be created to discuss the default implementation of this specification (spoiler alert: I'd advocate for a buildpack in each language family that provides this metadata).

So, I think interface constructs like BP_EXTERNAL_DEPENDENCIES_DISABLED should probably be defined in this RFC. If they're not, I think we run the risk of muddling up the specification and the chosen default implementation.

@ForestEckhardt (author):

OK, while double-checking the RFC I think that this use case is covered by `BP_EXTERNAL_METADATA_ENABLED`. Please check out the definition of this environment variable in the implementation section and see if that is what you are looking for.

Member:

Ah, right. I forgot about that. I think that covers my concern.

merged into the official Paketo documentation.

### Buildpack Migration Process
Once a language family has added support for the new metadata format and
Member:

What does it mean for a language family to have support for this new metadata? Language-family buildpacks don't have dependencies, so is the intention here to say: "all buildpacks in a language family that have dependencies have added support for the new mechanism"?

@dmikusa (Contributor), Apr 27, 2023:

This section is to just say that language families have the final say on deprecation. The RFC provides guidance but doesn't mandate anything.

We wanted to cover this point, but we realize that it will likely vary from team to team so we wanted to let teams have the final say on things.

Member:

I see. Would it be fair to say something like: "once a language-family maintainer group is comfortable that all buildpacks have migrated..."

@ForestEckhardt (author):

Updated some of the language around here, please feel free to take a look!

Comment on lines +108 to +110
```
com
└── example
    └── dep-a.toml
```

Reviewer:

```
com
└── example
    └── dep-a
        └── metadata.toml
```

Is there a particular reason to treat namespace and dependency name differently? We could just add one more node to the tree representing the dependency and have a fixed name for the file, e.g. metadata.toml or dependency.toml.
It doesn't matter much, but would leave room to even extend metadata with additional toml files if we'd ever see a reason to do so.

Alternatively, if we go for a fixed layout with dep-a.toml as the leaves, one potential benefit could be to allow the file system structure to be optional. I.e. we could define

```toml
[[com]]
[[com.example]]
[[com.example."dep-a"]]
[[com.example."dep-a".versions]]
name="dep-a"

[[com.example."dep-b"]]
[[com.example."dep-b".versions]]
name="dep-b"
```

to keep things simple in case of a few dependencies and allow the filesystem structure as convenience (nodes in the filesystem leading to TOML table array prefixes) in a parser for dependencies.

Member:

I like this extensibility. I don't think it costs anything to adopt this over what's proposed in the RFC currently. But I could be missing something.

@ForestEckhardt (author):

Your first proposal is interesting because it would allow us to have N-length domain names with very simple backend logic. If you want com.example.dep-a or com.example.group.subgroup.dep-a, you just need to follow the directory path and then grab the metadata file. @dmikusa is there any loss in security by doing this?

As for the second proposal, I am not sure that I entirely understand the benefit that you are trying to lay out. What would be the advantage of allowing the filesystem structure to be optional? I feel like it would make things more difficult if you wanted to combine dependency metadata packages together.
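
For illustration, a small sketch of the lookup described above, assuming the layout from the RFC (one directory per dot-separated segment, with the final segment as the TOML file name); under the alternative proposal the last segment would instead become a directory containing metadata.toml. The root path is the default floated elsewhere in this RFC:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// metadataPath maps a namespaced dependency id onto the proposed on-disk
// layout, following one directory per segment and using the final segment
// as the metadata file name. Ids are case-insensitive.
func metadataPath(root, id string) string {
	segments := strings.Split(strings.ToLower(id), ".")
	file := segments[len(segments)-1] + ".toml"
	dirs := append([]string{root}, segments[:len(segments)-1]...)
	return filepath.Join(append(dirs, file)...)
}

func main() {
	fmt.Println(metadataPath("/platform/deps/metadata", "com.example.group.subgroup.dep-a"))
	// Output: /platform/deps/metadata/com/example/group/subgroup/dep-a.toml
}
```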

Reviewer:

I think it would make small dependency packages easier to author; think of just a small file if there are only one or two dependencies.

Imagine a user wants to consume all dependencies from Paketo, but one of the updates breaks their own code and they need to temporarily substitute metadata (and assets) with an older version.

Reviewer:

FWIW in the end the dependencies are just a tree with namespace nodes and a named leaf with metadata.

Merging should be the same, no matter how the tree is represented - even a mixed representation shouldn't make much of a difference.

Contributor:

> Is there a particular reason to treat namespace and dependency name differently? We could just add one more node to the tree representing the dependency and have a fixed name for the file, e.g. metadata.toml or dependency.toml.

My thought was that it would keep the directory structure a little flatter, but I don't have a problem with doing it this way. I get what you're saying in terms of flexibility to add more files. I was thinking additional metadata would go in the dependency.toml file, but perhaps there is something else best represented outside of the file. I'm fine with that change, if others agree.

Contributor:

> Alternatively, if we go for a fixed layout with dep-a.toml as the leaves, one potential benefit could be to allow the file system structure to be optional.

I can see where you're going, but I'd say in the interest of keeping things MVP let's try things out without this addition and see how it goes first. I suspect there will be tooling to help manage the metadata, so hopefully it's not too much of a burden. If it turns out to be, we can come back and look at ways to address it.

Comment on lines +113 to +121
As mentioned previously, each individual metadata file has a file name
consisting of the dependency name with an extension of `toml`. The internal
format consists of a single table called `versions` which is an array of tables
containing all of the versions for that particular dependency. Each version
entry requires a `uri`, `version`, `checksum`, `arch`, `os`, and `license`. It
may also have `name`, `purl`, `strip-components`, and `cpes` although these are
optional. The `cpes` entry is an array of strings identifying all of the CPEs
for that dependency. The `license` is itself an array of tables containing
`type` and `uri` of the license for the dependency.

Reviewer:

Are these the same mandatory and/or optional values as they currently are in `buildpack.toml`? If they differ, for example if this RFC introduces additional ones like `arch` and `os`, or if there are fields that changed from mandatory to optional or vice versa, this would be worth pointing out explicitly.

Member:

Agree that it's worth being explicit. FWIW I was assuming this RFC wouldn't change the syntax/semantics of the existing metadata - it just allows it to be located at a new location on the filesystem.

@ForestEckhardt (author):

Yeah, I don't think that any default values will change. I think the introduction of things like `arch` and `os` is future-proofing us for the removal of stacks and the introduction of ARM-capable buildpacks.

Contributor:

Yes, 100% @ForestEckhardt. Should be the same metadata as now, but just trying to get ahead of the stack removal stuff.
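
A rough sketch of how the fields listed in the quoted section might map onto Go types, assuming a BurntSushi-style TOML decoder and the `[[versions.licenses]]` key used in the RFC's example; this is illustrative, not a confirmed packit or libpak API:

```go
// Package metadata sketches types for the proposed dependency metadata file.
package metadata

// DependencyMetadata is the root of a single dependency's TOML file.
type DependencyMetadata struct {
	Versions []Version `toml:"versions"`
}

// Version mirrors one [[versions]] entry.
type Version struct {
	// Required per the proposed format.
	URI      string    `toml:"uri"`
	Version  string    `toml:"version"`
	Checksum string    `toml:"checksum"`
	Arch     string    `toml:"arch"`
	OS       string    `toml:"os"`
	Licenses []License `toml:"licenses"`

	// Optional per the proposed format.
	Name            string   `toml:"name"`
	PURL            string   `toml:"purl"`
	StripComponents int      `toml:"strip-components"`
	CPEs            []string `toml:"cpes"`
}

// License mirrors one [[versions.licenses]] entry.
type License struct {
	Type string `toml:"type"`
	URI  string `toml:"uri"`
}
```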

Comment on lines +160 to +165
A buildpack does not need to include this section; it is optional. If included, the buildpack and libraries like `libpak` and `packit` may use the information to fail if dependency versions are requested by a user that might cause problems for the buildpack. It is the buildpack's responsibility to process the validations and react to them, whether that be warning the user or even failing.

Reviewer:

Is that to say libpak and packit could provide a validate function, but buildpacks are responsible for calling it, iterating over the result, and deciding on log output and success vs. failure?

Member:

That seems like a pragmatic way to facilitate buildpacks providing a consistent, helpful error message when the metadata/dependencies are incompatible with the buildpack.

@ForestEckhardt (author):

Because this validation section is staying in the metadata section of `buildpack.toml`, it will continue to be Paketo-specific. Therefore Paketo libraries would be responsible for respecting this metadata, but it will not become an explicit part of the outward-facing interface of the decoupled dependency metadata (although users creating their own buildpacks may want to leverage that Paketo-specific API).

Contributor:

+1 to @ForestEckhardt and @robdimsdale - I don't think every buildpack will need to do this, but if buildpack authors want more control they can. This was really added to allow authors some control over what dependencies are acceptable, and to avoid questions like: why doesn't your buildpack work with super.old.version of dependency ABC?
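
A sketch of the validate-and-react flow discussed in this thread: a library function checks a requested version against the buildpack's `supported` semver ranges, and the buildpack decides whether to warn or fail. This uses github.com/Masterminds/semver for range matching; the function name is illustrative, not an actual libpak/packit API:

```go
package main

import (
	"fmt"

	"github.com/Masterminds/semver/v3"
)

// Validate returns an error if version matches none of the supported ranges.
func Validate(version string, supported []string) error {
	v, err := semver.NewVersion(version)
	if err != nil {
		return err
	}
	for _, rng := range supported {
		c, err := semver.NewConstraint(rng)
		if err != nil {
			return err
		}
		if c.Check(v) {
			return nil
		}
	}
	return fmt.Errorf("version %s is outside the supported ranges %v", version, supported)
}

func main() {
	// The buildpack chooses how to react: log a warning or fail the build.
	if err := Validate("8.5.0", []string{"11.*", "16.*"}); err != nil {
		fmt.Println("warning:", err)
	}
}
```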

Comment on lines +170 to +172
Format](#metadata-format) section. In addition, it includes an array of strings
called `supported` which contains a list of [semver](https://semver.org/)
ranges that indicate what is supported by that buildpack. Optionally it can

@loewenstein:

So, for now, validations are only about versions? Probably an edge case, but the Java language family would benefit from having all Java vendors under a single namespace, wouldn't it? Has it been considered to add different validations in the future? If so, should this RFC state this somehow?

Member:

I'm not as familiar with the Java ecosystem, but it seems to me that if you have something specific in mind you could propose it as an addition to this RFC; otherwise it seems reasonable to defer that for a potential future RFC.

@ForestEckhardt (author), May 1, 2023:

So there is a discussion of a sort of sub-team group domain, io.paketo.java.(dep) for example, that would allow us to group similar dependencies together and create useful dependency metadata subset packages. I think this might solve the problem that you are talking about but I am not 100% sure.

@loewenstein:

My question was whether io.paketo.java.* would be about names for dependencies or a separate packaging mechanism for dependencies...

I.e. I would expect the Paketo apache-tomcat buildpack to require a dependency named org.apache.tomcat rather than io.paketo.java.tomcat.

@dmikusa (Contributor), May 2, 2023:

@loewenstein I think what you're talking about, how a buildpack locates a dependency, is not specified in this RFC. I think that is something we want to start simple with and build out more advanced usage as necessary.

My initial thought was that a buildpack would probably look for a particular namespace and dependency id. We could perhaps get more sophisticated in the future, looking for dependency ids across namespaces or introducing tags or labels or something else for search and location of dependencies. Initially, I think simpler is better though.

If it helps, we can add something about how a buildpack could potentially locate dependencies.

Comment on lines +230 to +233
Further, this proposal suggests subdividing metadata images by project
sub-team. Each sub-team will be given a unique reverse domain name like
`io.paketo.java` and `io.paketo.utilities`. In this way, the project’s metadata
can be easily combined without having conflicts.

Reviewer:

Is this reverse domain name about the metadata image names or the file system structure for dependencies?

Member:

I read it as filesystem layout - so that multiple metadata sources are guaranteed not to clash on the filesystem. I don't think the image name needs to be specified here, as this RFC doesn't require images to be the distribution mechanism of choice for dependencies.

Reviewer:

I was wondering about io.paketo in the context of Java in particular. IMHO the Java buildpacks should not look up Java VM dependencies in a Paketo namespace, but rather something like org.openjdk. Similarly, it should be org.apache.tomcat, not io.paketo.java.tomcat, or at least I would like to think so.

Member:

I think that seems reasonable. I guess it comes down to how you want to ensure that dependencies don't step on each other's toes. Language maintainers were called out in the RFC and seem like an obvious choice. But I guess a pre-existing reverse domain name layout (which is common in the Java ecosystem) seems reasonable too.

@ForestEckhardt (author):

My understanding (and @dmikusa please correct me if I have the wrong idea here) was something to the effect of io.paketo.java.tomcat; this would allow us to group similar dependencies together in one package more easily.

Reviewer:

Why wouldn't we use the reverse domain names fitting the upstream dependencies? At least where we don't compile them ourselves, there's nothing Paketo-specific about the dependencies, or is there?

Regarding the packaging, if we allow the tree to be represented in a single toml file, grouping similar dependencies should be easy as well.

Member:

+1, I was also wondering why we would have a Paketo-scoped domain for metadata for dependencies that we don't compile.

Contributor:

The intent is to group by dependency provider. If an upstream project published buildpack dependency metadata, then they could use their own namespace. We publish this metadata, so it goes under io.paketo....

The only place this falls down a bit is when you want to override dependencies for Paketo buildpacks; then you need to republish something under the io.paketo namespace, at least initially (see my other comments on how buildpacks can find dependencies). I think that's OK for MVP though, and that there's room here to build and do more sophisticated things so that the com.sap namespace can publish and override dependencies published by io.paketo without having to change the buildpacks.

Comment on lines +101 to +102
The directory structure will contain a folder for each dot-separated segment of
the dependency’s organization name and in the lowest level directory there will

@loewenstein:

Having read further, if we plan to support multiple dependency providers - which we should - wouldn't this get quite difficult to accomplish? Like having two different dependency providers and two different corresponding metadata images, both providing dependencies in the TLD com - how would the metadata from both be mounted?

Member:

I would expect that if you provided metadata via multiple images that the container runtime would handle the overlaying of filesystems. I expect that the same would be true when using platform volume mounting, but I'm not familiar enough with the platform spec to know if that's a reasonable assumption.

@loewenstein:

I don't think that overlaying volume mounts is a given, but I definitely share the lack of knowledge to be sure.

This RFC seems to explicitly rely on someone producing a single source - hence the mention of tooling that can take multiple sources and merge them into one. I would think though that this is less dynamic, similar to how one could produce custom stacks and builders - there's just quite some effort and automation involved in it.

Member:

I see that. I'm assuming that for the default case, there will be a single source per Paketo language family. I.e. the Java buildpack maintainers would take the existing dependencies out of the existing buildpacks and move them to a single source that they create and maintain.

Flipping this around, what other system would you propose? We could use GUIDs for dependency layouts, which (effectively) guarantees we won't have filesystem clashes but comes at the cost of readability on the filesystem.

@loewenstein:

One idea could be to introduce a fs node for the dependency sources, like /platform/dependencies/metadata/io.paketo/org/apache/tomcat... But that's just a rough thought - will look into this tomorrow.

Member:

Ok, let us know if you encounter any issues with that exploration.

Regardless, I think this RFC doesn't have to worry about the mechanism by which the filesystem is created and populated - I think it's out of scope. I think we just have to agree on the filesystem layout and who gets ownership of what level in the filesystem hierarchy.

@loewenstein:

@ForestEckhardt this could as well form a compromise for the above. The Paketo Java subteam could deliver an io.paketo.java dependency bundle that then contains metadata for - amongst others - org.apache.tomcat. WDYT?

Although this might make it more difficult to define precedence and, for example, allow a user to override the Tomcat dependency metadata provided by Paketo.

Contributor:

@loewenstein I think we're roughly on the same page, but I think the RFC could probably explain this better.

> Like it does now, each dependency will have an id but unlike the present
> situation, the id is namespaced. An id is composed of an organization name and
> a dependency name.

The namespacing I'd thought about was by buildpack or possibly by buildpack team, not by the dependency project's name, so a more realistic example would be:

```
/platform/dependencies/metadata/io
  |
  --- paketo
          |
          --- java
                  |
                  --- apache-tomcat.toml
....
```

The reason I went with top-level directories is that most of the container solutions for injecting files, like volume mounts, assume that there isn't directory overlap (i.e. you can't volume mount over top of something; layer overlays work similarly - in this case you can overlay stuff, but then you mask the files in lower layers, which is confusing, so it's best to not do that). Anyway, having the namespace should allow us to easily split up metadata for different buildpack teams or buildpacks and just mount it into different namespaces.

I'm sure there's more we can do with this, but we should try to think MVP here and just ensure that the building blocks are in place so we can do more on top of this structure.

Comment on lines +241 to +242
at the location specified by `BP_DEPENDENCY_METADATA`, which defaults to
`/platform/deps/metadata` (the intent is to pilot and try out this or possibly

Reviewer:

Should we leave out defaulting at this stage, i.e. make BP_DEPENDENCY_METADATA a mandatory field? The four examples below should work without a default, shouldn't they?

Member:

I'd rather keep it as a default because then the user doesn't have to provide it at build time. I might be wrong, but I don't think you could set the env var in an upstream buildpack because the env vars do not propagate during the Detect phase, and generally the downstream buildpack would have to know where the metadata is during Detect as well as Build.

Reviewer:

Would it need to know about metadata during detect though? Do you have something specific in mind?

Member:

I was thinking that if there are metadata/dependency incompatibilities that it would be better to fail during Detect (instead of Build) as that is a much faster feedback loop.

Imagine that I spent 10-20 minutes compiling a dependency, followed by 5-10 minutes for my application source code, only to discover that another dependency later on in the Build order was incompatible with the metadata.

Comment on lines +342 to +346
We propose a flag of `BP_EXTERNAL_METADATA_ENABLED` which defaults to `false`
for use as buildpacks are being converted. In the default state, this flag
tells a buildpack to use the metadata included with buildpack.toml. When set to
`true`, a buildpack should use the new metadata. This can provide a way for
users to test the new functionality without impacting existing users.

Reviewer:

Could we make this smoother by piggybacking on the `BP_DEPENDENCY_METADATA` environment variable, i.e. when it is set, use external metadata; if not, continue with buildpack.toml-based metadata?

Member:

This would work, except I think it makes for a poor UX because you have to opt-in to the new system by knowing where the files would be on the filesystem.

I think it's a better UX to ask the user to provide BP_EXTERNAL_METADATA_ENABLED than BP_DEPENDENCY_METADATA=/platform/deps/metadata

Reviewer:

I guess my comment was made in the context of not defaulting the path, i.e. if used in a context with external metadata, the variable would be set without the user needing to do anything.

Member:

Yeah, I think this conversation is fairly coupled to the other conversation above about who sets the environment variables - whether they are required during Detect or not.
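
To tie the two variables in this thread together, a minimal sketch of how a buildpack might read them with the defaults the RFC proposes; purely illustrative:

```go
package main

import (
	"fmt"
	"os"
)

// externalMetadataEnabled reflects BP_EXTERNAL_METADATA_ENABLED, which the
// RFC proposes defaults to false while buildpacks are being converted.
func externalMetadataEnabled() bool {
	return os.Getenv("BP_EXTERNAL_METADATA_ENABLED") == "true"
}

// metadataDir reflects BP_DEPENDENCY_METADATA, with the proposed default.
func metadataDir() string {
	if dir := os.Getenv("BP_DEPENDENCY_METADATA"); dir != "" {
		return dir
	}
	return "/platform/deps/metadata"
}

func main() {
	if externalMetadataEnabled() {
		fmt.Println("reading external metadata from", metadataDir())
	} else {
		fmt.Println("using metadata embedded in buildpack.toml")
	}
}
```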


The recommendation of this proposal is to announce the change in the release
notes and on Slack, providing links to documentation of the new feature. The
goal of this migration is that there is no loss of functionality for buildpack
Member:

At the language-family level, I agree this is a non-breaking change and so doesn't have to be a major version bump.

At the individual buildpack level, it is a breaking change though. Sometimes consumers rely on component buildpacks directly outside of the language family, and for these users the removal of a dependency is a breaking change and hence I think the component buildpacks that stop incorporating dependencies should have a major bump.

Contributor:

> At the individual buildpack level, it is a breaking change though. Sometimes consumers rely on component buildpacks directly outside of the language family, and for these users the removal of a dependency is a breaking change and hence I think the component buildpacks that stop incorporating dependencies should have a major bump.

...but it doesn't remove dependencies, it removes dependency metadata, which is different. IMHO.

Dependencies are part of the public interface/contract with users, but I don't think that where and how we store metadata is part of that contract. So if what we do causes dependencies to be removed, then I 100% agree (I think this is even specified in another RFC) that it triggers a major version update, but I see the actual contents of buildpack.toml as internal to the buildpack and subject to change at our discretion.

I'm not advocating that we just remove it all, but I think the next paragraph strikes a reasonable balance on how teams can deal with that.

Also, this whole section is just "recommendations" and each language family can handle as they see fit.

Member:

I guess my point is that if a consumer previously relied on a dependency-providing buildpack (e.g. cpython), after we implement this RFC this consumer would also have to modify their infrastructure to pull in the location of the new metadata (new buildpack, builder, etc). That seems like a breaking change at the individual buildpack level, even if we can avoid making breaking changes at the language-family level.

I agree that this doesn't have to be spelled out in the RFC and we can just leave it to language family maintainers.

@loewenstein left a review comment:

Alright, this is kind of a huge change suggestion. I would say the gist of it is that I feel we do not know enough to get into this level of detail yet, and should concentrate on what explorations we need to build the confidence to define the right contract between dependency-installing buildpacks and the various potential sources of dependencies.

Sorry if this comes across like a rant; that is definitely not intended. I very much like the idea of this RFC and the possibilities it will unlock. I am certain some of the concrete proposals are already going in the right direction, but I really think we need to take a step back, align the general idea and direction amongst the Paketo maintainers, discover the different use cases and problems we want to solve, and only then define the concrete solutions.

Comment on lines +6 to +9
This proposal suggests that we should add a new way for buildpacks to manage,
package, and ship dependencies. At this point in time, we are not advocating
for the deprecation or removal of the present way of managing dependencies,
however, we hope over time that will be the natural evolution of things.

@loewenstein:

Suggested change, replacing:
This proposal suggests that we should add a new way for buildpacks to manage,
package, and ship dependencies. At this point in time, we are not advocating
for the deprecation or removal of the present way of managing dependencies,
however, we hope over time that will be the natural evolution of things.

with:

When it comes to binary dependencies, there is currently a strong coupling of the different aspects. This proposal suggests decoupling how buildpacks manage,
package, and ship dependencies from how they install and configure them.
Note that at this point in time, we are not advocating
for the deprecation or removal of the present way of managing dependencies,
however, we hope over time that will be the natural evolution of things.

Comment on lines +12 to +65
Presently, a Paketo buildpack that has binary dependencies will list metadata
about these dependencies within its `buildpack.toml` file. This includes a URL
from which the dependency can be downloaded, and also a checksum hash and other
metadata like PURL and CPEs.

Libraries like `libpak` and `packit` then provide convenience methods for
fetching dependencies using this metadata, verifying the download, and caching
the download result. In addition, they provide tooling to download and store
these dependencies within buildpack images for distribution in offline
environments.

There are also tools published by the project to manage the entries within
`buildpack.toml` through CI pipelines so that dependencies metadata is kept
up-to-date with upstream sources. Unfortunately, this represents a large amount
of toil for the buildpacks team.

As an example of the toil mentioned in a language family like Java, there are
daily project dependencies that need to be updated. This requires reviewing and
merging PRs into the buildpacks to adjust `buildpack.toml` dependency metadata.
Once PRs are merged, a component buildpack needs to be released, followed by a
composite buildpack and then a builder release. This is because most users
don’t consume buildpacks directly, they consume builders which include
buildpacks.

This all has to be done as aggressively as possible so that we are shipping
dependencies, in particular those with security fixes, quickly. This is because
with metadata in `buildpack.toml`, even if an upstream project releases a bug
or security fix, buildpack users cannot get that fix until we update and
release component and composite buildpacks as well as the builder.

There is also toil associated with the tools and pipelines used for this
process. The tools have bugs and need to be updated. At present, the tools we
use to manage all of these updates do not scale well either. In particular
with GitHub Actions, we have had a number of issues hitting rate limits and usage
caps. This gets worse when there are a lot of dependencies to watch, for
example, if your buildpack has multiple version lines or different sets of
packages for a dependency.

The whole process puts an additional maintenance burden on the project
maintainers and project resources. This is not the type of work that a casual
contributor to Paketo will do and as we add more dependencies the burden only
increases on the maintainer teams.

The motivation of this proposal is to…

- Reduce the burden and toil for Paketo buildpack maintainer teams
- Continue publishing dependency updates in a timely and secure manner
- Decouple installing dependencies from configuring them
- Separate metadata and the actual dependencies so they can be provided to
buildpacks in a number of different and flexible ways
- Establish a reasonable release schedule for buildpacks that’s based around
development, not dependencies and thus enabling buildpack vendors to support
version lines, although version line support is not planned for Paketo.
- Make it easier to package buildpacks for offline environments.

@loewenstein:

Suggested change, replacing:
Presently, a Paketo buildpack that has binary dependencies will list metadata
about these dependencies within its `buildpack.toml` file. This includes a URL
from which the dependency can be downloaded, and also a checksum hash and other
metadata like PURL and CPEs.
Libraries like `libpak` and `packit` then provide convenience methods for
fetching dependencies using this metadata, verifying the download, and caching
the download result. In addition, they provide tooling to download and store
these dependencies within buildpack images for distribution in offline
environments.
There are also tools published by the project to manage the entries within
`buildpack.toml` through CI pipelines so that dependencies metadata is kept
up-to-date with upstream sources. Unfortunately, this represents a large amount
of toil for the buildpacks team.
As an example of the toil mentioned in a language family like Java, there are
daily project dependencies that need to be updated. This requires reviewing and
merging PRs into the buildpacks to adjust `buildpack.toml` dependency metadata.
Once PRs are merged, a component buildpack needs to be released, followed by a
composite buildpack and then a builder release. This is because most users
don’t consume buildpacks directly, they consume builders which include
buildpacks.
This all has to be done as aggressively as possible so that we are shipping
dependencies, in particular those with security fixes, quickly. This is because
with metadata in `buildpack.toml`, even if an upstream project releases a bug
or security fix, buildpack users cannot get that fix until we update and
release component and composite buildpacks as well as the builder.
There is also toil associated with the tools and pipelines used for this
process. The tools have bugs and need to be updated. At present, the tools we
use to manage all of these updates do not scale well either. In particular
with GitHub Actions, we have had a number of issues hitting rate limits and usage
caps. This gets worse when there are a lot of dependencies to watch, for
example, if your buildpack has multiple version lines or different sets of
packages for a dependency.
The whole process puts an additional maintenance burden on the project
maintainers and project resources. This is not the type of work that a casual
contributor to Paketo will do and as we add more dependencies the burden only
increases on the maintainer teams.
The motivation of this proposal is to…
- Reduce the burden and toil for Paketo buildpack maintainer teams
- Continue publishing dependency updates in a timely and secure manner
- Decouple installing dependencies from configuring them
- Separate metadata and the actual dependencies so they can be provided to
buildpacks in a number of different and flexible ways
- Establish a reasonable release schedule for buildpacks that’s based around
development, not dependencies and thus enabling buildpack vendors to support
version lines, although version line support is not planned for Paketo.
- Make it easier to package buildpacks for offline environments.

with:

The key drawback of the current state, with dependencies being packaged with the buildpacks, is the rapid release cycle this forces onto individual buildpacks. What makes matters worse is that the release of a buildpack needs to trigger a cascade of releases to include the buildpack in the language-family composite buildpack and the builders Paketo offers.
The main motivation for this proposal is hence that we can establish a reasonable release schedule for buildpacks while keeping the speed of delivering updates of dependencies.
This will significantly reduce the toil for both maintainers and infrastructure and additionally open new possibilities for buildpack providers, platform providers and user organizations to customize the delivery of dependencies.
- buildpack providers might want to provide offline capabilities (for airgapped environments)
- platform providers might offer mirrors for dependencies
- users might control the pace of adopting dependency updates

Comment on lines +68 to +92
At a high level:

- We will define a metadata format that includes the metadata fields currently
present in `buildpack.toml`.
- This metadata is provided at build-time in a known location on the build-time
filesystem, configurable via environment variables.
- If present, this newly-defined metadata format supersedes the existing
dependency metadata in `buildpack.toml`.
- We will add a dependency version validation section to `buildpack.toml`
metadata, this can be used to state that a buildpack version only supports
certain ranges of a given tool, such as Java `11.*` or Node.js `16.*`.
- Metadata can be provided by anyone. A user can add custom metadata, or source
from a third party project. Paketo will provide an official set of metadata
against which we will test the Paketo buildpacks. It will be distributed via
images in an image registry (Docker Hub).
- Buildpacks do not care how dependency metadata is distributed, that is a
separate concern, instead, they just read metadata from a specified location.
Meta-data could be provided by another buildpack, the builder, the platform
or even the user.
- The actual dependencies are accessed via the metadata and that can happen
over any protocol (HTTPS/SFTP/FILE) and be distributed in any format
(archive/image).
- Dependency metadata will **not** be removed from `buildpack.toml`, this will
be a matter of a separate RFC once all buildpacks have adopted this RFC.

@loewenstein:

Suggested change, replacing:
At a high level:
- We will define a metadata format that includes the metadata fields currently
present in `buildpack.toml`.
- This metadata is provided at build-time in a known location on the build-time
filesystem, configurable via environment variables.
- If present, this newly-defined metadata format supersedes the existing
dependency metadata in `buildpack.toml`.
- We will add a dependency version validation section to `buildpack.toml`
metadata, this can be used to state that a buildpack version only supports
certain ranges of a given tool, such as Java `11.*` or Node.js `16.*`.
- Metadata can be provided by anyone. A user can add custom metadata, or source
from a third party project. Paketo will provide an official set of metadata
against which we will test the Paketo buildpacks. It will be distributed via
images in an image registry (Docker Hub).
- Buildpacks do not care how dependency metadata is distributed, that is a
separate concern, instead, they just read metadata from a specified location.
Meta-data could be provided by another buildpack, the builder, the platform
or even the user.
- The actual dependencies are accessed via the metadata and that can happen
over any protocol (HTTPS/SFTP/FILE) and be distributed in any format
(archive/image).
- Dependency metadata will **not** be removed from `buildpack.toml`, this will
be a matter of a separate RFC once all buildpacks have adopted this RFC.

with:

At a high level:
- We will define `buildpack.toml` metadata that allows buildpacks to express dependencies by name and version
E.g. "We need a Java Virtual Machine in version 11.*" or "We need a Node.js runtime in version 16.*"
- We will define a metadata format that includes the metadata fields currently
present in `buildpack.toml` and that allows specifying dependencies by name and version.
- We will define a way to discover metadata at build time that allows anyone to provide it
- We will adapt our tools and processes to provide dependency metadata according to the new formats

Comment on lines +94 to +105
## Metadata Format
The metadata presented to a buildpack will be structured as a directory of
dependencies.

Like it does now, each dependency will have an id but unlike the present
situation, the id is namespaced. An id is composed of an organization name and
a dependency name. It follows the [reverse domain name notation](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) and the
dependency name is defined as the final item, so `com.example.dep-a` would have
an organization of `com.example` and a dependency of `dep-a`. It is case
insensitive, so `dep-a` is no different than `Dep-A`. This is to reduce the
possibility of [typosquatting](https://en.wikipedia.org/wiki/Typosquatting).
Allowed characters are [the same as for a valid hostname](https://en.wikipedia.org/wiki/Hostname#Syntax).

@loewenstein:

Suggested change, replacing:
## Metadata Format
The metadata presented to a buildpack will be structured as a directory of
dependencies.
Like it does now, each dependency will have an id but unlike the present
situation, the id is namespaced. An id is composed of an organization name and
a dependency name. It follows the [reverse domain name notation](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) and the
dependency name is defined as the final item, so `com.example.dep-a` would have
an organization of `com.example` and a dependency of `dep-a`. It is case
insensitive, so `dep-a` is no different than `Dep-A`. This is to reduce the
possibility of [typosquatting](https://en.wikipedia.org/wiki/Typosquatting).
Allowed characters are [the same as for a valid hostname](https://en.wikipedia.org/wiki/Hostname#Syntax).

with:

How this will be done should be based on the outcome of explorations and will need further RFCs to pin down once we know more.
As dependencies escape the local scope of individual buildpacks, we will need to make sure to disambiguate dependency names. One possibility that has precedent in our industry is the use of [reverse domain name notation](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) for namespacing and restricting the character set to [valid hostname](https://en.wikipedia.org/wiki/Hostname#Syntax) syntax to prevent [typosquatting](https://en.wikipedia.org/wiki/Typosquatting).
There is a similar precedent in the industry for version matching following [semver](https://semver.org/). However, we know already that some dependencies do not follow semver, so a potential fallback to regex or some kind of free-form version matching is very likely.
## Metadata Format
The metadata presented to a buildpack will be structured as a directory of
dependencies.

Comment on lines +94 to +160
## Metadata Format
The metadata presented to a buildpack will be structured as a directory of
dependencies.

Like it does now, each dependency will have an id but unlike the present
situation, the id is namespaced. An id is composed of an organization name and
a dependency name. It follows the [reverse domain name notation](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) and the
dependency name is defined as the final item, so `com.example.dep-a` would have
an organization of `com.example` and a dependency of `dep-a`. It is case
insensitive, so `dep-a` is no different than `Dep-A`. This is to reduce the
possibility of [typosquatting](https://en.wikipedia.org/wiki/Typosquatting).
Allowed characters are [the same as for a valid hostname](https://en.wikipedia.org/wiki/Hostname#Syntax).

The directory structure will contain a folder for each dot-separated segment of
the dependency’s organization name and in the lowest level directory there will
be a file named after the dependency name with the extension `toml`, this is
because the metadata file will be TOML format. For example, with
`com.example.dep-a`, there would be the following folder structure:

```
com
└── example
└── dep-a.toml
```

As mentioned previously, each individual metadata file has a file name
consisting of the dependency name with an extension of `toml`. The internal
format consists of a single table called `versions` which is an array of tables
containing all of the versions for that particular dependency. Each version
entry requires a `uri`, `version`, `checksum`, `arch`, `os`, and `license`. It
may also have `name`, `purl`, `strip-components`, and `cpes` although these are
optional. The `cpes` entry is an array of strings identifying all of the CPEs
for that dependency. The `license` is itself an array of tables containing
`type` and `uri` of the license for the dependency.

For example:
```toml
[[versions]]
cpes = [ "cpe:2.3:a:apache:maven:3.8.6:*:*:*:*:*:*:*" ]
name = "Apache Maven"
purl = "pkg:generic/[email protected]"
checksum = "sha256:c7047a48deb626abf26f71ab3643d296db9b1e67f1faa7d988637deac876b5a9"
arch = "x86_64"
os = "linux"
distro = "ubuntu-18.04"
uri = "https://repo1.maven.org/maven2/org/apache/maven/apache-maven/3.8.6/apache-maven-3.8.6-bin.tar.gz"
version = "3.8.6"
strip-components = 1

[[versions.licenses]]
type = "Apache-2.0"
uri = "https://www.apache.org/licenses/"

[[versions]]
cpes = [ "cpe:2.3:a:apache:mvnd:0.7.1:*:*:*:*:*:*:*" ]
name = "Apache Maven Daemon"
purl = "pkg:generic/[email protected]"
checksum = "sha256:ac0b276d4d7472d042ddaf3ad46170e5fcb9350981af91af6c5c13e602a07393"
arch = "x86_64"
os = "linux"
uri = "https://github.com/apache/maven-mvnd/releases/download/0.7.1/mvnd-0.7.1-linux-amd64.zip"
version = "0.7.1"

[[versions.licenses]]
type = "Apache-2.0"
uri = "https://www.apache.org/licenses/"
```

@loewenstein:

I don't think that we should go that far into the details here. We need unambiguous dependencies and, with potentially many providers (at least Paketo providing batteries-included builders, but users taking control of specific dependencies themselves), unambiguous dependency providers.

Any details of this would imho need exploration and should hence not be defined in this initial RFC.

Suggested change: remove this section entirely.

@ForestEckhardt (Contributor Author), Jun 27, 2023:

I think that I am OK with some pushback on the format; however, for the most part this mirrors the existing format in the buildpack.toml as it stands today. So I am curious what the starting point of an investigation for a format would look like and what you see as the problem with this format. I think that we need some base standard to be defined in order for this to be used externally, so I am curious what your thoughts are.


> for the most part this mirrors the existing format in the buildpack.toml as it stands today, so I am curious what the starting point of an investigation for a format would look like

I guess I've just taken the existing format for granted. I.e. we surely need most - if not all - of the properties currently maintained in the buildpack.toml and would just move them somewhere else. What I am sceptical about is the concrete format proposed, i.e. one file per dependency and a reverse domain name folder structure.

I could see use cases that would benefit from being able to ship a few dependencies' metadata in a single file, for example. Whenever the end user specifies something, enforcing structure and separation could hinder ease of use.

Comment on lines +161 to +213
## Dependency Validation
In `buildpack.toml` we will add a section to metadata for specifying dependency
validation parameters. This is a way that buildpacks can state that they do or
do not support certain versions of dependencies.

A buildpack does not need to include this section; it is optional. If included,
the buildpack and libraries like `libpak` and `packit` may use the information
to fail if a user requests dependency versions that might cause problems for
the buildpack. It is the buildpack's responsibility to process the validations
and react to them, whether that means warning the user or even failing.

The format for this metadata is such that you have an array of tables called
`validations`. Each table in the array contains the dependency id, which
follows the format of a dependency id as outlined in the [Metadata
Format](#metadata-format) section. In addition, it includes an array of strings
called `supported` which contains a list of [semver](https://semver.org/)
ranges that indicate what is supported by that buildpack. Optionally it can
contain a `type` which defaults to `semver`, but can be set to `regex`.

It is recommended that buildpack authors use semver as the type because
matching is generally simpler that way; however, if a dependency does not
follow semver, you may use regular expressions to match the versions that are
compatible with the buildpack.

If any semver range or regular expression matches then it can be assumed that
the buildpack is compatible. If no range matches then it can be assumed that
the version is not compatible.

For example:
```toml
[[metadata.validations]]
dependency-id = "com.example.jre"
supported = [ "8.0.*", "11.0.*", "17.0.*" ]

[[metadata.validations]]
dependency-id = "com.example.nodejs"
supported = [ "^16.0", "^17.0", "^18.0" ]

[[metadata.validations]]
dependency-id = "com.example.tomcat"
supported = [ "8\.5\.\d+", "9\.0\.\d+", "10\.0\.\d+" ]
type = "regex"
```
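
As a rough sketch of how a buildpack library might evaluate these validations, the following Go code checks a version against the `supported` list, using the Masterminds/semver library for semver ranges; the library choice and type names are assumptions of this example, not part of the proposal:

```go
package deps

import (
	"regexp"

	"github.com/Masterminds/semver/v3" // a common Go semver library; an assumption here
)

// Validation mirrors one [[metadata.validations]] table from buildpack.toml.
type Validation struct {
	DependencyID string   `toml:"dependency-id"`
	Supported    []string `toml:"supported"`
	Type         string   `toml:"type"` // "semver" (default) or "regex"
}

// Supports reports whether the given version matches any declared range;
// one match is enough, and no match means the version is not compatible.
func (v Validation) Supports(version string) (bool, error) {
	for _, expr := range v.Supported {
		switch v.Type {
		case "", "semver":
			c, err := semver.NewConstraint(expr)
			if err != nil {
				return false, err
			}
			ver, err := semver.NewVersion(version)
			if err != nil {
				return false, err
			}
			if c.Check(ver) {
				return true, nil
			}
		case "regex":
			re, err := regexp.Compile(expr)
			if err != nil {
				return false, err
			}
			if re.MatchString(version) {
				return true, nil
			}
		}
	}
	return false, nil
}
```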

A buildpack is encouraged to be as permissive as possible. This ensures that a
buildpack author won’t have to frequently update this metadata. Buildpack
authors should balance this against providing compatibility guarantees for the
tools required to run the buildpack and for the software it is installing to
run. Generally, we believe that most buildpacks will be compatible with the
major versions of software they presently support, and that new major versions
of dependencies should be tested, with validations expanded after testing.


I don't think that we should go that far into the details here. We will need a way for buildpacks to unambiguously require dependencies and assert some control over details like versions.

Any details of this would imho need exploration and should hence not be defined in this initial RFC.

Suggested change: remove this section entirely.

@ForestEckhardt (Contributor Author):

I am fine with removing this from the overall RFC, implementing it as an ad hoc structure in Paketo, and codifying it later if we see fit. I would be curious if anyone feels this has to be boilerplate for the RFC as it stands today?

Comment on lines +215 to +242
## Metadata Distribution
There is no prescribed method for distributing metadata. It could be done in a
variety of ways, including HTTPS/SFTP distributed archives, `rsync` of remote
directories, or even distributed as an image through an image registry.

For the Paketo project, this proposal suggests distributing metadata through
images in an image registry. This allows the project to use the existing image
registry to distribute the metadata. Image registries also have inherent
properties that help with security: an image cannot be modified without
creating a new hash, and images can be signed (signing is out of scope for
this proposal). In addition, images are easily versioned, so users can hold
back updates to dependencies if desired, and they are easily cached.

The image will contain the directory structure defined in
[Metadata Format](#metadata-format). There should not be an added top-level
directory, so the root of the image should contain all of the directories
created as top-level organization names, like `com` or `org`. All of the
metadata is to be included in a single layer. Updates to metadata will require
downloading the entire layer again; however, it is a single layer whose size
is expected to be small, so updates should be very fast.

Further, this proposal suggests subdividing metadata images by project
sub-team. Each sub-team will be given a unique reverse domain name like
`io.paketo.java` and `io.paketo.utilities`. In this way, the project’s metadata
can be easily combined without having conflicts.

This allows users to pick and choose the metadata that’s relevant to their
needs.
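
As an illustration of the consumption side, a tool could pull such a metadata image and flatten its filesystem onto disk. The sketch below uses go-containerregistry's `crane` helpers; the library choice, the image name, and the output path are purely hypothetical:

```go
package main

import (
	"log"
	"os"

	"github.com/google/go-containerregistry/pkg/crane" // assumed registry client
)

func main() {
	// Hypothetical image name; Paketo's real metadata images are not yet defined.
	img, err := crane.Pull("docker.io/paketobuildpacks/example-dependency-metadata:latest")
	if err != nil {
		log.Fatal(err)
	}

	out, err := os.Create("metadata.tar")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// crane.Export writes the image's flattened filesystem as a tar stream;
	// unpacking it under /platform/deps/metadata would yield the `com`/`org`
	// directories at the expected root.
	if err := crane.Export(img, out); err != nil {
		log.Fatal(err)
	}
}
```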


I don't think that we should go that far into the details here. We will need to define how we plan to distribute metadata.
We might want to make a point that the language family maintainers should keep the responsibility and freedom to decide on dependency updates within their language family.

Any details of this would imho need exploration and should hence not be defined in this initial RFC.

Suggested change: remove this section entirely.

@ForestEckhardt (Contributor Author):

I agree with this sentiment. As long as you are able to conform to the larger API then I think that users should be able to deliver the metadata as they see fit. We might want to either record or codify our methodology at some point in the future. Is there anyone that feels a distribution method should be chosen as part of this RFC?

Comment on lines +244 to +284
## Buildpack Dependency Metadata Interface
The interface between dependency metadata and a buildpack is simple. A single
directory of metadata will be presented to the buildpack. It will be presented
at the location specified by `BP_DEPENDENCY_METADATA`, which defaults to
`/platform/deps/metadata` (the intent is to pilot and try out this or possibly
other locations, eventually proposing an RFC with Cloud-Native buildpacks to
standardize the location).

Buildpacks do not care whether there are multiple sources of metadata
information; however, the metadata needs to be merged and presented to the
buildpack as one single directory. How that information is merged is outside
the scope of this document, but the directory structure defined in
[Metadata Format](#metadata-format) guarantees that there will not be any
duplicate dependency ids.
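
To show what the buildpack-facing side of this interface could look like, here is a hedged Go sketch that resolves the metadata root from `BP_DEPENDENCY_METADATA` and enumerates dependency ids from the merged directory; the helper names are illustrative only:

```go
package deps

import (
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// MetadataRoot honors the proposed BP_DEPENDENCY_METADATA variable and falls
// back to the default pilot location.
func MetadataRoot() string {
	if root := os.Getenv("BP_DEPENDENCY_METADATA"); root != "" {
		return root
	}
	return "/platform/deps/metadata"
}

// ListIDs walks the merged metadata directory and reconstructs namespaced ids
// from the folder layout, so "<root>/com/example/dep-a.toml" becomes
// "com.example.dep-a".
func ListIDs(root string) ([]string, error) {
	var ids []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.IsDir() || !strings.HasSuffix(d.Name(), ".toml") {
			return nil
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		id := strings.TrimSuffix(rel, ".toml")
		ids = append(ids, strings.ReplaceAll(id, string(filepath.Separator), "."))
		return nil
	})
	return ids, err
}
```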

This document does not specify how the dependency metadata folder should be
provided to buildpacks but here are some possibilities:

1. External. The metadata information can be managed outside of the buildpack
lifecycle. This allows for users to manually pull in the metadata they would
like when they would like it. The metadata can then be mapped into the build
container at the specified location using a volume mount to `pack build`.
1. Via buildpack. We could add a Paketo buildpack to the beginning of buildpack
order groups that is responsible for pulling metadata and overrides
`BP_DEPENDENCY_METADATA` to point to its layer. The buildpack can check for
dependency metadata updates when it runs.
1. Via builder. The builder could come with metadata included under
`/platform/deps/metadata`. This would likely require a builder with many
buildpacks to update very frequently though, so may not be the best idea for
the average case, however, it could be very useful as a way to distribute
dependencies in offline environments.
1. Via the platform. Platforms may choose to offer enhanced functionality to
more easily distribute dependencies. In the end, the platform just needs to
ensure that the dependency metadata is available to the buildpack in the
required location.

In addition to the flexibility of where the dependencies originate, this
proposal also provides flexibility in how those dependencies are managed, such
as floating them so the latest versions are always available, pinning to a
specific set of dependencies, or even pinning and including the dependencies
with the metadata.


I don't think that we should go that far into the details here. We will need to define how dependency metadata gets injected into buildpacks and how any kind of disambiguation or precedence gets handled. I'm inclined to think that it is too early to guarantee that the buildpack can rely on the resolution having been taken care of, though; maybe packit and libpak are good candidates to do the work.

Any details of this would imho need exploration and should hence not be defined in this initial RFC.

Suggested change: remove this section entirely.

@ForestEckhardt (Contributor Author):

I think at a bare minimum in this RFC I would like to define some interface that would allow for the development of the decoupled dependency metadata. I am not sure what exploration we would launch that would not produce a competing interface to the one proposed. If you are looking for a competing interface or have one in mind, then I think that we should have competing RFCs. I think that it would be fine to remove the potential delivery methods from this section, but the overall idea should remain.


You might be right about the competing interface. I feel like it would make sense to go for BP_DEPENDENCIES and /platform/deps to allow dependency metadata and assets to be distributed together (if assets get delivered together with metadata, i.e. the offline/air-gapped case).

Comment on lines +286 to +323
## Dependency Assets
This proposal does not impact how dependencies are distributed; it supports
the current methods of referring to remote locations, like upstream projects,
or using the Deps Server. The dependency metadata just needs to point to the
location from which the dependency should be fetched, as is currently done
with the `buildpack.toml` dependency metadata.

For builders, buildpacks, or platforms that would like to inject dependency
assets directly into the build container, perhaps to support offline builds, we
propose defining `BP_DEPENDENCY_ASSET_ROOT` which defaults to
`/platform/deps/assets` as the location where the actual dependency assets
should be located (again, the intent is to pilot and try out this or possibly
other locations, eventually proposing an RFC with Cloud-Native buildpacks to
standardize the location).

In this way, one could have dependency metadata that uses `file://` URLs to
refer to a stable location, i.e. `/platform/deps/assets/…`. These dependencies
could then be accessed by the buildpack directly, and the buildpack doesn’t
need to care how they were added to the container.
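
As a sketch of how a buildpack might consume pre-placed assets under the proposed variable, the following Go code resolves a `file://` URI against the asset root; the function names and checks are illustrative only:

```go
package deps

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// AssetRoot returns the directory that dependency assets are expected under,
// honoring the proposed BP_DEPENDENCY_ASSET_ROOT variable and its default.
func AssetRoot() string {
	if root := os.Getenv("BP_DEPENDENCY_ASSET_ROOT"); root != "" {
		return root
	}
	return "/platform/deps/assets"
}

// ResolveLocal turns a file:// dependency URI into a local path and verifies
// that it lives under the asset root and actually exists, so a buildpack can
// consume pre-placed assets without caring how they got into the container.
func ResolveLocal(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	if u.Scheme != "file" {
		return "", fmt.Errorf("uri %q is not a file:// uri", uri)
	}
	path := u.Path
	if !strings.HasPrefix(path, AssetRoot()+"/") {
		return "", fmt.Errorf("asset %q is outside %s", path, AssetRoot())
	}
	if _, err := os.Stat(path); err != nil {
		return "", err
	}
	return path, nil
}
```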

This document does not specify how the dependency asset folder should be
provided to buildpacks but here are some possibilities:

1. External. The dependency assets can be managed outside of the buildpack
lifecycle. This allows for users to manually pull in the dependency assets
they would like when they would like them. It also outlives the buildpack
lifecycle so assets could be shared across builds for greater caching
benefit. The assets can then be mapped into the build container at the
specified location using a volume mount to `pack build`.
1. Via builder. The builder could come with assets included under
`/platform/deps/assets`. This would likely require a builder with many
buildpacks to update very frequently though, so may not be the best idea for
the average case, however, it could be very useful as a way to distribute
dependencies in offline environments.
1. Via the platform. Platforms may choose to offer enhanced functionality to
more easily distribute dependencies. In the end, the platform just needs to
ensure that the dependency metadata is available to the buildpack in the
required location.
@loewenstein, Jun 14, 2023:

I am inclined to think that dependency assets - if they are included - should be easily allowed to be provided together with the metadata. More broadly speaking, though, I'd defer any definitions or decisions to after an exploration of e.g. dependency proxies or offline dependencies.

Suggested change: replace the section above with the following.
## Dependency Assets
This proposal does not impact how dependencies are being distributed and it
supports the current methods referring to remote locations, like upstream
projects, or using the Deps Server. The dependency metadata just needs to point
to the location from which the dependency should be fetched as is currently
being done with the buildpack.toml dependency metadata.
We foresee that builders, buildpacks, or platforms, for example, may want to inject dependency
assets directly into the build container, perhaps to support offline builds.
We expect this to be possible with dependency decoupling in place, but it is out of scope for this RFC.


## Implementation


We should actually start with a couple of explorations of different use cases and clarify the details that I suggest we leave out of here.
Maybe we should have a list of explorations right here and invite all Paketo maintainers to have a look at their dependency handling and add explorations for their use cases.

Suggested change: expand the section as follows.

## Implementation
### Exploration
We propose to start with a larger exploration phase in order to work out the details of the contracts between dependency-installing buildpacks and dependency providers.
E.g.
- Could we have a single JVM buildpack and defer vendor variants to externally provided dependency metadata for e.g. Bellsoft, SapMachine, Microsoft, etc.?
- How can we keep multiple version lines of dependencies (Java 8, 11, 17, 20, and similar for Node)?
- What dependency versioning schemes exist and is there a common pattern besides semver (e.g. regex) that fits all other cases?
- How do we handle precedence if a builder comes with batteries included, the platform adds dependencies and the user tries to override some of the provided ones?
- ...

@loewenstein:

I was no longer able to follow the current PR with all the change suggestions, and maybe you are right, @ForestEckhardt: some of my comments go beyond review and towards an alternative proposal. So I started sketching that in https://github.com/paketo-buildpacks/rfcs/blob/decouple-dependencies-alternative/text/0000-decouple-dependencies.md based on the branch used in here.

Happy to hear what you think of it.

cc @dmikusa

@dmikusa (Contributor) commented Jul 25, 2023:

The proposal is intentionally abstract. How metadata and actual binaries become available to the buildpacks running in the container is meant to be open-ended so that we can support multiple platforms and use cases. I fully realize that makes it harder to follow.

I agree with @ForestEckhardt in that what we really need to specify is the metadata format, both the directory/file structure as well as the structure inside of the metadata files. The nebulous processes that run outside of the container to provide the metadata need to meet and coordinate with the buildpack code through this interface.

I'm simplifying a lot here, but I think most other things can be worked out later or through follow-up RFCs. Things like where the binaries live. That's really a concern between a particular buildpack and a particular metadata provider (or possibly between language family teams). If patterns eventually emerge, we can turn them into RFCs so we have consistent behavior.

For example. We could consider pack and possibly some manual work (or a tool to be defined later) to be the external metadata provider. It has created the prescribed directory structure and metadata files. It then volume mounts that into the container at the prescribed location. The buildpack knows where to find those files because of this RFC and how to read them.

I think what happens next is out of scope for this RFC, because otherwise, it's going to make this RFC really, really big and hard to implement and it's already pretty large and complicated.

There are a number of questions though. Carrying on from the example: what happens next? Presumably, the buildpack will install some dependencies, but what dependencies does it install? What version/how does it select a version? What dependency formats does it support (tar/rpm/deb/etc.)? What transports does it support? Etc.

I'd like to leave this type of stuff out of this RFC, so that we can get the interface together first. Then we can experiment with things and see what ends up being useful and common across Paketo buildpacks. For example, maybe down the road we decide that we want a generic Paketo buildpack that can install any requested dependency by an id or name from local file systems/HTTP and understands tar archives, and uses semver. To me, that's hard to know right now and is big enough to make its own RFC.

@loewenstein:

We do not have any nebulous tools creating dependency metadata yet, though; we will rather develop them as part of this endeavor, won't we?

Hence, I believe this RFC is indeed too specific. I would prefer to first agree on "yes, we see the potential benefits of the decoupling and should pursue this" and then actually figure out the details while working on it.

I'd start with a minimal set, one buildpack per buildpack library in use in the Paketo project and probably a human as the first nebulous tool to provide metadata and assets to the buildpack.

@dmikusa (Contributor) commented Jul 25, 2023:

> We do not have any nebulous tools creating dependency metadata yet, though; we will rather develop them as part of this endeavor, won't we?

Nothing yet. From the Paketo standpoint, I'd expect to have tooling in this area for CI, similar to jam and other tools we have now that manage dependencies.

For other use cases, I don't really expect Paketo to provide those. Maybe we look at providing a tool for local dev, since that's a common use case? Maybe we let projects like Spring that implement buildpacks integration provide that?

> Hence, I believe this RFC is indeed too specific. I would prefer to first agree on "yes, we see the potential benefits of the decoupling and should pursue this" and then actually figure out the details while working on it.

I definitely don't want this to be overly specific, but at the same time, I don't think we need an RFC just to agree on pursuing this effort. I'd like to think that by now someone would have voiced a counter-opinion if it is not a thing we should pursue. So I'm hoping we can have an RFC that gives us some minimal foundation on which we can start building things.

> I'd start with a minimal set, one buildpack per buildpack library in use in the Paketo project and probably a human as the first nebulous tool to provide metadata and assets to the buildpack.

Not sure I follow you here. Sorry.

@loewenstein:

Alright, I do believe that the lifecycle of metadata and assets, if assets are to be provided, should be coupled, not separated.

I do believe that dependency bundles (the mix of metadata and potentially assets of some set of dependencies) should scale from small to large. I.e. a user should easily be able to provide a single dependency, and language family maintainers should be able to provide larger bundles. The former would benefit from a single metadata file, while the latter could be achieved by e.g. allowing a file system structure to easily add and group namespaces.

If the same dependencies come from different dependency bundles, we need a way to figure out precedence.

I would however like to avoid putting these rough thoughts into formal specification just yet, but rather align on goals to reach.

@loewenstein:

> > I'd start with a minimal set, one buildpack per buildpack library in use in the Paketo project and probably a human as the first nebulous tool to provide metadata and assets to the buildpack.
>
> Not sure I follow you here. Sorry.

What I meant was that we have all things under control, so there is little need to specify details upfront instead of just diving in and discovering the specifics we need as part of the journey.

@dmikusa (Contributor) commented Jul 25, 2023:

@loewenstein @ForestEckhardt - Let's move this discussion to Slack and get a little more consensus. I think we're expecting too much out of GH Issues here to keep everything in order :) I've invited you both to a channel. It's public, so others interested are welcome; I just didn't want to spam everyone with invites, so I only started with us.

@ForestEckhardt (Contributor Author):

This RFC has been converted to a draft to indicate that a component of it is being worked on in another RFC and that this is not my main focus. This RFC will be undrafted when it is the main focus again.

- Metadata can be provided by anyone. A user can add custom metadata, or source
from a third party project. Paketo will provide an official set of metadata
against which we will test the Paketo buildpacks. It will be distributed via
images in an image registry (Docker Hub).


It is crucial that Paketo continues to offer tooling that allows offline buildpack packaging, as it does right now, but with this official set of metadata instead.
