Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Rely on the content of a package when installing it #115032

Closed
2 tasks
jsoriano opened this issue Oct 14, 2021 · 12 comments · Fixed by #142353
Closed
2 tasks

[Fleet] Rely on the content of a package when installing it #115032

jsoriano opened this issue Oct 14, 2021 · 12 comments · Fixed by #142353
Assignees
Labels
8.6 candidate Feature:EPM Fleet team's Elastic Package Manager (aka Integrations) project Team:Fleet Team label for Observability Data Collection Fleet team technical debt Improvement of the software architecture and operational architecture

Comments

@jsoriano
Copy link
Member

jsoriano commented Oct 14, 2021

When installing a package, Kibana/Fleet currently uses information from the registry API that is also available in the package itself.

For example in elastic/package-registry#750 it is discussed to add a field to the registry API that is also included in the package manifest elastic/package-spec#226. This information is only used during the installation of the package, it doesn't provide any other value in the registry API. During the installation of the package Kibana should/could already have access to the package itself, and to the information it contains.

Current approach is problematic because it requires some duplicity. We are trying to make the package-spec the single source of truth for packages, but this approach requires to keep duplicated information in the registry. It creates an unnecessary coupling between the spec, kibana and the registry.

This issue may be also a blocker for other related efforts, as to cryptographically validate the content of a package before attempting to install it (elastic/package-registry#728), or to install packages without requiring a registry (#70582).

It shouldn't be expected that the registry API provides fine-grained access to all the contents of a package.

Implementation

In #126915, logic was added to produce this information for uploaded and bundled packages. We need to switch over to this logic for packages fetched from the registry as well.

@jsoriano jsoriano added the Team:Fleet Team label for Observability Data Collection Fleet team label Oct 14, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@jlind23
Copy link
Contributor

jlind23 commented Nov 9, 2021

@joshdover @jen-huang could you please take a look at this as it could be a blocker soon regarding package signature. elastic/package-registry#726 if needed we can jump on a call.

@jen-huang jen-huang added the technical debt Improvement of the software architecture and operational architecture label Nov 16, 2021
@jen-huang
Copy link
Contributor

@joshdover IIRC, Fleet already downloads the entire package from the registry and stores all assets (including manifest files) in ES. The next time package information needs to be retrieved, I think we even build it from the files in ES storage:

export const getEsPackage = async (

Could you dive a bit further and confirm this? From my assumptions, I think this will take very little effort (if any).

@joshdover
Copy link
Contributor

ack: will do tomorrow

@joshdover
Copy link
Contributor

joshdover commented Nov 30, 2021

When the package is already installed, we build the RegistryPackage type from the manifest file asset Saved Object and then enhance it with other info from the package files to build the RegistryPackage type:

// create the packageInfo
// TODO: this is mostly copied from validtion.ts, needed in case package does not exist in storage yet or is missing from cache
// we don't want to reach out to the registry again so recreate it here. should check whether it exists in packageInfoCache first
const manifestPath = `${pkgName}-${pkgVersion}/manifest.yml`;
const soResManifest = await savedObjectsClient.get<PackageAsset>(
ASSETS_SAVED_OBJECT_TYPE,
assetPathToObjectId(manifestPath)
);
const packageInfo = safeLoad(soResManifest.attributes.data_utf8);

When pulling from the registry due to the package not being installed, we load the entire package into the cache and then build the RegistryPackage type from the registry API here:

const res = await fetchUrl(`${registryUrl}/package/${pkgName}/${pkgVersion}`).then(JSON.parse);

Ironically, in both cases the manifest entries appear to already be in the archive cache so I think we're making unnecessary network calls in both cases. The easiest change would probably be to remove the API call in getInfo for the cache-miss case (Registry.getRegistryPackage) and build it from the manifest and other files the same way we do for the installed case. Optionally, we could remove the SO fetches from the first case and fetch from cache as well, but it should be the same functional result either way.

@joshdover
Copy link
Contributor

joshdover commented Mar 3, 2022

@jsoriano One question I have about the registry API: are there other consumers of this logic other than Kibana? What was the motivation behind putting this high-level data stream information in the API in the first place?

A second question: if we intend on keeping this in the registry API for other consumers, does it make sense to explore a WASM-based solution to sharing this logic between the registry and Kibana? I imagine this would make implementing #70582 much simpler.

@jsoriano
Copy link
Member Author

jsoriano commented Mar 3, 2022

One question I have about the registry API: are there other consumers of this logic other than Kibana?

Not that I know. We have some documentation generators that pull information from the registry, but they were implemented with this separation of concerns in mind. AFAIK they get the information from the package itself and not from the API.
elastic-package also queries the registry for some operations, but only to check the presence of packages and their versions.

What was the motivation behind putting this high-level data stream information in the API in the first place?

I guess that there are only legacy reasons. Before we had the package spec the interfaces between the different components were a bit blurry, and the only responsible of the package formats was the registry, there was not so clear separation between the package format and the API to distribute them.

Now we are trying to make a clearer separation of concerns. The registry API should be used only to discover packages. The spec can evolve independently, adding fields to the manifest or any kind of files without needing to modify the API of the registry. We can consider adding discovery features based on new information stored in the spec, but they should be independent things.

A second question: if we intend on keeping this in the registry API for other consumers, does it make sense to explore a WASM-based solution to sharing this logic between the registry and Kibana? I imagine this would make implementing #70582 much simpler.

This logic is part of the coupling between the registry and the spec that we want to remove. Most of this shouldn't be in the registry. The registry shouldn't need to know about all the contents of a package, only the resources it needs to help on their discoverability.

One of the reasons the registry has this code is because it still has its own implementation for package validation. If we refactor the registry to use the spec for validation (something we plan to do), this code is likely going to be removed, or reduced to the minimal needed to satisfy current APIs. It shouldn't be expected that this code is going to be aligned with the spec.

The structs in this file are also available in the spec in jsonschema format (here for the data_stream manifests for example). This could be used by other consumers.

In summary, the Package Spec should be considered the only source of truth for the structure of a package. Following the spec should be enough for any application to implement support for packages. The Package Registry is a different thing, intended to distribute packages, and it should be used only to discover or retrieve packages.

Regarding the use of wasm, we were considering using it to expose specific functionality of the spec, as package validation, but this is in very early stages of investigation, there are still many open questions, specially if we consider using it to expose also the spec itself.

I imagine this would make implementing #70582 much simpler.

Regarding this, now that Kibana supports installing pre-bundled packages, is logic from the registry still needed?

@joshdover
Copy link
Contributor

Thanks for the detailed answer. I think we're aligned on all of this. If there are no other consumers of the API, I agree we should move towards implementing this all in Kibana and look towards removing this from the registry API altogether. Attempting to do any code sharing here doesn't make sense.

Regarding this, now that Kibana supports installing pre-bundled packages, is logic from the registry still needed?

We discovered that we have a more to do for bundled packages due to this problem and are looking to close the gap in #126695 for bundled packages and then we'll explore fixing this more broadly for all packages as part of solving this issue and #70582. Timeline on that is TBD still.

@joshdover
Copy link
Contributor

@kpollich It'd be good to get an understanding of any remaining work required after #125915 is complete at least for getting parity with the package registry.

For this task I think we still need to:

@jen-huang jen-huang changed the title Fleet should rely on the content of a package when installing it [Fleet] Rely on the content of a package when installing it Mar 7, 2022
@kpollich
Copy link
Member

kpollich commented Mar 8, 2022

Presented a quick knowledge share around Fleet's various touchpoints with the EPR service and some specific callouts around our caching implementation: https://gist.github.com/kpollich/19160d0ab0a61e7c4c71704549977429. This may be helpful in prepping when someone picks this up in 8.3.

@joshdover
Copy link
Contributor

We should explore using this WASM build of the package-spec's validation code from Kibana as part of this effort: elastic/package-spec#285

@jsoriano
Copy link
Member Author

Thanks @criamico! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.6 candidate Feature:EPM Fleet team's Elastic Package Manager (aka Integrations) project Team:Fleet Team label for Observability Data Collection Fleet team technical debt Improvement of the software architecture and operational architecture
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants