Clearer documentation on driver removal process from 20200901-artifacts-cleanup.md proposal #1399

Closed
mmcaya opened this issue Sep 14, 2020 · 9 comments

mmcaya commented Sep 14, 2020

What to document

The recent proposal to remove pre-built drivers, and its subsequent execution, broke live systems that relied on those pre-built drivers for releases prior to 0.24.0. Users of those systems were given no actionable notice that the drivers would be removed.

Before this proposal was written and merged, end users expected pre-built drivers to remain available for previously released versions. It is understood that the change was made to save storage space and cost.

Please consider the following documentation updates:

  • Clearly mark a driver as deprecated in the CHANGELOG.md in a release prior to its actual removal
    • This is especially important because the proposal itself notes new driver versions may or may not be used in each new release, leading to more unpredictability
    • Consistently note the current driver version in the CHANGELOG.md as well (e.g. the 0.25.0 release didn't note its upgrade to driver version ae104eb, so you have to know to look here to find it if you want to confirm a driver is available, or build one, before running that release)
  • Update documentation to indicate whether the proposed archival of deleted drivers to S3 will be available for end-user consumption, or only as an internal archive.
  • Update the https://falco.org/docs/installation/#install-driver guide to clearly indicate that pre-built modules are only supported for specific driver and Falco versions, and that failure to update versions will result in system outages unless alternate action is taken
  • Document a schedule or cadence for when the drivers actually get deleted, to support production system predictability
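
To illustrate the "confirm a driver was available" step mentioned above, here is a minimal sketch of a pre-flight check that could run before a rollout. The repository base URL, the `<name>_<target>_<kernel release>_<kernel version>.ko` file layout, and the function names `driver_url` and `driver_available` are assumptions made for illustration, not something confirmed in this thread; adjust them to whatever the driver loader in your Falco version actually requests.

```python
from urllib.request import Request, urlopen

# Assumption: base URL of the pre-built driver repository. Replace it with
# the repository your Falco version actually downloads from.
DRIVERS_REPO = "https://download.falco.org/driver"


def driver_url(driver_version, target_id, kernel_release, kernel_version,
               repo=DRIVERS_REPO, name="falco", ext="ko"):
    """Build the URL where a pre-built driver is assumed to be published.

    The <repo>/<driver version>/<name>_<target>_<release>_<version>.<ext>
    layout is an assumption of this sketch.
    """
    filename = f"{name}_{target_id}_{kernel_release}_{kernel_version}.{ext}"
    return f"{repo.rstrip('/')}/{driver_version}/{filename}"


def driver_available(url, timeout=10.0):
    """Return True if a HEAD request for the driver URL succeeds."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # HTTP errors (e.g. 404), connection failures and timeouts all
        # derive from OSError; treat every one of them as "not available".
        return False
```

Calling `driver_available(driver_url("ae104eb", target, release, version))` with the values for your hosts tells you whether to expect a download or plan for a local build. The check only covers the kernel you pass in, so it would need to run once per kernel in the fleet.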

Additionally, please consider retaining the driver modules for at least 3 released versions to better support more stable upgrade paths.

In this specific case, 0.25.0 was released only ~7 days before the 0.23.0 pre-built drivers were removed, and your documented best-effort process for provisioning new drivers notes a potential 1-2 week delay.
As a result, end users who rely on the pre-built module download capability were left with a single viable version (0.24.0) for all of their deployments. This matters to organizations that must follow processes or requirements before upgrading components, and that may not be able to keep up with such a fast pace.
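
The retention request above can be sketched as a simple pruning rule: keep every driver version referenced by the last N Falco releases, and only consider the rest for deletion. This is an illustrative sketch, not the cleanup process actually used by the project. The function names are hypothetical, and apart from the 0.25.0 / ae104eb pairing mentioned earlier, the driver identifiers in the usage example are placeholders.

```python
def drivers_to_retain(releases, keep=3):
    """Return the driver versions used by the newest `keep` releases.

    `releases` is a list of (falco_version, driver_version) tuples ordered
    from oldest to newest.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    return {driver for _, driver in releases[-keep:]}


def drivers_to_delete(releases, keep=3):
    """Return driver versions that no retained release depends on."""
    all_drivers = {driver for _, driver in releases}
    # A driver shared between an old and a retained release stays, because
    # the set difference removes everything the retained releases need.
    return all_drivers - drivers_to_retain(releases, keep)


# Placeholder driver ids, except ae104eb which is the 0.25.0 driver.
releases = [
    ("0.22.0", "drv-a"),
    ("0.23.0", "drv-b"),
    ("0.24.0", "drv-c"),
    ("0.25.0", "ae104eb"),
]
print(sorted(drivers_to_delete(releases, keep=3)))
```

With `keep=3` only the driver for the oldest release in this list becomes eligible for deletion, so a deployment that lags two releases behind would still find its pre-built driver.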

Related issue:
#1390

Related PRs:
481eedb#diff-7ec07c309635dee21e638c9c49ea9dab
falcosecurity/test-infra#169
falcosecurity/test-infra#170

@krisnova
Contributor

See my response from Slack:

Hey @Fahad thanks for bringing this up. I don't think you are alone with being frustrated and I think we can totally figure out a clean solution here.
I don't think @leodido intended to break any live systems here with this proposal.
In general I am also disappointed that a restriction such as storage has caused Falco maintainers to have to "work around" a problem. I do not believe removing anything from storage is a viable technical solution here and we should not have done that.
What we can do:
Bring this up on the weekly Wed calls so we can figure out what the use cases and constraints are (I am still confused here myself).
Once we understand what we need to do (and what is currently blocking us) from a storage perspective we can reach out to @amye and @caniszczyk with the CNCF to ask for resources.
But we need to understand what we are asking for first, and I believe the weekly call is the correct place to do that. We need to be able to quantify what we need somehow.
Do you think you would be able to join the call or help out with this effort @Fahad?

Keeping this issue open as a placeholder.


TLDR: I don't understand why this is still a problem, and why a "solution" to this was deleting artifacts. That seems like a drastically inappropriate response.

@leodido
Member

leodido commented Sep 15, 2020

Hello,

some considerations about the cleanup of pre-built drivers.

First of all, the Falco maintainers set very clear expectations around the provision of the pre-built drivers.
In fact, the initial write-up about this (1) is very clear about their best-effort nature.
Moreover, the Falco project has not yet reached a 1.0 milestone and has always been very clear that we only officially support the artifacts in the proposals 1, 2 (by @kris-nova). As you can read there, the only officially supported artifacts are the packages for the latest Falco releases. There's no mention of official support for pre-built drivers.

Nevertheless, to ease the users' experience with Falco installation, and even though Falco is able to build the drivers automatically on the fly (when the kernel headers are present on the host), we decided to provide a set of pre-built drivers for various kernel release and distro combinations (as per test-infra dbg).

Given all this, and given the state of Falco as an open-source project, to find a solution we'll need two types of contributions:

  • being proactive and making your voice heard in the community calls, which are the ultimate decision-making tool for the Falco community
  • helping to clear the maintenance tasks around this: process changes, storage quota discussions with the providers, etc.

One very good example of how to make the experience better for everyone is the work that Jonah Jones is doing to move Prow back to AWS.

@mmcaya I like your documentation updates suggestions, thanks for proposing them.

TLDR
The Falco community has operational needs that can be improved.
However, the help of everyone is required to make that happen.
Every change to the project has always been discussed and documented.
New changes and improvements can be implemented but we need to find owners for them.

@stale

stale bot commented Nov 15, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Issues labeled "cncf", "roadmap" and "help wanted" will not be automatically closed. Please refer to a maintainer to get such label added if you think this should be kept open.

@stale stale bot added the wontfix label Nov 15, 2020
@fntlnz fntlnz removed the wontfix label Nov 16, 2020
@fntlnz
Contributor

fntlnz commented Nov 16, 2020

/help

@poiana
Contributor

poiana commented Nov 16, 2020

@fntlnz:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@poiana
Contributor

poiana commented Feb 14, 2021

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@poiana
Contributor

poiana commented Mar 16, 2021

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

@poiana
Contributor

poiana commented Apr 15, 2021

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

@poiana
Contributor

poiana commented Apr 15, 2021

@poiana: Closing this issue.


@poiana poiana closed this as completed Apr 15, 2021