Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit analytic to checks all bundles of a catalog #66

Merged
merged 1 commit into from
Apr 22, 2021
Merged

Audit analytic to checks all bundles of a catalog #66

merged 1 commit into from
Apr 22, 2021

Conversation

camilamacedo86
Copy link
Contributor

@camilamacedo86 camilamacedo86 commented Feb 23, 2021

This ep is a product of the design of an audit command that could be used by the PE team, external users, and internal product teams to compare their operators against various audit criteria.

The audit tool should be used to examine a given catalog's set of operators and produce a report that is readable by end-users but also be exported to json or other formats that can be processed.

Some current audit criteria ideas:

  • PDB api deprecation
  • v1beta1 CRD deprecation
  • annotations defining supportability characteristics such disconnected, multi-arch etc.
  • channel naming compliance
  • capability levels - maybe this could be also run within a scorecard test

The additional checks required to address all criteria not implemented in the validators is not part of this EP and has been proposed in other ones such as #65.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 23, 2021
@camilamacedo86 camilamacedo86 changed the title WIP - add audit command to opm Add audit command to audit all bundles of an catalog Mar 1, 2021
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 1, 2021
estroz
estroz previously requested changes Mar 1, 2021
Copy link
Member

@estroz estroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of good stuff here conceptually. A few major details need discussion:

  1. Much of this EP discusses the marriage of operator-sdk and opm functionality, but I don't see concrete dicsussion of how that will occur. Problems:
    • opm is intended for catalog maintainers while operator-sdk is for individual operator authors who likely won't be touching a production catalog. This, amongst technical reasons, is why opm is the canonical location for index operations.
    • Scorecard is currently an OSDK subproject, meaning it would need to be split into its own first-class project before opm could use it.
  2. Running scorecard tests on every single bundle in a catalog, which can contain several hundred operators at different versions, would take a long time (hours), so failure modes and parallelism need to be discussed here.
  3. There is currently no official documentation about bundling tests into production bundles, which is what this command would be reading.

I think (3) is the biggest of the above problems. I'd at least like to see an empirical analysis of how many bundles in an existing production catalog (the OCP 4.6 catalog for example) contain scorecard configs.

enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
- Extract the database from the image informed
- Via sql get all bundles name and paths from the index catalog image
- Download and extract all bundles files by using the operator bundle path
- Get the required data for the report via sql from the index, from the operator bundle manifest files and from the project repository
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What project repository? Each operator's? That doesn't sound in-scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is described in the proposal. The operator bundle repositories as done indeed in the POC linked here. I will clarify it in this line as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really thank you. clarified 👍

enhancements/audit-command.md Show resolved Hide resolved
enhancements/audit-command.md Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved

### Risks and Mitigations

A risk is the command take to long to be executed. In order to mitigate it, the flags `--wait-time` and customize the checks that should be done as the data that could be used was added:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--wait-time should be configured on a per-bundle or per-test basis, since the caller would have to calculate wait time * bundles in catalog * number of tests to get a reasonable estimate for an overall wait time.

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on that I changed the EP and added the following info:

Note that the tool will have a default timeout set up for the command and then, it can be changed by the flag `--wait-time for who is using it and has a better idea over what would be the expected reasonable time for the specific scenario.

The proposal has no intention to address a complexity logic to try to guess a default timeout based on the number of bundles and other possible combination aspects. Beside be possible to try to calculate that based on some logic such as wait time * bundles in catalog * number of tests it would bring complexities which show not required at least in the first moment since users can indeed set any value as please them. See that, it can change a lot since the user might be running the full report or not, might be using the checks or not, etc. So, to keep the simplicity the approach adopted is to reduce the effort required to put it in place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the EP as well. So, I am closing, however, please feel free to re-open if you wish.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree that having an overall timeout is useful, unless you set it to 10 hours or something ridiculous.

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Mar 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the alpha version, I do not think that we need or should to address this kind of complexity. I'd prefer the timeout as 2 hours by default instead (educated guess). We are allowing users to change it as please them which shows enough.

enhancements/audit-command.md Outdated Show resolved Hide resolved
@camilamacedo86

This comment has been minimized.

Copy link
Contributor

@bparees bparees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before we invest too much energy in this wrapper/helper command, i'd like to see us put our energy into finalizing the actual new validation logic we want to add to the sdk.

open questions on validators for the sdk include:

  1. where will new validation code (such as specific for OCP requirements) live?
  2. will we define a pluggable model for adding validators to the sdk?
  3. do we have a finalized list of the first pass set of new validation rules/requirements for ocp + operatorhub?

Anyone can write some scripts to pull down a catalog, extract the bundle info, and run validation against each bundle, once the validation exists, so i do not see the audit command as being quite as urgent. Is it useful? sure. But i don't want to see us get too distracted by it until we do the prerequisite work.

enhancements/audit-command.md Outdated Show resolved Hide resolved
@camilamacedo86

This comment has been minimized.

@camilamacedo86 camilamacedo86 changed the title Add audit command to audit all bundles of an catalog audit command to audit all bundles of a catalog Mar 3, 2021
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
Copy link
Member

@jmrodri jmrodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my comments for my first pass through. I've got more to finish reading.

enhancements/audit-command.md Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Outdated Show resolved Hide resolved
@estroz

This comment has been minimized.

@camilamacedo86

This comment has been minimized.

@estroz

This comment has been minimized.

@camilamacedo86

This comment has been minimized.

@bparees
Copy link
Contributor

bparees commented Mar 4, 2021

Also, in his comments, he is raising questions about another requirement that has no relation with this one.

i don't think it's 100% true that they have no relation. The proposal to add new (pluggable?) validation commands to the sdk so that we can have OCP version specific validation, will absolutely have an impact on how this command, which will wrapper that one, gets implemented. (And to be clear, i'm referring to openshift/enhancements#661 which, while technically a downstream EP, may lead to new upstream logic being added to support it).

My fundamental example of "until the validation commands are more fully fleshed out, we can't know what it will take to wrap them" is validation logic like this which is proposed in #65 (and similar things that would be needed in the downstream ocp validation logic):

To check if the operator bundle is using the Webhook API version which is no longer supported | error | If `spec.minKubeVersion` is  informed  and it is >= `1.22` then, check if the CSV has the spec `webhookdefinitions.admissionReviewVersions.v1beta1`. (See [here](https://github.com/operator-framework/operator-sdk/blob/v1.4.2/testdata/go/v2/memcached-operator/bundle/manifests/memcached-operator.clusterserviceversion.yaml#L203-L205). Note that in these cases, users should be informed that the `admissionregistration.k8s.io/v1beta1` was deprecated in Kubernetes `1.16` and it is removed in `1.22`. 

I think if we are going to be adding validation logic that depends on the kubeversion (which i agree is useful), we also need to add a way for the invoker of the command to specify the kubeversion they want to validate against. Not all operators specify minKubeVersion. And in some cases it may be desirable to set an alternate kubeversion anyway. This gets even more common (and complicated since now we have concepts of min + max versions) when downstream ocp validation logic exists and we want to wrap that into a catalog audit command.

So what will that mean for alpha audit --validation true? Do we end up adding a version parameter there? Declare it's not overrideable? Something else?

And what if i want to run the optional validation logic? Are we going to wire through the --select-optional flag that #65 relies on? Will this command be able to run the downstream validation logic that openshift/enhancements#661 introduces?

These sorts of experience questions, and the general fact that I think it's more urgent for us to deliver on the operatorhub+ocp validation improvements are why I am trying to avoid people investing energy/effort here at this particular moment. The answers to these questions will be more obvious once we get further with the validation extensions.

I'll also reiterate what @dmesser said on the arch call and which I agreed with at the time: The really novel thing that can be added here is true "catalog-level" (or at least package-level) validation which looks at all the bundles of a package holistically and attempts to detect potential error conditions. That is something the sdk cannot do. So to the extent that we have an urgent need to fill a gap, that is the gap.

I realize that if we're going to build a tool that closes that gap, and does some of the other reporting that this EP proposes, also including logic that wraps+invokes sdk validation tools is relatively trivial (the UX/cli argument questions i posed above not withstanding) but that(holistic package level consistency validation) doesn't seem to be the main thrust of the proposal right now.

And fundamentally what i'm trying to raise is a prioritization question, not a "does this have technical value" question. We have limited resources+attention span. I would prefer to see us spend that on adding the validation logic (both the upstream+downstream validation logic) over this, at this current point in time.

@camilamacedo86
Copy link
Contributor Author

Hi @bparees,

Really thank you for your inputs. And I got them. I need some time to see how we can address all these concerns and feedbacks. I will ping you be sure when it be in that stage. To clarifies that I am flagging this proposal as WIP.

@camilamacedo86 camilamacedo86 changed the title audit command to audit all bundles of a catalog WIP: audit command to audit all bundles of a catalog Mar 4, 2021
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 4, 2021
@camilamacedo86 camilamacedo86 changed the title WIP: audit command to audit all bundles of a catalog WIP: audit analytic to checks all bundles of a catalog Mar 4, 2021
@camilamacedo86

This comment has been minimized.

@camilamacedo86 camilamacedo86 changed the title WIP: audit analytic to checks all bundles of a catalog Audit analytic to checks all bundles of a catalog Mar 10, 2021
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 10, 2021
@camilamacedo86
Copy link
Contributor Author

Hi @jmccormick2001, @estroz, @jmrodri, @bparees, @bentito

It is updated trying to cover all questions and points raised as well. it might good enough for a second round.
I'd suggested we prioritize the reviews on #65 and openshift/enhancements#661. But is important it be also updated for we are able to check the relations.

@camilamacedo86 camilamacedo86 dismissed estroz’s stale review March 20, 2021 21:06

I am removing the blocker to get this merged since it does not affects SDK and it is also clarified in the EP.

@camilamacedo86
Copy link
Contributor Author

camilamacedo86 commented Mar 20, 2021

Hi @jmccormick2001, @bparees,

This proposal is updated really thank you for your suggestions and feedbacks.
Note that the alpha version of the audit will NOT cover the downstream checks (OPC). In this way, in POV the definition of openshift/enhancements#661 is no longer a blocker for it gets merged.

The idea here is we are able to have EP for we start to work in alpha solution and it actually seems like too be covering more details and points that would be mandatory for its first version.

Also, it's updated now describing that the first version will probably be done in its own repo. This proposal does not affect SDK, OPM as any other solution in the org which also is clarified in the EP. Only tries to use what already exists to obtain the required data.

The implementation of the EP https://github.com/operator-framework/enhancements/blob/master/enhancements/operator-hub-check.md has more priority and will be done first as discussed 👍 .

However, I would like to kindly ask for your help here for we are able to get this EP merged. After the above implementation I will probably be working on a reduced version of it ( like a POC to move forward with this idea and we have a getting start point).

In advanced, thank your time and all help and support.

c/c @gallettilance @bentito

Copy link
Contributor

@bparees bparees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i understand you just want agreement to start what amounts to a PoC here. I think there's enough detail to do that. but i'll definitely want to see more discussion about some of these implementation details, code ownership(sdk vs opm vs something else), interfaces to external tools, and config args before we'd consider promoting something like this to anything more than alpha.


9. Will audit command works with downstream and upstream specific checks?

To address specific downstream concerns and checks are out of the scope of this proposal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ok to defer this question for now since you're also not proposing a permanent home (opm, sdk, something else) for the implementation of this tool. But before we pick a final home for it, we will need to revisit this question.

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Mar 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea so far is the audit tool have its own repo. After we implement that if we decide that it should be moved to any other place than by sure we can revisit it. See that was clarified in the first open question.

enhancements/audit-command.md Show resolved Hide resolved
--list-optional-validators List all optional validators available. When set, no report will be execute
`--optional-values` string map with key and values which will be informed to the validators e.g`"k8s=1.21"`
--checks-only If set, disable the data report. It is useful for who would like to audit the index only runnning the checks and validations.
--head-only If set, audit only the operator bundles which are the head of the channel. (default all bundles of the index catalog image will be used)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about a --filter flag that takes a regex for selecting which packages or csvnames to check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. See:

To address this requirement a user will be able to use the flag --filter where the tool will only get the operator bundles where the package name is like %name%.

Why not accepted regex?

See that some personas who would like to use it might not be developers then, I am trying here to make it simple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See that some personas who would like to use it might not be developers then, I am trying here to make it simple.

regex is still the right answer here. Regex works very well even if you do not know regex because if you just pass a fixed string like foo it will still match *foo*. But it leaves the power of regex available for someone who knows it and needs its flexibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the regex option. However, for a POC/alpha version that seems fine, we use simple SQL. Note that the regex might be so simple to be used with I am not sure if it is supportable for the stack used. Then, that requires a little more effort to be accomplished.

enhancements/audit-command.md Outdated Show resolved Hide resolved
enhancements/audit-command.md Show resolved Hide resolved

By using this flag, the audit command will filter the operator bundles in the first step to download and extract just this ones.

To know the operator bundles which are the latest head we will get only the entries which has values in the `operatorbundle` table for both `csv` and `bundle` column.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsewhere you've tried to indicate loose coupling to the index format, but this expresses a very tight coupling that will break once the index moves away from sqllite.

I think you need to identify channel heads the same way opm does, which i believe is based on semver sorting of the version value, with +buildid being a tiebreaker on otherwise identical semvers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the info.

@camilamacedo86
Copy link
Contributor Author

/hold

I have been working on its POC and I will probably perform changes here soon. See: https://github.com/camilamacedo86/audit

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 5, 2021
@bparees
Copy link
Contributor

bparees commented Apr 22, 2021

/approve

@camilamacedo86 camilamacedo86 merged commit 5023fab into operator-framework:master Apr 22, 2021
@camilamacedo86 camilamacedo86 deleted the audit-command branch April 22, 2021 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants