-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audit analytic to checks all bundles of a catalog #66
Audit analytic to checks all bundles of a catalog #66
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of good stuff here conceptually. A few major details need discussion:
- Much of this EP discusses the marriage of
operator-sdk
andopm
functionality, but I don't see concrete dicsussion of how that will occur. Problems:opm
is intended for catalog maintainers whileoperator-sdk
is for individual operator authors who likely won't be touching a production catalog. This, amongst technical reasons, is whyopm
is the canonical location for index operations.- Scorecard is currently an OSDK subproject, meaning it would need to be split into its own first-class project before
opm
could use it.
- Running scorecard tests on every single bundle in a catalog, which can contain several hundred operators at different versions, would take a long time (hours), so failure modes and parallelism need to be discussed here.
- There is currently no official documentation about bundling tests into production bundles, which is what this command would be reading.
I think (3) is the biggest of the above problems. I'd at least like to see an empirical analysis of how many bundles in an existing production catalog (the OCP 4.6 catalog for example) contain scorecard configs.
enhancements/audit-command.md
Outdated
- Extract the database from the image informed | ||
- Via sql get all bundles name and paths from the index catalog image | ||
- Download and extract all bundles files by using the operator bundle path | ||
- Get the required data for the report via sql from the index, from the operator bundle manifest files and from the project repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What project repository? Each operator's? That doesn't sound in-scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is described in the proposal. The operator bundle repositories as done indeed in the POC linked here. I will clarify it in this line as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really thank you. clarified 👍
enhancements/audit-command.md
Outdated
|
||
### Risks and Mitigations | ||
|
||
A risk is the command take to long to be executed. In order to mitigate it, the flags `--wait-time` and customize the checks that should be done as the data that could be used was added: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--wait-time
should be configured on a per-bundle or per-test basis, since the caller would have to calculate wait time * bundles in catalog * number of tests
to get a reasonable estimate for an overall wait time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on that I changed the EP and added the following info:
Note that the tool will have a default timeout set up for the command and then, it can be changed by the flag `--wait-time for who is using it and has a better idea over what would be the expected reasonable time for the specific scenario.
The proposal has no intention to address a complexity logic to try to guess a default timeout based on the number of bundles and other possible combination aspects. Beside be possible to try to calculate that based on some logic such as wait time * bundles in catalog * number of tests
it would bring complexities which show not required at least in the first moment since users can indeed set any value as please them. See that, it can change a lot since the user might be running the full report or not, might be using the checks or not, etc. So, to keep the simplicity the approach adopted is to reduce the effort required to put it in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in the EP as well. So, I am closing, however, please feel free to re-open if you wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree that having an overall timeout is useful, unless you set it to 10 hours or something ridiculous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the alpha version, I do not think that we need or should to address this kind of complexity. I'd prefer the timeout as 2 hours by default instead (educated guess). We are allowing users to change it as please them which shows enough.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before we invest too much energy in this wrapper/helper command, i'd like to see us put our energy into finalizing the actual new validation logic we want to add to the sdk.
open questions on validators for the sdk include:
- where will new validation code (such as specific for OCP requirements) live?
- will we define a pluggable model for adding validators to the sdk?
- do we have a finalized list of the first pass set of new validation rules/requirements for ocp + operatorhub?
Anyone can write some scripts to pull down a catalog, extract the bundle info, and run validation against each bundle, once the validation exists, so i do not see the audit command as being quite as urgent. Is it useful? sure. But i don't want to see us get too distracted by it until we do the prerequisite work.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my comments for my first pass through. I've got more to finish reading.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
i don't think it's 100% true that they have no relation. The proposal to add new (pluggable?) validation commands to the sdk so that we can have OCP version specific validation, will absolutely have an impact on how this command, which will wrapper that one, gets implemented. (And to be clear, i'm referring to openshift/enhancements#661 which, while technically a downstream EP, may lead to new upstream logic being added to support it). My fundamental example of "until the validation commands are more fully fleshed out, we can't know what it will take to wrap them" is validation logic like this which is proposed in #65 (and similar things that would be needed in the downstream ocp validation logic):
I think if we are going to be adding validation logic that depends on the kubeversion (which i agree is useful), we also need to add a way for the invoker of the command to specify the kubeversion they want to validate against. Not all operators specify minKubeVersion. And in some cases it may be desirable to set an alternate kubeversion anyway. This gets even more common (and complicated since now we have concepts of min + max versions) when downstream ocp validation logic exists and we want to wrap that into a catalog audit command. So what will that mean for And what if i want to run the optional validation logic? Are we going to wire through the These sorts of experience questions, and the general fact that I think it's more urgent for us to deliver on the operatorhub+ocp validation improvements are why I am trying to avoid people investing energy/effort here at this particular moment. The answers to these questions will be more obvious once we get further with the validation extensions. I'll also reiterate what @dmesser said on the arch call and which I agreed with at the time: The really novel thing that can be added here is true "catalog-level" (or at least package-level) validation which looks at all the bundles of a package holistically and attempts to detect potential error conditions. That is something the sdk cannot do. So to the extent that we have an urgent need to fill a gap, that is the gap. I realize that if we're going to build a tool that closes that gap, and does some of the other reporting that this EP proposes, also including logic that wraps+invokes sdk validation tools is relatively trivial (the UX/cli argument questions i posed above not withstanding) but that(holistic package level consistency validation) doesn't seem to be the main thrust of the proposal right now. And fundamentally what i'm trying to raise is a prioritization question, not a "does this have technical value" question. We have limited resources+attention span. I would prefer to see us spend that on adding the validation logic (both the upstream+downstream validation logic) over this, at this current point in time. |
Hi @bparees, Really thank you for your inputs. And I got them. I need some time to see how we can address all these concerns and feedbacks. I will ping you be sure when it be in that stage. To clarifies that I am flagging this proposal as WIP. |
This comment has been minimized.
This comment has been minimized.
Hi @jmccormick2001, @estroz, @jmrodri, @bparees, @bentito It is updated trying to cover all questions and points raised as well. it might good enough for a second round. |
I am removing the blocker to get this merged since it does not affects SDK and it is also clarified in the EP.
Hi @jmccormick2001, @bparees, This proposal is updated really thank you for your suggestions and feedbacks. The idea here is we are able to have EP for we start to work in alpha solution and it actually seems like too be covering more details and points that would be mandatory for its first version. Also, it's updated now describing that the first version will probably be done in its own repo. This proposal does not affect SDK, OPM as any other solution in the org which also is clarified in the EP. Only tries to use what already exists to obtain the required data. The implementation of the EP https://github.com/operator-framework/enhancements/blob/master/enhancements/operator-hub-check.md has more priority and will be done first as discussed 👍 . However, I would like to kindly ask for your help here for we are able to get this EP merged. After the above implementation I will probably be working on a reduced version of it ( like a POC to move forward with this idea and we have a getting start point). In advanced, thank your time and all help and support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i understand you just want agreement to start what amounts to a PoC here. I think there's enough detail to do that. but i'll definitely want to see more discussion about some of these implementation details, code ownership(sdk vs opm vs something else), interfaces to external tools, and config args before we'd consider promoting something like this to anything more than alpha.
enhancements/audit-command.md
Outdated
|
||
9. Will audit command works with downstream and upstream specific checks? | ||
|
||
To address specific downstream concerns and checks are out of the scope of this proposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's ok to defer this question for now since you're also not proposing a permanent home (opm, sdk, something else) for the implementation of this tool. But before we pick a final home for it, we will need to revisit this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea so far is the audit tool have its own repo. After we implement that if we decide that it should be moved to any other place than by sure we can revisit it. See that was clarified in the first open question.
--list-optional-validators List all optional validators available. When set, no report will be execute | ||
`--optional-values` string map with key and values which will be informed to the validators e.g`"k8s=1.21"` | ||
--checks-only If set, disable the data report. It is useful for who would like to audit the index only runnning the checks and validations. | ||
--head-only If set, audit only the operator bundles which are the head of the channel. (default all bundles of the index catalog image will be used) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about a --filter flag that takes a regex for selecting which packages or csvnames to check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. See:
To address this requirement a user will be able to use the flag
--filter
where the tool will only get the operator bundles where the package name is like%name%
.
Why not accepted regex?
See that some personas who would like to use it might not be developers then, I am trying here to make it simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See that some personas who would like to use it might not be developers then, I am trying here to make it simple.
regex is still the right answer here. Regex works very well even if you do not know regex because if you just pass a fixed string like foo
it will still match *foo*
. But it leaves the power of regex available for someone who knows it and needs its flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the regex option. However, for a POC/alpha version that seems fine, we use simple SQL. Note that the regex might be so simple to be used with I am not sure if it is supportable for the stack used. Then, that requires a little more effort to be accomplished.
enhancements/audit-command.md
Outdated
|
||
By using this flag, the audit command will filter the operator bundles in the first step to download and extract just this ones. | ||
|
||
To know the operator bundles which are the latest head we will get only the entries which has values in the `operatorbundle` table for both `csv` and `bundle` column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elsewhere you've tried to indicate loose coupling to the index format, but this expresses a very tight coupling that will break once the index moves away from sqllite.
I think you need to identify channel heads the same way opm does, which i believe is based on semver sorting of the version value, with +buildid being a tiebreaker on otherwise identical semvers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the info.
/hold I have been working on its POC and I will probably perform changes here soon. See: https://github.com/camilamacedo86/audit |
/approve |
This ep is a product of the design of an audit command that could be used by the PE team, external users, and internal product teams to compare their operators against various audit criteria.
The audit tool should be used to examine a given catalog's set of operators and produce a report that is readable by end-users but also be exported to json or other formats that can be processed.
Some current audit criteria ideas:
The additional checks required to address all criteria not implemented in the validators is not part of this EP and has been proposed in other ones such as #65.