Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't skip/requeue alerting rules with new "other" fields #166967

Closed
kobelb opened this issue Sep 21, 2023 · 4 comments
Closed

Don't skip/requeue alerting rules with new "other" fields #166967

kobelb opened this issue Sep 21, 2023 · 4 comments
Assignees
Labels
Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@kobelb
Copy link
Contributor

kobelb commented Sep 21, 2023

When #158022 was implemented, the approach did not differentiate between the two categories of fields that can be added to the alerting rule schema: parameters and "other". Fields in the parameters category alter the behavior of alerting rules when they run, while fields in the "other" category do not. Using the existing alerting rule schema, the following are examples of fields that fall into the two different categories:

parameters: enabled, consumer, alertTypeId, apiKey, muteAll, muteInstanceIds, throttle, schedule, isSnoozedUntil, snoozeSchedule, actions, notifyWhen, mapped_params, params
"other": name, tags, apiKeyOwner, apiKeyCreatedByUser, createdBy, updatedBy, updatedAt, createdAt, revision, running, meta, executionStatus, monitoring, lastRun, nextRun

When a new field is added to the parameters category, if it's specified then an older version of Kibana that doesn't know about the field should NOT attempt to run the alerting rule. Doing so would result in incorrect behaviors. However, when new fields are added to the "other" category, it's safe for an older Kibana node that doesn't know about the field to run the alerting rule.

Differentiating between these two categories of fields is challenging to do in a backward compatible manner, and the naive approach that I recommended of using a validation schema to detect unknown fields is problematic.

After having a sync and async discussion with @mikecote and @ersin-erdal we decided that we should prototype what it would look like to add an explicit "runtime version" field to the alerting rule. This field would be used to control which versions of Kibana have the capabilities to run specific alerting rules.

@kobelb kobelb added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework labels Sep 21, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@kobelb
Copy link
Contributor Author

kobelb commented Sep 21, 2023

/cc @mikecote @ersin-erdal

@kobelb kobelb changed the title Don't skip/requeue alerting rules with new state fields Don't skip/requeue alerting rules with new "other" fields Sep 22, 2023
@pmuellr
Copy link
Member

pmuellr commented Oct 16, 2023

Looking at #158022 and the mention of follow-on issue #159302 - that issue appears to be closed by the PR that spawned it as a follow-on. So either the comment is wrong - it's not a follow-on, it was implemented here; or the issue somehow got closed accidentally. Or something :-)

Anyway!

I was thinking - we really want to ignore these at the task manager level, don't we? What I was thinking is that each task type could register a modelVersion for it's per-type data - optionally. If set, then we use that when polling for tasks, as a new field in the task doc - taskTypeModelVersion or such. Looking for anythng <= to that. We'd need to add a migration to add that modelVersion for rules and connectors, once we start checking.

Perhaps I'm missing something here though.

ersin-erdal added a commit that referenced this issue Nov 28, 2023
Towards: #166967

This PR adds `ruleModelVersion` to saved object registry and
`latestRuleVersion` to ruleTypeRegistry.
These new assets will be used in a follow-on PR for skipping the rule
task executions when there is version mismatch.

POC for the issue: #167128

---------

Co-authored-by: kibanamachine <[email protected]>
@ersin-erdal
Copy link
Contributor

closing in favor of: #176584

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

4 participants