Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip rule execution when it has a newer typeVersion than Kibana instance #172161

Conversation

ersin-erdal
Copy link
Contributor

@ersin-erdal ersin-erdal commented Nov 29, 2023

Resolves: #166967

This PR adds a new logic to skip task execution by using the modelVersions of saved objects.

schemas.create in modelVersions are used by Kibana to validate the SO that they are attached before saving them.
So it will force developers to bump the versions whenever there is a change in the SO (add, remove etc.)

We have already saved the latestRuleVersion in ruleTypeRegistry with the following PR #171927

And with this PR, we also save the typeVersion in a rule on its create, update and execute actions, so we can pass the rule type version to TaskManager with loadIndirectParams.
Then we can compare the latestRuleVersion with typeVersion in TaskManager and skip the execution if typeVersion of a rule is greater than latestRuleVersion. This means the rule has already been updated by a newer Kibana version.

But before passing typeVersion to TM, we recursively call the isCompatibleWithPreviousVersion method of the rule versions until one of them return false (or up to version 1) then return that version to TM.

isCompatibleWithPreviousVersion takes the rawRule as param, so a developer can decide if the new version is compatible with the previous one on run time.

For instance: The new field muted is added to the rule SO. And a developer wants to if the fetched rule SO is compatible with the previous version, so they can do:

isCompatibleWithPreviousVersion: (rawRule) => rawRule.muted ? true : false,

Another change that is introduced by tis PR:
As we are replacing the indirectParams validation with version check in TM, we decided to validate only the rule.params rather than the whole rawRule. Therefore we return rule.params as indirectParams and ruleType.validate.params as indirectParamsSchema

Note: we do indirectParamsSchema.validate(indirectParams) in TM.

To verify:

Enable skipping feature in config:

xpack:
 task_manager:
   requeue_invalid_tasks:
     enabled: true
     delay: 1000
     max_attempts: 10

Create a rule and check if it has the typeVersion:1 field,

replace typeVersion with typeVersion: 2 (current latest rule type version is 1) in here

TM should be skipping the rule execution and print 2 warnings:
Task (xxx/xxx) has a newer version(2) than expected(1)
Task Manager has skipped executing the Task (xxx/xxx) x times as it has invalid params.

after 10 skips, the should be executed.


typeVersion field should be addedd to the rules from an older Kibana version (after upgrade) on the first execution.

275910888-23df2344-afa1-4176-85a8-30c0e955c1cf

@ersin-erdal ersin-erdal added release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.12.0 labels Nov 29, 2023
@ersin-erdal ersin-erdal self-assigned this Nov 29, 2023
@@ -941,7 +943,6 @@ export class TaskRunner<

return { interval: retryInterval };
}),
monitoring: this.ruleMonitoring.getMonitoring(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is redundant and not used by TaskManager.

const { data } = await this.task.loadIndirectParams();
if (data) {
// validate indirect params e.g. connector or rule.params
if (indirectParamsSchema && indirectParams) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick left a note about this check in this PR's POC.
When there is a version mismatch the task is skipped 100 times (default number) then doesn't return a skipError in order to let the rule run.
So it comes to this line and does indirectParamsSchema validation as well. As it has already reached the maxAttempts just logs a warning and doesn't return any skipError either.

We may skip this validation when there is already an error (hasValidationError). WDYT?

@ersin-erdal ersin-erdal marked this pull request as ready for review November 29, 2023 17:33
@ersin-erdal ersin-erdal requested a review from a team as a code owner November 29, 2023 17:33
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@@ -212,7 +212,7 @@ export default function executionStatusAlertTests({ getService }: FtrProviderCon
})
.expect(200);

executionStatus = await waitForStatus(alertId, new Set(['error']));
executionStatus = await waitForStatus(alertId, new Set(['error']), 50000);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this waits for the task execution to be skipped 2 times and then run the rule, increased the delay to make it long enough for 2 skips.

@@ -315,7 +314,7 @@ export class RuleTypeRegistry {
spaceId: schema.string(),
consumer: schema.maybe(schema.string()),
}),
indirectParamsSchema: rawRuleSchemaV1,
indirectParamsSchema: ruleType.validate.params,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we no longer validating the rule schema and just the rule parameters? Seems like changes to the rule schema could also potentially be backwards incompatible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually version check replaces the schema check, but i wanted to use it for params validation rather than removing the whole feature. Maybe removing is just better as we already have another issue for params, config and secrets validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the rule schema is just used for modelVersions which validates the rule just before creating the SO in ES.
This is to force developers to update the schema and bump the version.

@ymao1
Copy link
Contributor

ymao1 commented Nov 30, 2023

Can you provide instructions for verifying this PR? Under what conditions should it succeed? Under what conditions should it fail? I am familiar at a high level but it would be helpful to document these things as well.

@ersin-erdal ersin-erdal force-pushed the 166967-skip-tasks-with-wrong-version branch from e441fb8 to 08777fd Compare December 1, 2023 01:18
@ersin-erdal
Copy link
Contributor Author

Can you provide instructions for verifying this PR? Under what conditions should it succeed? Under what conditions should it fail? I am familiar at a high level but it would be helpful to document these things as well.

Added how-to-verify section to the description

@pmuellr
Copy link
Member

pmuellr commented Dec 4, 2023

I'm not seeing the expected results from To verify: - the rule seems. to continue running without extra logging.

Perhaps because I had a 1s interval? Will try again with a 1m. Also I figured I'd disable the rule before making the code change, make the code change, let Kibana restart and spam the console, wait for it to finish, then re-enable the rule in hopes of more clearly seeing the messages. Should work, right?

@ersin-erdal
Copy link
Contributor Author

ersin-erdal commented Dec 4, 2023

I'm not seeing the expected results from To verify: - the rule seems. to continue running without extra logging.

Perhaps because I had a 1s interval? Will try again with a 1m. Also I figured I'd disable the rule before making the code change, make the code change, let Kibana restart and spam the console, wait for it to finish, then re-enable the rule in hopes of more clearly seeing the messages. Should work, right?

Opss, I was confused...
You need to replace typeVersion not rule.version, fixed the verification steps as well. Sorry :)

edit: There was one more missing thing, you need to pull the PR again.

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
alerting 794 795 +1
taskManager 62 63 +1
total +2
Unknown metric groups

API count

id before after diff
alerting 825 826 +1
taskManager 105 106 +1
total +2

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ersin-erdal

@ersin-erdal
Copy link
Contributor Author

The feature has been reverted.

@ersin-erdal ersin-erdal deleted the 166967-skip-tasks-with-wrong-version branch March 25, 2024 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't skip/requeue alerting rules with new "other" fields
5 participants