Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Onboard alerting tasks to use stateSchemaByVersion for task state validation of the framework fields #162425

Merged

Conversation

mikecote
Copy link
Contributor

@mikecote mikecote commented Jul 24, 2023

Resolves #159343

In this PR, I'm preparing the alerting rule task types for serverless by defining an explicit task state schema for the framework level fields. This schema is used to validate the task's state before saving but also when reading. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see #155764).

The PR also includes the following changes:

  • Modifying the @kbn/alerting-state-types package so the types are generated from config-schema, instead of io-ts. The schema is re-used for exporting a new stateSchemaByVersion property.
  • Removing DateFromString in favour of strings everywhere (config-schema doesn't support this conversion)
  • Add a v1 up migration that will ensure the types match the TypeScript interface on any existing alerting task. The migration assumes any type of data could exist and dropping parts that don't match the type expectation. The TypeScript interface uses schema.maybe() in a lot of places (as io-ts did), so safe if ever data gets dropped.
  • Cleanup the alerting/common/** exports to reduce bundle size. Because the @kbn/alerting-state-types package grew.
  • Since the new TypeScript interfaces / types are ReadOnly<...>, I created some Mutable... types for places that needed it (in order to avoid code refactoring).

To verify

Stack Monitoring:

  • To make TypeScript happy with the new ReadOnly RawAlertInstance type, I removed some redundant code and solved the issue.

Security Solution:

  • Changes to the alertInstanceFactoryStub set the alert's date to a string instead of a Date value. Note: The HTTP API response converted Date objects to string, so the HTTP API response will look the same with this change.

Response Ops:

  • In a fresh Kibana install, create alerting rules and ensure they run.
  • In a 8.9 version, create some rules, upgrade to this branch and ensure they still run.
  • Compare the io-ts definition with the new config-schema. They should match 1:1.
  • Look for ways the migration code could fail.

Note: The main changes are within the following areas:

  • x-pack/plugins/alerting/server/rule_type_registry.ts
  • x-pack/packages/kbn-alerting-state-types/*

@mikecote mikecote added Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.10.0 labels Jul 24, 2023
@mikecote mikecote self-assigned this Jul 24, 2023
Comment on lines -41 to -48
.then((state: RuleTaskState) => {
return pipe(
ruleStateSchema.decode(state),
fold((e: Errors) => {
throw new Error(`Rule "${ruleId}" has invalid state`);
}, identity)
);
});
Copy link
Contributor Author

@mikecote mikecote Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer necessary because:

  • The validation happens on task save
  • The decode would convert string dates into date objects. But would get converted back into strings when stringifying the JSON response.
  • Stack Monitoring who is using this API uses RuleTaskState to be type safe.

@@ -73,7 +74,7 @@ describe('successful migrations for 8.8.0', () => {
});

function checkMetaInRuleTaskState(
actual: RuleTaskState,
actual: MutableRuleTaskState,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To support code like this without having to rewrite.

delete copy?.alertInstances?.[id].meta?.uuid;

@@ -253,7 +254,7 @@ function addAlertUUID(doc: SavedObjectUnsanitizedDoc<SerializedConcreteTaskInsta

// mutates alerts passed in
function addAlertUUIDsToAlerts(
alerts: RuleTaskState['alertInstances'] | undefined,
alerts: MutableRuleTaskState['alertInstances'] | undefined,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To support code like this without having to rewrite

 alert.meta.uuid = trackedAlert.alertUuid;

Comment on lines -197 to -201
if (filteredAlertInstance.state) {
accum[instanceId].state = {
alertStates: (filteredAlertInstance.state as AlertInstanceState).alertStates,
};
}
Copy link
Contributor Author

@mikecote mikecote Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RawAlertInstance is now a ReadOnly type, since this code was doing a mutation, TypeScript was complaining. This code felt like it wasn't doing anything so I decided to remove it instead of fixing it..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like state could have multiple properties set and this branch ensures we only keep alertStates, maybe to reduce payload sent to clients. Do you know if the state object can become big enough to be problematic ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh ok, that would make sense, thanks! I will keep this code around, at a first glance I wasn't sure what it was doing but it's clear now.

Do you know if the state object can become big enough to be problematic ?

It's possible for sure, there is a circuit breaker at the framework level to prevent more than 1,000 alert objects (and state objects) by default, so it is capped at some point but we haven't seen many issues around the storage size yet.

Copy link
Contributor Author

@mikecote mikecote Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klacabane I added the code back in base_rule.ts. I did it in a way that doesn't mutate the accum[instanceId] object to keep TypeScript happy (since RawAlertInstance is now a ReadOnly type). Let me know if this looks good to you 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great thanks!

Comment on lines -46 to -54
ruleStateSchema.decode(taskInstance.state),
fold((e: t.Errors) => {
throw new Error(
`Task "${taskInstance.id}" ${
alert ? `(underlying Alert "${alert.id}") ` : ''
}has invalid state at ${enumerateErrorFields(e)}`
);
}, t.identity)
),
Copy link
Contributor Author

@mikecote mikecote Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic moved to task manager on read of the state object. This code also used to convert date strings into dates which we now type them as strings all over, so this is no longer needed.

@@ -52,7 +52,7 @@ export class Alert<
ActionGroupIds extends string = never
> {
private scheduledExecutionOptions?: ScheduledExecutionOptions<State, Context, ActionGroupIds>;
private meta: AlertInstanceMeta;
private meta: MutableAlertInstanceMeta;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file does a bunch of mutations of the meta object, it felt best to continue that path for now instead of trying to fix it.

@mikecote mikecote marked this pull request as ready for review August 9, 2023 11:58
@mikecote mikecote requested review from a team as code owners August 9, 2023 11:58
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@mikecote
Copy link
Contributor Author

@elasticmachine merge upstream

@mikecote
Copy link
Contributor Author

@elasticmachine merge upstream

@mikecote mikecote added v8.11.0 and removed v8.10.0 labels Aug 16, 2023
@mikecote
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Verified that I was able to upgrade from earlier version (8.8) and see my detection/obs/stack rules with actions continue running and that I was able to create new rules that successfully ran.

@mikecote
Copy link
Contributor Author

@elasticmachine merge upstream

@mikecote mikecote changed the title Onboard alerting tasks to use stateSchemaByVersion for task state validation Onboard alerting tasks to use stateSchemaByVersion for task state validation of the framework fields Aug 25, 2023
Copy link
Contributor

@e40pud e40pud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DE changes LGTM

@mikecote
Copy link
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
alerting 104 96 -8

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/alerting-state-types 15 22 +7
alerting 755 749 -6
total +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
triggersActionsUi 1.5MB 1.4MB -98.0KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
alerting 34.1KB 21.4KB -12.7KB
triggersActionsUi 89.3KB 89.3KB -2.0B
total -12.7KB
Unknown metric groups

API count

id before after diff
@kbn/alerting-state-types 16 23 +7
alerting 786 780 -6
total +1

async chunk count

id before after diff
triggersActionsUi 63 61 -2

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @mikecote

Copy link
Contributor

@klacabane klacabane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Infra changes LGTM

@mikecote mikecote merged commit 82f551c into elastic:main Aug 29, 2023
34 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Aug 29, 2023
bryce-b pushed a commit to bryce-b/kibana that referenced this pull request Sep 19, 2023
…idation of the framework fields (elastic#162425)

Resolves elastic#159343

In this PR, I'm preparing the alerting rule task types for serverless by
defining an explicit task state schema for the framework level fields.
This schema is used to validate the task's state before saving but also
when reading. In the scenario an older Kibana node runs a task after a
newer Kibana node has stored additional task state, the unknown state
properties will be dropped. Additionally, this will prompt developers to
be aware that adding required fields to the task state is a breaking
change that must be handled with care. (see
elastic#155764).

The PR also includes the following changes:
- Modifying the `@kbn/alerting-state-types` package so the types are
generated from `config-schema`, instead of `io-ts`. The schema is
re-used for exporting a new `stateSchemaByVersion` property.
- Removing `DateFromString` in favour of strings everywhere
(config-schema doesn't support this conversion)
- Add a v1 `up` migration that will ensure the types match the
TypeScript interface on any existing alerting task. The migration
assumes any type of data could exist and dropping parts that don't match
the type expectation. The TypeScript interface uses `schema.maybe()` in
a lot of places (as `io-ts` did), so safe if ever data gets dropped.
- Cleanup the `alerting/common/**` exports to reduce bundle size.
Because the `@kbn/alerting-state-types` package grew.
- Since the new TypeScript interfaces / types are `ReadOnly<...>`, I
created some `Mutable...` types for places that needed it (in order to
avoid code refactoring).

## To verify

Stack Monitoring:
- To make TypeScript happy with the new ReadOnly `RawAlertInstance`
type, I removed some redundant code and solved the issue.

Security Solution:
- Changes to the `alertInstanceFactoryStub` set the alert's date to a
`string` instead of a `Date` value. Note: The HTTP API response
converted `Date` objects to `string`, so the HTTP API response will look
the same with this change.

Response Ops:
- In a fresh Kibana install, create alerting rules and ensure they run.
- In a 8.9 version, create some rules, upgrade to this branch and ensure
they still run.
- Compare the `io-ts` definition with the new `config-schema`. They
should match 1:1.
- Look for ways the migration code could fail.

Note: The main changes are within the following areas:
- `x-pack/plugins/alerting/server/rule_type_registry.ts`
- `x-pack/packages/kbn-alerting-state-types/*`

---------

Co-authored-by: kibanamachine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Alerting Feature:Task Manager release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Task Manager] Onboard alerting task types to have a versioned state schema for the framework fields
7 participants