-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Onboard alerting tasks to use stateSchemaByVersion for task state validation of the framework fields #162425
Onboard alerting tasks to use stateSchemaByVersion for task state validation of the framework fields #162425
Conversation
…-ref HEAD~1..HEAD --fix'
….com:mikecote/kibana into alerting/onboard-task-types-state-validation
…d-task-types-state-validation
…d-task-types-state-validation
…d-task-types-state-validation
…d-task-types-state-validation
.then((state: RuleTaskState) => { | ||
return pipe( | ||
ruleStateSchema.decode(state), | ||
fold((e: Errors) => { | ||
throw new Error(`Rule "${ruleId}" has invalid state`); | ||
}, identity) | ||
); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer necessary because:
- The validation happens on task save
- The decode would convert string dates into date objects. But would get converted back into strings when stringifying the JSON response.
- Stack Monitoring who is using this API uses
RuleTaskState
to be type safe.
@@ -73,7 +74,7 @@ describe('successful migrations for 8.8.0', () => { | |||
}); | |||
|
|||
function checkMetaInRuleTaskState( | |||
actual: RuleTaskState, | |||
actual: MutableRuleTaskState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To support code like this without having to rewrite.
delete copy?.alertInstances?.[id].meta?.uuid;
@@ -253,7 +254,7 @@ function addAlertUUID(doc: SavedObjectUnsanitizedDoc<SerializedConcreteTaskInsta | |||
|
|||
// mutates alerts passed in | |||
function addAlertUUIDsToAlerts( | |||
alerts: RuleTaskState['alertInstances'] | undefined, | |||
alerts: MutableRuleTaskState['alertInstances'] | undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To support code like this without having to rewrite
alert.meta.uuid = trackedAlert.alertUuid;
if (filteredAlertInstance.state) { | ||
accum[instanceId].state = { | ||
alertStates: (filteredAlertInstance.state as AlertInstanceState).alertStates, | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RawAlertInstance
is now a ReadOnly type, since this code was doing a mutation, TypeScript was complaining. This code felt like it wasn't doing anything so I decided to remove it instead of fixing it..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like state
could have multiple properties set and this branch ensures we only keep alertStates
, maybe to reduce payload sent to clients. Do you know if the state object can become big enough to be problematic ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh ok, that would make sense, thanks! I will keep this code around, at a first glance I wasn't sure what it was doing but it's clear now.
Do you know if the state object can become big enough to be problematic ?
It's possible for sure, there is a circuit breaker at the framework level to prevent more than 1,000 alert objects (and state objects) by default, so it is capped at some point but we haven't seen many issues around the storage size yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klacabane I added the code back in base_rule.ts
. I did it in a way that doesn't mutate the accum[instanceId]
object to keep TypeScript happy (since RawAlertInstance
is now a ReadOnly
type). Let me know if this looks good to you 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great thanks!
ruleStateSchema.decode(taskInstance.state), | ||
fold((e: t.Errors) => { | ||
throw new Error( | ||
`Task "${taskInstance.id}" ${ | ||
alert ? `(underlying Alert "${alert.id}") ` : '' | ||
}has invalid state at ${enumerateErrorFields(e)}` | ||
); | ||
}, t.identity) | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic moved to task manager on read of the state object. This code also used to convert date strings into dates which we now type them as strings all over, so this is no longer needed.
@@ -52,7 +52,7 @@ export class Alert< | |||
ActionGroupIds extends string = never | |||
> { | |||
private scheduledExecutionOptions?: ScheduledExecutionOptions<State, Context, ActionGroupIds>; | |||
private meta: AlertInstanceMeta; | |||
private meta: MutableAlertInstanceMeta; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file does a bunch of mutations of the meta object, it felt best to continue that path for now instead of trying to fix it.
Pinging @elastic/response-ops (Team:ResponseOps) |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Verified that I was able to upgrade from earlier version (8.8) and see my detection/obs/stack rules with actions continue running and that I was able to create new rules that successfully ran.
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DE changes LGTM
@elasticmachine merge upstream |
💚 Build Succeeded
Metrics [docs]Module Count
Public APIs missing comments
Async chunks
Page load bundle
Unknown metric groupsAPI count
async chunk count
History
To update your PR or re-run it, just comment with: cc @mikecote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Infra changes LGTM
…idation of the framework fields (elastic#162425) Resolves elastic#159343 In this PR, I'm preparing the alerting rule task types for serverless by defining an explicit task state schema for the framework level fields. This schema is used to validate the task's state before saving but also when reading. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see elastic#155764). The PR also includes the following changes: - Modifying the `@kbn/alerting-state-types` package so the types are generated from `config-schema`, instead of `io-ts`. The schema is re-used for exporting a new `stateSchemaByVersion` property. - Removing `DateFromString` in favour of strings everywhere (config-schema doesn't support this conversion) - Add a v1 `up` migration that will ensure the types match the TypeScript interface on any existing alerting task. The migration assumes any type of data could exist and dropping parts that don't match the type expectation. The TypeScript interface uses `schema.maybe()` in a lot of places (as `io-ts` did), so safe if ever data gets dropped. - Cleanup the `alerting/common/**` exports to reduce bundle size. Because the `@kbn/alerting-state-types` package grew. - Since the new TypeScript interfaces / types are `ReadOnly<...>`, I created some `Mutable...` types for places that needed it (in order to avoid code refactoring). ## To verify Stack Monitoring: - To make TypeScript happy with the new ReadOnly `RawAlertInstance` type, I removed some redundant code and solved the issue. Security Solution: - Changes to the `alertInstanceFactoryStub` set the alert's date to a `string` instead of a `Date` value. Note: The HTTP API response converted `Date` objects to `string`, so the HTTP API response will look the same with this change. Response Ops: - In a fresh Kibana install, create alerting rules and ensure they run. - In a 8.9 version, create some rules, upgrade to this branch and ensure they still run. - Compare the `io-ts` definition with the new `config-schema`. They should match 1:1. - Look for ways the migration code could fail. Note: The main changes are within the following areas: - `x-pack/plugins/alerting/server/rule_type_registry.ts` - `x-pack/packages/kbn-alerting-state-types/*` --------- Co-authored-by: kibanamachine <[email protected]>
Resolves #159343
In this PR, I'm preparing the alerting rule task types for serverless by defining an explicit task state schema for the framework level fields. This schema is used to validate the task's state before saving but also when reading. In the scenario an older Kibana node runs a task after a newer Kibana node has stored additional task state, the unknown state properties will be dropped. Additionally, this will prompt developers to be aware that adding required fields to the task state is a breaking change that must be handled with care. (see #155764).
The PR also includes the following changes:
@kbn/alerting-state-types
package so the types are generated fromconfig-schema
, instead ofio-ts
. The schema is re-used for exporting a newstateSchemaByVersion
property.DateFromString
in favour of strings everywhere (config-schema doesn't support this conversion)up
migration that will ensure the types match the TypeScript interface on any existing alerting task. The migration assumes any type of data could exist and dropping parts that don't match the type expectation. The TypeScript interface usesschema.maybe()
in a lot of places (asio-ts
did), so safe if ever data gets dropped.alerting/common/**
exports to reduce bundle size. Because the@kbn/alerting-state-types
package grew.ReadOnly<...>
, I created someMutable...
types for places that needed it (in order to avoid code refactoring).To verify
Stack Monitoring:
RawAlertInstance
type, I removed some redundant code and solved the issue.Security Solution:
alertInstanceFactoryStub
set the alert's date to astring
instead of aDate
value. Note: The HTTP API response convertedDate
objects tostring
, so the HTTP API response will look the same with this change.Response Ops:
io-ts
definition with the newconfig-schema
. They should match 1:1.Note: The main changes are within the following areas:
x-pack/plugins/alerting/server/rule_type_registry.ts
x-pack/packages/kbn-alerting-state-types/*