[Alerting] POC for stack rules to use rule registry #98319

ymao1 · 2021-04-26T15:39:07Z

With the merging of the rules registry plugin, Alerting would like to explore how stack rules might work with the rules registry.

ymao1 · 2021-04-26T18:52:06Z

General thoughts after creating a POC for writing out alerts as data for stack rules using the rule registry:

Solution level vs rule type level
The rule registry seems to assume that alerts-as-data will be written out at the solution level. Each solution (o11y, security) creates its own rule registry during plugin setup, which bootstraps an alerts-as-data index for that solution (.alerts-observability*, .alerts-security*). This mostly works because the types of rules and the types of alerts written out are fairly consistent across these two solutions. The assumption doesn't work as well if we consider "Stack Rules" to be a solution and register a single rule registry for all stack rules. There is no guarantee or requirement that stack rules are consistent, have similar workflows, or write similar types of data. (Tracking containment within stack rules is a good example of an outlier.) With the current rule registry, we could circumvent this by creating multiple rule registries (one for each stack rule?) during plugin setup, but that raises the question of whether this framework-level assumption of grouping by solution is necessary.
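To make the two groupings concrete, here is a minimal sketch of how the index pattern could follow from the registry's scope. Everything in it is an assumption for illustration: `SketchRuleRegistry`, `alertsIndexPattern`, and the rule type ids are hypothetical names, not the rule registry's actual API.

```typescript
// Hypothetical sketch: none of these names come from the real rule registry.
type RegistryScope = { solution: string; ruleTypeId?: string };

// Derive the index pattern a registry would bootstrap for its scope.
function alertsIndexPattern(scope: RegistryScope): string {
  const suffix = scope.ruleTypeId ? `-${scope.ruleTypeId}` : '';
  return `.alerts-${scope.solution}${suffix}*`;
}

class SketchRuleRegistry {
  readonly scope: RegistryScope;
  readonly indexPattern: string;

  constructor(scope: RegistryScope) {
    this.scope = scope;
    this.indexPattern = alertsIndexPattern(scope);
  }
}

// Solution-level grouping: one registry and one index for every stack rule.
const stackRulesRegistry = new SketchRuleRegistry({ solution: 'stack-rules' });

// Rule-type-level grouping: one registry (and index) per stack rule type.
// The ids below are illustrative placeholders.
const perRuleTypeRegistries = ['es-query', 'index-threshold', 'tracking-containment'].map(
  (ruleTypeId) => new SketchRuleRegistry({ solution: 'stack-rules', ruleTypeId })
);
```

With the second grouping, an outlier such as tracking containment would get its own mappings without affecting the other stack rules, at the cost of more indices to bootstrap.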

Cross-solution access to alerts
As discussed above, when rule registries are created, indices to hold the alerts are bootstrapped. This means there is a separate alerts-as-data index for observability, security and (with this POC) stack rules. Each ruleRegistryClient then scopes all its requests to the specific index it bootstrapped. This works great when only security alerts are shown within security or only observability alerts are shown within observability, but it gets more complicated when we think about using a stack rule within security or observability (currently not done, but possible with Alerting's producer/consumer access model). A user might be able to create an ES query stack rule within security, but security's scoped rule registry would never retrieve the data because it lives in .alerts-stack-rules*, which it does not have access to.
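The gap can be illustrated with an in-memory stand-in for the indices. This is only a sketch under stated assumptions: `ScopedAlertsClient`, the concrete index names, and the rule type ids are hypothetical, and the real client talks to Elasticsearch rather than an array.

```typescript
// Hypothetical in-memory stand-in for alerts-as-data documents.
interface AlertDoc {
  index: string;
  ruleTypeId: string;
  consumer: string; // where the rule was created (producer/consumer model)
}

const alertDocs: AlertDoc[] = [
  { index: '.alerts-security-default', ruleTypeId: 'siem.query', consumer: 'security' },
  // An ES query stack rule created within security still writes to the stack rules index.
  { index: '.alerts-stack-rules-default', ruleTypeId: 'es-query', consumer: 'security' },
];

// A client that only ever queries the index it bootstrapped.
class ScopedAlertsClient {
  private readonly indexPrefix: string;

  constructor(indexPrefix: string) {
    this.indexPrefix = indexPrefix;
  }

  findByConsumer(consumer: string): AlertDoc[] {
    return alertDocs.filter(
      (doc) => doc.index.startsWith(this.indexPrefix) && doc.consumer === consumer
    );
  }
}

const securityClient = new ScopedAlertsClient('.alerts-security');
const visibleInSecurity = securityClient.findByConsumer('security');
```

Here `visibleInSecurity` contains only the security rule's alert. The stack rule's alert has the same consumer but sits outside the client's index scope, so it is never returned.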

Rule type factories
This is preliminary until we can determine how much rules can reuse these "rule type factories", but if each solution or rule type ends up creating its own "ruleTypeFactory", it might make sense to move some of the factory functionality down to the solution instead of maintaining it at the framework level.

Framework vs library
I believe there's already been some chatter in this area, but I like the idea of creating a library of well-tested helper functions, rather than a full-fledged framework, that the alerting framework can then incrementally migrate to.
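As a rough illustration of the difference, the sketch below contrasts a factory that owns the whole execution flow with a small helper that a solution calls from its own executor. All names (`createPersistingRuleTypeFactory`, `withCommonFields`, the `Alert` shape, the `rule.id` field) are hypothetical and chosen only for this example.

```typescript
// Hypothetical shapes; not the real Kibana types.
interface Alert {
  id: string;
  fields: Record<string, unknown>;
}
type Executor<P> = (params: P) => Alert[];

// Framework style: the factory wraps the executor and controls persistence.
function createPersistingRuleTypeFactory(write: (alerts: Alert[]) => void) {
  return function wrap<P>(executor: Executor<P>): Executor<P> {
    return (params: P) => {
      const alerts = executor(params);
      write(alerts);
      return alerts;
    };
  };
}

// Library style: a small helper the solution opts into from its own executor.
function withCommonFields(alert: Alert, ruleId: string): Alert {
  return { ...alert, fields: { ...alert.fields, 'rule.id': ruleId } };
}

// Framework-style usage: the rule author only supplies the inner executor.
const written: Alert[] = [];
const wrap = createPersistingRuleTypeFactory((alerts) => {
  written.push(...alerts);
});
const frameworkExecutor = wrap((params: { threshold: number }) => [
  { id: 'a', fields: { threshold: params.threshold } },
]);

// Library-style usage: the solution owns the flow and picks the helpers it needs.
function solutionExecutor(params: { threshold: number; ruleId: string }): Alert[] {
  const alerts: Alert[] = [{ id: 'a', fields: { threshold: params.threshold } }];
  return alerts.map((alert) => withCommonFields(alert, params.ruleId));
}
```

The library style leaves each team free to adopt helpers one at a time, which is what makes an incremental migration plausible.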

ymao1 · 2021-04-26T19:05:45Z

Alerting framework possible improvements

The lifecycleRuleTypeFactory repeats a lot of the logic that occurs within the alerting task runner with respect to determining whether an alert is active/new/recovered, with the nice addition of grouping a series of consecutive active alerts with a UUID (and determining duration). Knowing that security is also creating some ruleTypeFactories for their rule types in a separate POC, and knowing that security rules have a different lifecycle than observability rules, it will be interesting to see how many rule registry executors reuse logic from the alerting task runner vs implementing their own (different) logic. It's possible that the lifecycle determinations we're doing in alerting are too specific to a single type of rule (one with a distinct lifecycle) and we should be making them more generic. A complicating factor is that the alerting task runner functionality is heavily coupled with the event log.
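The lifecycle determination described above can be sketched as a pure function over the previously tracked alerts and the alert ids that are active in the current run. This is an illustration under assumptions, not the code in lifecycleRuleTypeFactory or the task runner: `trackLifecycle`, its types, and the injected `newUuid` generator are all hypothetical.

```typescript
// Hypothetical types for illustration only.
interface TrackedAlert {
  uuid: string; // shared by all consecutive active runs of the same alert
  startedAt: number; // epoch millis of the first active run
}
type TrackedState = Record<string, TrackedAlert>;
type AlertStatus = 'new' | 'active' | 'recovered';
interface LifecycleEvent {
  alertId: string;
  uuid: string;
  status: AlertStatus;
  durationMs: number;
}

function trackLifecycle(
  previous: TrackedState,
  activeAlertIds: string[],
  now: number,
  newUuid: () => string
): { events: LifecycleEvent[]; next: TrackedState } {
  const events: LifecycleEvent[] = [];
  const next: TrackedState = {};

  for (const alertId of activeAlertIds) {
    if (next[alertId] !== undefined) continue; // ignore duplicate ids in one run
    const existing: TrackedAlert | undefined = previous[alertId];
    // A previously tracked alert keeps its UUID; an unseen one starts a new series.
    const tracked = existing !== undefined ? existing : { uuid: newUuid(), startedAt: now };
    next[alertId] = tracked;
    events.push({
      alertId,
      uuid: tracked.uuid,
      status: existing !== undefined ? 'active' : 'new',
      durationMs: now - tracked.startedAt,
    });
  }

  // Anything tracked before but not active now has recovered and is dropped from state.
  for (const alertId of Object.keys(previous)) {
    const tracked: TrackedAlert | undefined = previous[alertId];
    if (tracked === undefined || next[alertId] !== undefined) continue;
    events.push({
      alertId,
      uuid: tracked.uuid,
      status: 'recovered',
      durationMs: now - tracked.startedAt,
    });
  }

  return { events, next };
}
```

Passing the UUID generator in keeps the function deterministic in tests. A rule with a different lifecycle (for example, one where alerts never recover) would need a different function, which is the concern about how generic this logic can be.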

elasticmachine · 2021-04-26T19:27:26Z

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

dgieselaar · 2021-04-28T08:24:38Z

General thoughts after creating a POC for writing out alerts as data for stack rules using the rule registry:

Solution level vs rule type level

The rule registry seems to assume that alerts-as-data will be written out at the solution level. Each solution (o11y, security) creates its own rule registry during plugin setup, which bootstraps an alerts-as-data index for that solution (.alerts-observability*, .alerts-security*). This mostly works because the types of rules and the types of alerts written out are fairly consistent across these two solutions. The assumption doesn't work as well if we consider "Stack Rules" to be a solution and register a single rule registry for all stack rules. There is no guarantee or requirement that stack rules are consistent, have similar workflows, or write similar types of data. (Tracking containment within stack rules is a good example of an outlier.) With the current rule registry, we could circumvent this by creating multiple rule registries (one for each stack rule?) during plugin setup, but that raises the question of whether this framework-level assumption of grouping by solution is necessary.

Cross-solution access to alerts
As discussed above, when rule registries are created, indices to hold the alerts are bootstrapped. This means there is a separate alerts-as-data index for observability, security and (with this POC) stack rules. Each ruleRegistryClient then scopes all its requests to the specific index it bootstrapped. This works great when only security alerts are shown within security or only observability alerts are shown within observability, but it gets more complicated when we think about using a stack rule within security or observability (currently not done, but possible with Alerting's producer/consumer access model). A user might be able to create an ES query stack rule within security, but security's scoped rule registry would never retrieve the data because it lives in .alerts-stack-rules*, which it does not have access to.

The way I see it, technical fields (e.g. alert id, uuid, rule id, duration, threshold, value, building block, etc) should be in the common schema. In some cases, solutions know the shape of the source data, so they can add mappings where they think it'll be useful. Mapped fields are easier to work with than runtime fields. For stack rules, but also for some solution rule types, we don't know upfront what the shape of the data is. I don't see another solution there currently but to use runtime fields.
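A minimal sketch of that split, assuming hypothetical field names: the common technical fields are typed up front, while a field whose shape is only known at query time is exposed through an Elasticsearch runtime field. A runtime field defined without a script is read from `_source` under the same name, so no mapping has to exist at write time. `CommonAlertFields` and `searchWithRuntimeField` are illustrative names, not part of the rule registry.

```typescript
// Hypothetical common schema: technical fields every alert document would carry.
interface CommonAlertFields {
  'alert.id': string;
  'alert.uuid': string;
  'rule.id': string;
  'alert.duration.ms': number;
}

// An alert document: common fields plus whatever the rule copied from its source data.
const exampleAlert: CommonAlertFields & Record<string, unknown> = {
  'alert.id': 'host-1',
  'alert.uuid': 'uuid-1',
  'rule.id': 'rule-1',
  'alert.duration.ms': 1000,
  'system.cpu.pct': 0.93, // shape not known upfront, so not in the index mappings
};

interface RuntimeFieldSearchBody {
  runtime_mappings: Record<string, { type: string }>;
  query: { range: Record<string, { gte: number }> };
}

// Build a search body that queries an unmapped numeric field via a runtime field.
function searchWithRuntimeField(fieldName: string, minValue: number): RuntimeFieldSearchBody {
  return {
    // No script: Elasticsearch looks the value up in _source at query time.
    runtime_mappings: { [fieldName]: { type: 'double' } },
    query: { range: { [fieldName]: { gte: minValue } } },
  };
}

const searchBody = searchWithRuntimeField('system.cpu.pct', 0.9);
```

The trade-off mentioned above still applies: the runtime field is resolved on every query, whereas a mapped field is indexed once.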

Ideally users can configure write targets at some point (e.g. write alert data from this rule into the security solution index, or into my own index), but that is not something we can easily do (RBAC, potential mapping conflicts).

Rule type factories
This is preliminary until we can determine how much rules can reuse these "rule type factories", but if each solution or rule type ends up creating its own "ruleTypeFactory", it might make sense to move some of the factory functionality down to the solution instead of maintaining it at the framework level.

Framework vs library
I believe there's already been some chatter in this area, but I like the idea of creating a library of well-tested helper functions, rather than a full-fledged framework, that the alerting framework can then incrementally migrate to.

Totally agree, hope we can figure out over the next few weeks what utilities should be shared and what is better off being handled by specific teams.

ymao1 · 2021-07-01T18:12:24Z

Closing as POC for rules registry V1 is complete. Will open new issue for actually migrating stack rules to the rule data service.

ymao1 mentioned this issue Apr 26, 2021

[Alerting] Stack Rules on Rule Registry POC #96966

Closed

botelastic bot added the needs-team label Apr 26, 2021

ymao1 added the Feature:Alerting, Team:ResponseOps, and Theme: rac labels Apr 26, 2021

botelastic bot removed the needs-team label Apr 26, 2021

ymao1 self-assigned this Apr 26, 2021

ymao1 closed this as completed Jul 1, 2021

kobelb added the needs-team label Jan 31, 2022

botelastic bot removed the needs-team label Jan 31, 2022
