Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detections] Adds 'Alerts Detected' and 'Alerts Created' metrics to Rule Monitoring table #126210

Closed

Conversation

spong
Copy link
Member

@spong spong commented Feb 22, 2022

Summary

Resolves #120678, #120668

Broken out from #124198, this PR adds total_alerts_created and total_alerted_detected metrics (naming discussion) both to the event-log and to the siem-detection-engine-rule-execution-info SO and exposes them as columns on the Rule Monitoring table.

Todo:

  • FTR tests exercising Total Alerts & Total Hits for each rule executor (event-log + logger)

Checklist

Delete any items that are not applicable to this PR.

  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Documentation was added for features that require explanation or tutorials
    • Doc Screenshots will need to be updated. No additional dedicated docs should be needed as descriptive tooltips have been included
  • Unit or functional tests were updated or added to match the most common scenarios

@spong spong self-assigned this Feb 23, 2022
@spong spong added bug Fixes for quality problems that affect the customer experience enhancement New value added to drive a business result Feature:Rule Monitoring Security Solution Detection Rule Monitoring Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team v8.2.0 release_note:fix labels Feb 23, 2022
@spong spong added v8.3.0 and removed v8.2.0 labels Mar 23, 2022
spong added a commit that referenced this pull request Mar 28, 2022
## Summary

Resolves #119598, #119599, #101014

Test plan ([internal doc](https://docs.google.com/document/d/1-prIUGYaPHiwGA79CgSdw1926lxIPKGWWkYOUD2BM1U/edit#heading=h.womzsfdt6zt8))

Adds `Rule Execution Log` table to Rule Details page:

<p align="center">
  <img width="700" src="https://user-images.githubusercontent.com/2946766/158540840-e9cddb9b-f33d-4b95-86ad-cb3e0a00cf39.gif" />
</p>


### Implementation notes

The useful metrics within `event-log` for a given rule execution are spread between a few different platform (`execute-start`, `execute`) and security (`execution-metrics`, `status-change`) events. In effort to provide consolidated metrics per rule execution (and avoiding a lot of empty cells and mis-matched statuses like in the image below)

<p align="center">
  <img width="700" src="https://user-images.githubusercontent.com/2946766/151933881-2e58f4d7-4cda-4528-9d44-37cb7bd5de9c.png" />
</p>



these rule execution events are aggregated by their `executionId`, and then fields are merged from each different event. This PR was re-worked to take advantage of the new event-log aggregation support added in #126948, and is no longer implemented as an in-memory aggregation server side.

* Due to restrictions around supplying search filters that may match multiple sub-agg buckets and missing data ([see discussion here](https://github.com/elastic/kibana/pull/127339/files#r825240516)), it was decided that we'd disable the search bar for the time being. We have both a near-term (writing single rollup event) and long-term (ES|QL) solution that will allow us to re-enable this functionality.

* Note, since a `terms` agg is used to fetch all execution events, an upper bound must be set. See [this discussion](https://github.com/elastic/kibana/pull/127339/files#r823035420) for more details, but setting this max to `1000` events for the time being, and returning total cardinality of execution events back within `total` to allow the UI to inform the user that they should narrow their search further to better isolate and find possible issues. This should be a be a reasonable constraint for most all rules as a rule executing every 5 minutes, 1000 executions would cover over 3 days of execution time.

<p align="center">
  <img width="700" src="https://user-images.githubusercontent.com/2946766/159045563-966896b4-3cd1-475d-9f0e-c2d300683546.png" />
</p>


The `Filter for alerts` action will be available on all `Succeeded`/`Partial Failure` executions even if there weren't alerts generated until #126210 is merged and we can start returning the alert count, at which point we can programmatically enabled/disable this action based on alert count.



<p align="center">
  <img width="300" src="https://user-images.githubusercontent.com/2946766/159051762-e2f97ba4-4ce1-4f67-8ae1-395e4b191cab.png" />
</p>
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 4.8MB 4.8MB +1.6KB

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/development-plugin-saved-objects.html#_mappings

id before after diff
siem-detection-engine-rule-execution-info 10 12 +2
Unknown metric groups

ESLint disabled line counts

id before after diff
securitySolution 442 443 +1

Total ESLint disabled count

id before after diff
securitySolution 511 512 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @spong

@banderror banderror removed the v8.3.0 label Jun 10, 2022
@banderror
Copy link
Contributor

I'm thinking we should probably wait until #135127 is resolved and we migrate our rule statuses and metrics to the Framework (#130966). Then we could add new metrics based on the new mechanism. @spong what do you think?

@spong
Copy link
Member Author

spong commented Jun 28, 2022

Yeah @banderror, I was thinking the same thing 🙂. Will be better/easier to just pass this data back as part of the overall status metrics at the end of the execution. I'll comment over on #130966 referencing this as and example for adding alerts detected/created, and then close out this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience enhancement New value added to drive a business result Feature:Rule Monitoring Security Solution Detection Rule Monitoring release_note:fix Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.
Projects
None yet
3 participants