Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security_Solution][Endpoint] Leveraging msearch and ancestry array for resolver #70134

Merged

Conversation

jonathan-buttner
Copy link
Contributor

@jonathan-buttner jonathan-buttner commented Jun 26, 2020

This PR leverages msearch and the ancestry array to improve the resolver backend's performance. It relies on the generator changes in this PR: #70129

Notable changes:

  • Related alerts and related events no longer use aggregations, if ES doesn't return anymore data we know we've reached the end
  • The usage of the nextChild cursors for the children is different since we're leveraging the ancestry array. I've tried to outline the different scenarios in the code's comments
  • Removed generations since it can't be enforcing while using the ancestry array. The limit is instead base don the number of nodes you'd like to receive
  • Moved most of the fetcher's logic into individual classes so it can be used for msearch

image

@jonathan-buttner jonathan-buttner added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes Team:Endpoint Data Visibility Team managing the endpoint resolver Feature:Endpoint Elastic Endpoint feature v7.9.0 labels Jun 26, 2020
@jonathan-buttner jonathan-buttner requested review from a team as code owners June 26, 2020 22:10
@elasticmachine
Copy link
Contributor

Pinging @elastic/endpoint-data-visibility-team (Team:Endpoint Data Visibility)

@elasticmachine
Copy link
Contributor

Pinging @elastic/endpoint-app-team (Feature:Endpoint)

@@ -113,6 +113,8 @@ describe('data generator', () => {
percentWithRelated: 100,
relatedEvents: 0,
relatedAlerts: 0,
useAncestryArray: true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are in this PR: #70129

they will be removed once the other PR gets merged.

@@ -17,6 +17,7 @@ import {
EndpointStatus,
} from './types';
import { factory as policyFactory } from './models/policy_config';
import { parentEntityId } from './models/event';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are in this PR: #70129

they will be removed once the other PR gets merged.

@@ -17,7 +17,7 @@ import { MSearchQuery } from './multi_searcher';
* @param T the structured return type of a resolver query. This represents the type that is returned when translating
* Elasticsearch's SearchResponse<ResolverEvent> response.
*/
export abstract class ResolverQuery<T> implements MSearchQuery {
export abstract class ResolverQuery<T, R = ResolverEvent> implements MSearchQuery {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The R here sets the stage for being able to limit the _source of a queries response. This will be in a follow up PR. Limit the source will be a performance improvement for querying for the start events for the children.

formatResponse(response: SearchResponse<ResolverEvent>): PaginatedResults {
return {
results: ResolverQuery.getResults(response),
totals: PaginationBuilder.getTotals(response.aggregations),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totals are inferred based on the response of ES while using the ancestry array.

/**
* a function to handle the response
*/
handler: (response: SearchResponse<ResolverEvent>) => void;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This calls into the class (e.g. RelatedEventsQueryHandler etc) for it to handle the response from ES.

const { body }: { body: ResolverChildren } = await supertest
.get(`/api/endpoint/resolver/${tree.origin.id}/children?generations=1`)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generator creates nodes in depth first orderish. I don't want to have to rely on that though in the test. To get around that I move to a level of the tree where I know there aren't any other descendants before the immediate children.

@@ -16,6 +16,7 @@ import {
LegacyEndpointEvent,
ResolverNodeStats,
ResolverRelatedAlerts,
ResolverChildNode,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO need to write more tests for the children's pagination.

* 1. At the time of querying it could have no children in ES, in which case it will be marked as
* null because we know it does not have children during this query.
* 2. If the max level was reached we do not know if this node has children or not so we'll mark it as null
* nextChild can have 3 different states:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, I understand what this is saying, especially after reading through your write up, but one of @oatkiller's famous diagrams may be useful here, you could probs just copy paste one he's already done. Not necessary, but I think it could be helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @michaelolo24 do you mean an ascii diagram of a tree? Or a table?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ascii I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I'll probably do is put a link to the relevant section in the docs that I'll be adding as a follow up or add on to this PR.


return {
query: this.query,
ids: this.entityID,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the msearch query take strings and arrays or just strings? I know in your constructor it's looking for a string, I'm just confused by the pluralized id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's both string | string[] https://github.com/jonathan-buttner/kibana/blob/msearch-ancestry-array/x-pack/plugins/security_solution/server/endpoint/routes/resolver/queries/multi_searcher.ts#L36

And some of the other *_query_handler files pass it string[] particularly the children one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed that, thanks!

/**
* Get the results after an msearch.
*/
getResults() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only used internally to the class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// bucket the start and end events together for a single node
const ancestryNodes = this.toMapOfNodes(results);

// the order of this array is going to be weird, it will look like this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary, but I feel like a mini diagram could be helpful here too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the diagram in a follow up PR so I don't have to wait for ci again haha

return children;
}

function getStartEventsFromLevels(levels: Array<Map<string, TreeNode>>) {
Copy link
Contributor

@michaelolo24 michaelolo24 Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder what the performance here is gonna look like at scale 😅.

Copy link
Contributor Author

@jonathan-buttner jonathan-buttner Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a test file :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that'll do it 😂

@jonathan-buttner
Copy link
Contributor Author

@elasticmachine merge upstream

@elasticmachine
Copy link
Contributor

merge conflict between base and head

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@jonathan-buttner jonathan-buttner merged commit c081caa into elastic:master Jul 2, 2020
@jonathan-buttner jonathan-buttner deleted the msearch-ancestry-array branch July 2, 2020 12:47
jonathan-buttner added a commit that referenced this pull request Jul 2, 2020
…or resolver (#70134) (#70583)

* Refactor generator for ancestry support

* Adding optional ancestry array

* Refactor the pagination since the totals are not used anymore

* Updating the queries to not use aggregations for determining the totals

* Refactoring the children helper to handle pagination without totals

* Pinning the seed for the resolver tree generator service

* Splitting the fetcher into multiple classes for msearch

* Updating tests and api for ancestry array and msearch

* Adding more comments and fixing type errors

* Fixing resolver test import

* Fixing tests and type errors

* Fixing type errors and tests

* Removing useAncestry field

* Fixing test

* Removing useAncestry field from tests

* An empty array will be returned because that's how ES will do it too
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jul 2, 2020
* master:
  Update known-plugins.asciidoc (elastic#69370)
  [ML] Anomaly Explorer swim lane pagination (elastic#70063)
  [Ingest Manager] Update asset paths to use _ instead of - (elastic#70320)
  Fix discover, tsvb and Lens chart theming issues (elastic#69695)
  Allow Saved Object type mappings to set a field's `doc_values` property (elastic#70433)
  [S&R] Support data streams (elastic#68078)
  [Maps] Add styling and tooltip support to mapbox mvt vector tile sources (elastic#64488)
  redirect to default app if hash can not be forwarded (elastic#70417)
  [APM] Don't fetch dynamic index pattern in setupRequest (elastic#70308)
  [Security_Solution][Endpoint] Leveraging msearch and ancestry array for resolver (elastic#70134)
  Update docs for api authentication usage (elastic#66819)
  chore(NA): disable alerts_detection_rules cypress suites (elastic#70577)
  add getVisibleTypes API to SO type registry (elastic#70559)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Endpoint Elastic Endpoint feature release_note:skip Skip the PR/issue when compiling release notes Team:Endpoint Data Visibility Team managing the endpoint resolver v7.9.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants