[GS] SavedObject results provider #65222

Closed
pgayvallet opened this issue May 5, 2020 · 17 comments · Fixed by #68619
Labels
discuss NeededFor:Core UI REASSIGN from Team:Core UI Deprecated label for old Core UI team Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Contributor

This issue is to discuss the concrete implementation of the SavedObject result provider needed for the GlobalSearch MVP: #58049

Related issues:

From #58049, the types of SO we would need to return would be:

  • Dashboards
  • Visualizations
  • Saved searches
  • Canvas workpads
  • Maps
  • Graphs
  • ML jobs (future)
  • SIEM Timelines

Questions:

How exactly will we perform the query / which fields should we search by

It's still unclear which fields we should search by. The base SavedObject type does not even have a title field in the base implementation, and I'm not sure all types actually declare one in their mappings.

Second point, when calling for example gs.find('dash'), should we search on the type field and return all dashboard objects? The type field being a keyword, is this doable?

Is our current find API sufficient, or will we need to create a new one

Directly depends on the first point. I feel like we will need to add a new API to answer the GS needs, even if ideally this would be avoided.

Is it technically possible to retrieve all SO in a single query

For performance, it would be best if we could retrieve all types of saved objects in a single query, instead of performing x (= number of types) queries. Would this be a possibility?

@pgayvallet pgayvallet added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc REASSIGN from Team:Core UI Deprecated label for old Core UI team labels May 5, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@elasticmachine
Contributor

Pinging @elastic/kibana-core-ui (Team:Core UI)

@pgayvallet
Contributor Author

pgayvallet commented May 5, 2020

@rudolf
Contributor

rudolf commented May 6, 2020

How exactly will we perform the query / which fields should we search by
It's still unclear which fields we should search by. The base SavedObject type does not even have a title field in the base implementation, and I'm not sure all types actually declare one in their mappings.

Is our current find API sufficient, or will we need to create a new one
Directly depends on the first point. I feel like we will need to add a new API to answer the GS needs, even if ideally this would be avoided.

Is it technically possible to retrieve all SO in a single query
For performance, it would be best if we could retrieve all types of saved objects in a single query, instead of performing x (= number of types) queries. Would this be a possibility?

I don't think it makes sense for all saved objects to have a title, so it feels like it's better to keep this in the type attributes than to add a title field to the root property of all saved objects.

I think we should start with the Saved Objects Management plugin's search abilities as a baseline. It allows plugins to specify a defaultSearchField and searches across all types on these search fields using the SavedObjectsClient#find API:

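// Declaration implied by the .add() and spread usage below.
const searchFields = new Set<string>();
// Collect the default search field of every importable/exportable type.
importAndExportableTypes.forEach(type => {
  const searchField = managementService.getDefaultSearchField(type);
  if (searchField) {
    searchFields.add(searchField);
  }
});
// Single find call across all types, searching only on the collected fields.
const findResponse = await client.find<any>({
  ...req.query,
  fields: undefined,
  searchFields: [...searchFields],
});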

I know of #42316, but other than that is this existing search functionality sufficient for searching across Saved Objects or do we need something more powerful?

More generally, SavedObjectsClient#find (and the implementation in SavedObjectsRepository) allows for searching across multiple types but only in a single namespace. I'm assuming that as a first step we'll only search in a user's current namespace, but it might be useful to know that there's a dashboard with a very high score that matches in another namespace that you have access to, instead of having to manually switch between all your spaces to try to locate it. If we need to support this, we could enumerate the user's spaces and construct a find call for each space.

Second point, when calling for example gs.find('dash'), should we search on the type field and return all dashboard objects? The type field being a keyword, is this doable?

To let ES search on it we will have to change the type field mappings, which will trigger a migration. Since we know all the saved object types beforehand, we could do a client-side search and, if a type matches, search on the full type name. This makes it harder to combine search queries like "finance dashboard", which we might want to give a high score because it matches the type and title of {title: 'finance', type: 'dashboard'}.
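A minimal sketch of the client-side type matching described above, assuming prefix matching against a known list of type names. The `knownTypes` list and the `splitTypeTerms` helper are hypothetical names made up for illustration, not existing Kibana code.

```typescript
// Illustrative list only; the real set of registered types is known at runtime.
const knownTypes: string[] = ['dashboard', 'visualization', 'search', 'canvas-workpad', 'map'];

// Split the user's input into terms that are a prefix of a known type name
// and the remaining free-text terms that would go to the title search.
function splitTypeTerms(
  input: string,
  types: string[]
): { matchedTypes: string[]; rest: string } {
  const matchedTypes: string[] = [];
  const restTerms: string[] = [];
  const terms = input.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  for (const term of terms) {
    const hits = types.filter((type) => type.indexOf(term) === 0);
    if (hits.length === 0) {
      restTerms.push(term);
      continue;
    }
    for (const hit of hits) {
      // Avoid listing the same type twice when several terms match it.
      if (matchedTypes.indexOf(hit) === -1) matchedTypes.push(hit);
    }
  }
  return { matchedTypes, rest: restTerms.join(' ') };
}
```

With this split, "finance dashboard" would yield the type `dashboard` plus the free-text term `finance`, which could then be sent to the regular find call restricted to that type. How the two parts should influence scoring is left open here, as it is in the discussion.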

@kobelb
Contributor

kobelb commented May 6, 2020

I agree that starting with the way that saved-objects management does search as a baseline is reasonable. It definitely leaves something to be desired, but it's tolerable.

The SavedObjectsClient#find does support KQL, which gives us more flexibility. It seems like we could allow every saved-object type to construct their own KQL statement for their own type, which is then OR'ed together. I honestly haven't tried this, so there might be some limitations that I'm glossing over.
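As a rough illustration of the "one KQL statement per type, OR'ed together" idea, the sketch below builds a combined filter string. The `KqlBuilder` type, the `kqlBuilders` registry, and the `<type>.attributes.title` paths are assumptions made for the example; it has not been checked against what the find API actually accepts.

```typescript
// Hypothetical per-type builder: each type decides how a term maps to a KQL clause.
type KqlBuilder = (term: string) => string;

// Escape backslashes and double quotes so the term can sit inside a quoted KQL value.
function quoteKqlValue(term: string): string {
  return '"' + term.replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

// Illustrative registrations; the attribute paths are assumptions.
const kqlBuilders: Record<string, KqlBuilder> = {
  dashboard: (term) => `dashboard.attributes.title: ${quoteKqlValue(term)}`,
  visualization: (term) => `visualization.attributes.title: ${quoteKqlValue(term)}`,
};

// Wrap each per-type clause in parentheses and join them with OR.
function buildCombinedFilter(term: string, registered: Record<string, KqlBuilder>): string {
  return Object.keys(registered)
    .sort()
    .map((type) => `(${registered[type](term)})`)
    .join(' OR ');
}
```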

Restricting the searches to the current namespace for the initial implementation is completely fine; we previously discussed that limitation and everyone seemed on board. I believe that @legrego is looking into how we'd relax this restriction for another use-case.

You can still do searches on keyword fields in Elasticsearch, they just don't go through analysis. This means that you generally need an exact match, which isn't great. Adding a text field would greatly improve this behavior.

@legrego
Member

legrego commented May 6, 2020

Restricting the searches to the current namespace for the initial implementation is completely fine; we previously discussed that limitation and everyone seemed on board. I believe that @legrego is looking into how we'd relax this restriction for another use-case.

Yep, I'm looking into this whenever I get free cycles: partially for GS, and partially for the IM/Fleet project. I have a very rough POC working, but getting it tested and production-ready is another task altogether.

@pgayvallet
Contributor Author

I think we should start with the Saved Objects Management plugin's search abilities as a baseline. It allows plugins to specify a defaultSearchField and searches across all types on these search fields using the SavedObjectsClient#find API

That seems like a good idea: either leveraging this API directly, or constructing our own based on the existing defaultSearchField property (which is even directly exposed by core).

More generally, SavedObjectsClient#find (and the implementation in SavedObjectsRepository) allows for searching across multiple types but only in a single namespace. I'm assuming as a first step we'll only search in a user's current namespace

Yea, as @kobelb already said, this is totally fine for V1.

To let ES search on it we will have to change the type field mappings, which will trigger a migration. Since we know all the saved object types beforehand, we could do a client-side search and, if a type matches, search on the full type name.
You can still do searches on keyword fields in Elasticsearch, they just don't go through analysis. This means that you generally need an exact match, which isn't great.

Yea it's not great. @rudolf do you see any impact of changing the type field mapping apart from the triggered migration?

@ryankeairns functionally, can we get your opinion on:

when calling for example gs.find('dash'), should we search on the type field and return all dashboard objects?

@pgayvallet pgayvallet self-assigned this May 7, 2020
@joshdover joshdover mentioned this issue May 7, 2020
4 tasks
@rudolf
Contributor

rudolf commented May 7, 2020

Yea it's not great. @rudolf do you see any impact of changing the type field mapping apart from the triggered migration?

I don't think it would cause any issues. It's not optimal for performance, but we can use a multi-field to index it as both keyword and text if that's a problem. We should probably do a benchmark with 100K objects so that we don't have to do another migration if this does end up impacting performance when Saved Objects support many more documents.

@kobelb
Contributor

kobelb commented May 7, 2020

I think we should use a multi-field for the type. Changing it to just be text will cause the analyzer to tokenize strings like "something-awesome" into two terms: "something" and "awesome". This will then make it difficult to write queries to only return saved-objects with the type of "something-awesome".
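For reference, an Elasticsearch multi-field mapping along these lines could look roughly like the fragment below, written as a TypeScript object. The sub-field name `text` is an assumption chosen for illustration; nothing in this thread fixes it.

```typescript
// Sketch of a multi-field mapping for the root `type` property.
const typeFieldMapping = {
  type: {
    // Exact-match queries such as type: "something-awesome" keep working.
    type: 'keyword',
    fields: {
      // Analyzed sub-field, addressed as `type.text` in full-text queries.
      text: { type: 'text' },
    },
  },
};
```

The keyword field stays the source of truth for filtering, while the analyzed sub-field would only be used for fuzzy, user-typed input.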

@ryankeairns
Contributor

ryankeairns commented May 7, 2020

when calling for example gs.find('dash'), should we search on the type field and return all dashboard objects?

In my mind, I would expect gs.find('dash') to return the Dashboard application, but returning all the dashboards 1) feels excessive for performance reasons and 2) would be so broad that it would be difficult to locate a specific dashboard.

If users want to see all their dashboards, then they should just navigate to the Dashboard app as they are all listed (and searchable) on the Dashboard home page.

@pgayvallet
Contributor Author

In my mind, I would expect gs.find('dash') to return the Dashboard application, but returning all the dashboards 1) feels excessive for performance reasons and 2) would be so broad that it would be difficult to locate a specific dashboard.

SGTM functionally, and it's also good news for the technical implementation, as it means we don't need to introduce the multi-field for type.

@pgayvallet
Contributor Author

After a Slack discussion with @spalger about #68550, it seems that even if we were to add the _score property somewhere in the find response from the SO repository, this scoring wouldn't be of much help, as there isn't any way to normalize it to the 1-100 range format expected by the globalSearch API.

One option would be to rely on the same algorithm I used in #68488 (exact match + levenshtein distance) between the search term and the type's defaultSearchField. This is far from a perfect solution for a lot of reasons, but I can't really think of anything more solid without implementing our own scoring mechanism, which is probably a harder task than the GS API on its own.
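To make the "exact match + levenshtein distance" option concrete, here is a minimal sketch. It is not the code from #68488; the `scoreTitle` function and its scaling (100 for an exact match, otherwise a value below 100 that shrinks with the edit distance) are assumptions made for illustration.

```typescript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      // Minimum of deletion, insertion, and substitution.
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical scoring: case-insensitive exact match scores 100; anything else
// is scaled by how close the strings are relative to the longer one.
function scoreTitle(term: string, title: string): number {
  const t = term.toLowerCase();
  const v = title.toLowerCase();
  if (t === v) return 100;
  const maxLen = Math.max(t.length, v.length);
  const similarity = 1 - levenshtein(t, v) / maxLen;
  return Math.round(similarity * 99);
}
```

A score computed this way only reflects string similarity with the title, so it ignores whatever relevance Elasticsearch assigned to the hit, which is one of the reasons the approach is described above as far from perfect.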

One alternative would be to just return the plain _score value itself, but I don't think this would really be usable in any way, and would also be inconsistent with the expected normalized score from the GS API.

Does anyone see a better option?

@rudolf
Contributor

rudolf commented Jun 9, 2020

Even if we could normalize the score, normalizing scores so that an application name match will optimally interleave with a saved object title match is a fuzzy problem; it's ultimately a decision we need to make.

Maybe if providers limit their results to a sensible amount and we had a fixed priority we could display e.g. the top 3 application results, followed by the top 10 saved objects. So we don't try to interleave results, but instead decide beforehand how much "real estate" we want to give a certain provider.
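The fixed "real estate" idea could be sketched as follows, assuming each result carries a provider name and a provider-local score. The `ProviderResult` shape, the provider names, and the `allocate` helper are hypothetical.

```typescript
// Hypothetical result shape; scores are only comparable within one provider.
interface ProviderResult {
  title: string;
  provider: string;
  score: number;
}

// Quotas in display order: provider name and how many rows it may occupy.
const defaultQuotas: Array<[string, number]> = [
  ['application', 3],
  ['savedObjects', 10],
];

// Take the best results of each provider, in the fixed provider order,
// without ever comparing scores across providers.
function allocate(results: ProviderResult[], quotas: Array<[string, number]>): ProviderResult[] {
  const out: ProviderResult[] = [];
  for (const [provider, limit] of quotas) {
    const top = results
      .filter((r) => r.provider === provider) // filter() copies, so sort() does not mutate the input
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
    out.push(...top);
  }
  return out;
}
```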

@pgayvallet
Contributor Author

pgayvallet commented Jun 9, 2020

Maybe if providers limit their results to a sensible amount and we had a fixed priority we could display e.g. the top 3 application results, followed by the top 10 saved object

In practice, the searchbar is going to perform its own sorting/ordering by displaying the apps before the other results anyway. And it's very unlikely that more than 2 or 3 apps get hit for a given term, so it's virtually a non-issue for now (except for the fact that the RFC specifies a normalized score).

The problem will really arise when we start adding results from other Kibana instances (in v2 or v3 of the searchbar). Then we'll have different SO results from distinct ES queries with non-normalized scores, and currently no way to normalize all that...

@ryankeairns
Contributor

@pgayvallet regarding results from multiple instances, there was a suggestion for grouping results that might help us in this case. Also, we could consider having users navigate to an 'advanced search' results page that could allow for more robust filtering of results.

@pgayvallet
Contributor Author

@myasonik @ryankeairns After some discussion with the team, it seems there is no easy way to have consistent scoring between different providers. The score values of results from a given provider will be consistent (higher score -> better match), but we can't have normalized scoring between providers.

Functionally, that means that application results' scores would not be comparable to savedObject's results from the current instance, and the same would apply to SO results from the current instance and results from other spaces and/or instances (for v2+).

From what I understand, this would not really be an issue, as you were planning to sort by type and display application results on top anyway.

Can you confirm this would be acceptable to you?

I could even add a provider field to the results to help this sorting.

@ryankeairns
Contributor

ryankeairns commented Jun 10, 2020

@pgayvallet that is correct, we would want the application results first so sorting/separating those to the top (by type) makes sense. If I'm following your last sentence, we could use the provider field as the "type", that would be helpful.
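A small sketch of how a consumer might use such a provider field: results are ordered by provider first, and scores are only compared between results of the same provider. The `TaggedResult` shape and the `providerOrder` list are assumptions for illustration, not the actual globalSearch result type.

```typescript
// Hypothetical result shape with the proposed `provider` field.
interface TaggedResult {
  title: string;
  provider: string;
  score: number;
}

// Display order of providers; unknown providers sort last.
const providerOrder: string[] = ['application', 'savedObjects'];

function providerRank(provider: string): number {
  const idx = providerOrder.indexOf(provider);
  return idx === -1 ? providerOrder.length : idx;
}

// Comparator: provider rank first, then descending score within the same provider.
function compareResults(a: TaggedResult, b: TaggedResult): number {
  return providerRank(a.provider) - providerRank(b.provider) || b.score - a.score;
}
```

Passing `compareResults` to `Array.prototype.sort` would put application results on top regardless of how their scores relate to saved object scores.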

7 participants