-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[META] Protobuf for Search API #10684
Comments
would like to do +1 on this feature. This will be great win specially for vector search where search payload increases with increase in dimension of the vectors. |
@navneet1v is there data we could look where the bottleneck is for vector search? |
@saratvemulapalli I don't have data available, because while running the benchmarks we remove the _source field. But what I can do is I can run a small benchmarks and provide you that info. Will that work? |
Yeah what we'd like to know is serialization/de-serialization causing performance latency, and is that during query phase or fetch phase etc. This will help us narrow down which area to work on first. |
For vector search it will be fetch phase only, because that is where a vector having like lets say 100 dimension is getting serialized and de-serialized per document. Consider like every float is represented as 4bytes it becomes like 400 bytes just alone for vectors. and the same gets transported over the wire to customers too. Hence once we start to make the change it will help a lot.
+1 on this. From benchmarks I will see what I can provide, but I can surely help setup the benchmark code so that you guys can run of different custom OpenSearch to get more numbers if you want. |
That would be super helpful @navneet1v! |
@saratvemulapalli , @VachaShah Here is one of the benchmarking notebook which you can use, to test k-NN with any OpenSearch cluster with Security Enabled. This is something I was working on. This replicates the behavior on how we do perf testing in K-NN. Good thing about this is its is easy to run.
You can remove all the optimization that we have added:
Please let me know if you need any more details happy to help. |
This is super exciting @VachaShah!! This could include heavily exercised code paths during query executions such as |
Thank you @getsaurabh02! I have added the classes that I am targeting first in the issue description. They include the Query and Fetch phase implementations of SearchPhaseResult and other related request, response and transport action classes. |
The code for end-to-end working POC with The current POC is for QUERY_THEN_FETCH search types with some embedded objects in the Response as bytes. The next step is to convert those embedded objects into proto messages as well. I am going to micro benchmarking the protobuf integrated API to compare with the original search API. Next steps
|
@VachaShah - Will this require documentation? If so, can you please create a doc issue, let me know who will be submitting the doc PR, and add it to the unified tracker project? Thank you! |
@hdhalter This does not require documentation as of now since it is a performance improvement. I am going to divide this meta issue into sub tasks and if any of them have a need for documentation, I will make sure to create a doc issue for those. |
In order to divide this issue into deliverables, converting this issue into a meta issue. Sub tasks will be listed in the issue description. |
Thanks @VachaShah for breaking it down. |
BenchmarksThe benchmarks are taken using opensearch-benchmark for both the original search API and protobuf version of the API for default searches in benchmarks. The workload used is nyc_taxis. 5 nodes clusterSeeing a 19.1% decrease in latency for default search with protobuf integration. Original API
API with Protobuf
10 nodes clusterSeeing a 23.03% decrease in latency for default search with protobuf integration. Original API
API with protobuf
|
Just a quick note, please also benchmark/profile for CPU and JVM overhead reduction with the change during ser/de. |
@VachaShah - 100th percentile is generally skewed, 90th percentile is more reliable, can you please share the numbers for 90th and 99th percentile. Also, are these single runs or average of 'N' runs? |
OSB runs 100 iterations for search, so this is average for those 100 runs. OSB publishes the 100th percentile for the operations, I think the OSB code needs to be modified to get the other percentiles. @gkamat Is there a way to customize this from command line? |
CPU UtilizationI analyzed the CPU used when running benchmarks and with async-profiler. CPU used during search
async-profiler CPU profile
|
Hi, are we on track for this to be released in 2.12 ? |
Given we are trying to build some cleaner abstractions around detecting the protocol and making it pluggable through some refactoring, this will need some more time and I believe the code will be in a good state by 2.13 |
UpdateWith the discussion on using protobuf for API request/response and node-to-node communication in the transport layer, we have first taken up making the transport layer abstract to support multiple protocols for serialization and deserialization. This will decouple the node-to-node communication in the transport layer from the current serialization mechanism (which is now referred to as After these changes, we will take up adding protobuf into the codebase for search API in a way that the API request/response layer is not tightly coupled with the serialization mechanisms (which is the case currently for example how Transport layer abstractions and decoupling
Introduction of protobuf for search API (WIP - might be divided into more PRs)
|
Proposal
With the experiment done for using protobuf in _cat/nodes API (see #6844 (comment)) and a 15-30% improvement depending on the size of the cluster, we can assess the benefits of protobuf for Search API in terms of serialization and de-serialization of the requests and responses in between nodes and at the REST level.
Next Steps
Do a similar experiment for the Search API to understand the performance improvements using protobuf. (This is a work in progress)
Details
In order to experiment and see incremental benefits, some of the classes I am targeting to convert to protobuf:
SearchRequest
,SearchResponse
,SearchPhaseResult
,FetchSearchResult
,ShardFetchRequest
,ShardSearchRequest
,QueryFetchSearchResult
,QuerySearchRequest
,QuerySearchResult
. Also, classes related toTransportSearchAction
are required to support protobuf requests and responses.Code changes
The code changes for the experiment are being added in a branch on my fork: https://github.com/VachaShah/OpenSearch/compare/poc-cat-nodes-protobuf...VachaShah:OpenSearch:poc-search-protobuf?expand=1
Sub tasks/Milestones
Next steps as discussed with @msfroh and @getsaurabh02. Let me know if I missed something here.
QuerySearchResult
,FetchSearchResult
andQueryFetchSearchResult
into a proto message and add support to communicate this response between nodes using protobuf bytes. This includes adding support for protobuf in OutboundHandler and InboundHandler. Deliver this as an experimental feature. There will be a feature flag which if enabled will allow node-to-node communication using protobuf.Note: All of the above changes go behind an experimental feature flag. Once the incremental changes are in for the Search API:
Related
The text was updated successfully, but these errors were encountered: