Is your feature request related to a problem?
In this batch ingestion RFC, we proposed a batch ingestion feature that can accelerate ingestion with neural search processors. It introduces an additional parameter, "batch size", so that texts from different documents can be combined and sent to the ML server in one request. Since users have different data sets and different ML servers with different resources, they would need to experiment with different batch size values to find the one that yields optimal performance. To offload this burden from users, we'd like to build an automation tool that finds the optimal batch size automatically.
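For illustration, here is a minimal sketch of how the proposed parameter might be used from a client, assuming a `batch_size` query parameter on the `_bulk` API as described in the RFC; the index name, field, and documents are hypothetical:

```python
# Hypothetical sketch: passing the proposed batch_size parameter on a bulk
# request. The parameter follows the batch ingestion RFC; the index name
# and documents are illustrative only.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Newline-delimited bulk body: two documents routed through an ingest
# pipeline that includes a neural search (text embedding) processor.
bulk_body = (
    '{"index": {"_index": "my-nlp-index"}}\n'
    '{"text": "first document"}\n'
    '{"index": {"_index": "my-nlp-index"}}\n'
    '{"text": "second document"}\n'
)

# With batch_size=2, the processor could combine the texts of both
# documents into a single inference request to the ML server.
client.bulk(body=bulk_body, params={"batch_size": 2})
```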
What solution would you like?
The automation tool would run bulk indexing with different batch sizes to see which one yields optimal performance (high throughput, low latency, and no errors). The OpenSearch Benchmark tool already provides rich benchmarking features that we can utilize for this automation: we can invoke the benchmark with different parameters, collect and evaluate the results, and then provide a recommendation, as sketched below.
The tool could also be extended to help select the bulk size and the number of clients, which could be supported in a future phase.
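A minimal sketch of what this sweep could look like, assuming an ingestion workload that accepts a `batch_size` workload parameter; the workload name, parameter name, and results-file handling here are assumptions, not a confirmed interface:

```python
# Hypothetical sketch: sweep candidate batch sizes by invoking
# OpenSearch Benchmark once per value, then compare the runs.
import subprocess

candidate_batch_sizes = [1, 10, 25, 50, 100]

for batch_size in candidate_batch_sizes:
    # Run the same ingestion workload with a different batch size each time.
    # "neural_ingestion" and the "batch_size" workload param are placeholders.
    subprocess.run(
        [
            "opensearch-benchmark", "execute-test",
            "--workload=neural_ingestion",
            f"--workload-params=batch_size:{batch_size}",
            f"--results-file=results-batch-{batch_size}.md",
        ],
        check=True,
    )

# A real tool would then parse each results file, reject runs with errors,
# and recommend the batch size with the best throughput/latency trade-off.
```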
What alternatives have you considered?
No alternatives.
Do you have any additional context?
No
@chishui this is an interesting feature, and +1 on building such a tool. I would love to see more details about this tool added to the issue description (something like an RFC).