Bulk action refresh param #1743

Merged 7 commits on Jul 10, 2023

inqueue
Member

@inqueue inqueue commented Jun 29, 2023

This commit adds support for the bulk ?refresh query parameter to address feedback on a new serverless workload. Possible values for refresh are true (?refresh=true) for async, false (?refresh=false), and wait_for (?refresh=wait_for) for sync, consistent with https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-refresh.

@inqueue inqueue added the enhancement Improves the status quo label Jun 29, 2023
@inqueue inqueue requested review from pquentin and b-deam June 29, 2023 22:25
@inqueue inqueue self-assigned this Jun 29, 2023
@b-deam
Member

b-deam commented Jun 30, 2023

Approach looks fine to me, but I'm curious how you intend to handle more complex configurations where we want to change the refresh param dynamically. Will we require users to implement a custom param source?

@inqueue
Member Author

inqueue commented Jun 30, 2023

Approach looks fine to me, but I'm curious how you intend to handle more complex configurations where we want to change the refresh param dynamically. Will we require users to implement a custom param source?

I am not too concerned with changing the refresh param dynamically, and I do not have a solution for doing this in a deterministic way. For now, it is good enough to issue a manual refresh to the target in a parallel operation. Right now, I am more concerned with using the refresh parameter to keep search-idle optimizations from kicking in for smaller corpora, which is where this parameter is most applicable.

Does it make more sense to change the possible values to strings only? Like:

  • sync - Equivalent to ?refresh=wait_for
  • async - Equivalent to ?refresh=true
  • default - Equivalent to ?refresh=false

This would leave the door open for a future parameter value like sometimes, where ?refresh=true (or ?refresh=wait_for) is applied to a subset of requests, as you have suggested elsewhere.
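A minimal sketch of the proposed string-only model, assuming a plain lookup table. The names REFRESH_MODES, refresh_query_value, and bulk_path are hypothetical and only illustrate the mapping listed above; they are not taken from Rally.

```python
# Hypothetical mapping from the proposed mode names to the bulk API's
# ?refresh values. Illustrative only; not Rally's actual implementation.
REFRESH_MODES = {
    "sync": "wait_for",   # ?refresh=wait_for
    "async": "true",      # ?refresh=true
    "default": "false",   # ?refresh=false
}


def refresh_query_value(mode: str) -> str:
    """Return the ?refresh value for one of the proposed mode names."""
    return REFRESH_MODES[mode]


def bulk_path(mode: str) -> str:
    """Build an illustrative bulk request path carrying the refresh parameter."""
    return f"/_bulk?refresh={refresh_query_value(mode)}"
```

Because the user-facing values are plain strings, a later value such as sometimes would only need a new key and its own handling, without changing the existing three.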

@inqueue inqueue marked this pull request as ready for review July 6, 2023 19:21
@inqueue
Member Author

inqueue commented Jul 6, 2023

Does it make more sense to change the possible values to strings only? Like:

  • sync - Equivalent to ?refresh=wait_for
  • async - Equivalent to ?refresh=true
  • default - Equivalent to ?refresh=false

The latest commit uses this model. It can be tested against serverless with the track in elastic/rally-tracks#425.

esrally race --track=serverless_k8s --pipeline=benchmark-only --client-options='{"default": {"use_ssl": true, "verify_certs": false, "basic_auth_user":"${ESUSER}","basic_auth_password":"${ESPASSWORD}"}}' --telemetry=node-stats,shard-stats,data-stream-stats --telemetry-params="node-stats:node-stats-include-indices" --target-hosts=${TARGET} --track-params="bulk_size:100,ingest_percentage:10,bulk_refresh:async" --kill-running-processes --test-mode

Member

@b-deam b-deam left a comment

Left some comments, but it otherwise LGTM.

if params["refresh"] in valid_refresh_values:
    bulk_params["refresh"] = params["refresh"]
else:
    self.logger.info(
Member

Sorry for my last comment, I see now what you're checking for. I think neither INFO nor WARNING is enough in this case, and we should probably just error out on an invalid value. Otherwise we risk returning misleading results while the corresponding log message is buried deep in the log file.

Maybe we can error out similar to how it's done here?:

rally/esrally/driver/runner.py

Lines 2609 to 2612 in b1a822c

if op_type not in self.supported_op_types:
    raise exceptions.RallyAssertionError(
        f"Unsupported operation-type [{op_type}]. Use one of [{', '.join(self.supported_op_types)}]."
    )
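Applied to the refresh check, erroring out could look roughly like the sketch below. It is self-contained, so it defines a hypothetical InvalidRefreshValueError as a stand-in for exceptions.RallyAssertionError, and it assumes the accepted values are the sync/async/default names discussed in this thread. The function name build_bulk_params is also an assumption.

```python
class InvalidRefreshValueError(ValueError):
    """Hypothetical stand-in for exceptions.RallyAssertionError."""


# Assumed set of accepted values, taken from the names proposed in this thread.
VALID_REFRESH_VALUES = ("sync", "async", "default")


def build_bulk_params(params: dict) -> dict:
    """Copy a validated refresh value into the bulk params, or fail loudly."""
    bulk_params = {}
    if "refresh" in params:
        refresh = params["refresh"]
        if refresh not in VALID_REFRESH_VALUES:
            # Raise instead of logging, so an invalid value cannot silently
            # produce misleading benchmark results.
            raise InvalidRefreshValueError(
                f"Unsupported bulk refresh value [{refresh}]. "
                f"Use one of [{', '.join(VALID_REFRESH_VALUES)}]."
            )
        bulk_params["refresh"] = refresh
    return bulk_params
```

With this shape, build_bulk_params({"refresh": "sometimes"}) raises immediately, while a missing refresh key leaves the bulk params untouched.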

Member

We could probably also just lean on the client itself to return an error. If you pass in something that's not valid, does it return anything meaningful?

Member

@b-deam b-deam left a comment

Left one minor comment around unsupported values, but no need for another review cycle. LGTM.

@inqueue inqueue merged commit 01f3c25 into elastic:master Jul 10, 2023
11 checks passed
@pquentin pquentin added this to the 2.9.0 milestone Aug 24, 2023