Start using `.options()` for transport parameters #1673

b-deam · 2023-02-21T02:51:20Z

As part of the ES client upgrade to 8.x the client now requires what were previously request level parameters to be set at the client's respective transport level.

For example, perhaps the most pervasive change here is the previously global, per-request setting request-timeout:

# 7.x DEPRECATED USAGE (Don't do this!):
client.search(index="blogs", request_timeout=180)

# 8.0+ SUPPORTED USAGE:
client.options(request_timeout=180).search(index="blogs")

We need to refactor all of our runners (and custom runners in rally-tracks) to set these parameters at the transport level.

Note:

When we move to setting options within runners we need to consider that the underlying elasticsearch-py library reinstantiates the client:
https://github.com/elastic/elasticsearch-py/blob/2b297b2922ab11a1439296737812d9799d8a0025/elasticsearch/_async/client/__init__.py#L500

I think the solution here is probably to override the options method on the respective RallyAsyncElasticsearch and RallySyncElasticsearch clients and do something like this:

- client = type(self)(_transport=self.transport)
+ client = type(self)(_transport=self.transport, distribution_version=str(self.distribution_version))

The text was updated successfully, but these errors were encountered:

…ort level options via 'options()' (#1705) With this commit we fix a bug that prevented any runner using the low level `perform_request()` ES client method, and `_transport_request_params()` helper method from passing in the `request-timeout` parameter. Relates #1673 Relates elastic/rally-tracks#393 Note that there's still follow up work to be done to refactor the remaining runners, specifically around overriding the client's `options()` method to handle the `distribution_version` param to support REST compatibility headers as discussed in #1673, but that will only affect us when 9.x is released.

b-deam · 2023-04-28T02:57:37Z

Mechanically this was going to be a reasonable amount of work on its own, but it turns out that this new approach incurs a relatively high client side overhead.

After merging #1705 I noticed that some nightly benchmarks were displaying higher than normal latency (latency = processing_time + service_time). Some reproductions and profiling later and it's clear that the .options() approach is slower.

The absolute numbers aren't large, we're talking only a few hundred microseconds to a millisecond on each request, but some of our benchmarks are sensitive enough to notice this.

So using some static responses and our in-built profiling (--enable-driver-profiling) it became clear that #1705 .options() causes us to reinstantiate a new client on every request:

name                                                                                                                                          ncall     tsub      ttot      tavg      
/home/esbench/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py:1845 _UnixSelectorEventLoop._run_once                              3976      0.072129  5.481492  0.001379
/home/esbench/.pyenv/versions/3.11.2/lib/python3.11/asyncio/events.py:78 TimerHandle._run                                                     3977      0.011262  5.254178  0.001321
/home/esbench/rally/esrally/driver/driver.py:1843 AsyncExecutor.__call__                                                                      1         0.078444  5.195176  5.195176
/home/esbench/rally/esrally/driver/driver.py:1796 AsyncProfiler.__call__                                                                      1         0.001048  5.187659  5.187659
/home/esbench/rally/esrally/driver/driver.py:1957 execute_single                                                                              2000      0.026569  4.796528  0.002398
/home/esbench/rally/esrally/driver/runner.py:290 NoCompletion.__call__                                                                        2000      0.006317  4.727379  0.002364
/home/esbench/rally/esrally/driver/runner.py:404 AssertingRunner.__call__                                                                     2000      0.007228  4.721062  0.002361
/home/esbench/rally/esrally/driver/runner.py:338 MultiClientRunner.__call__                                                                   2000      0.013656  4.713834  0.002357
/home/esbench/rally/esrally/driver/runner.py:862 Query.__call__                                                                               2000      0.051384  4.698539  0.002349
/home/esbench/rally/esrally/driver/runner.py:1027 _request_body_query                                                                         2000      0.009655  3.665306  0.001833
/home/esbench/rally/esrally/driver/runner.py:1165 Query._raw_search                                                                           2000      0.015916  3.654368  0.001827
/home/esbench/rally/esrally/client/asynchronous.py:277 RallyAsyncElasticsearch.perform_request                                                2000      0.055657  3.633689  0.001817
/home/esbench/.local/lib/python3.11/site-packages/elastic_transport/_async_transport.py:176 RallyAsyncTransport.perform_request               2000      0.097287  3.163554  0.001582
/home/esbench/.local/lib/python3.11/site-packages/elastic_transport/_node/_http_aiohttp.py:126 RallyAiohttpHttpNode.perform_request           2000      0.075957  2.669151  0.001335
/home/esbench/.local/lib/python3.11/site-packages/aiohttp/client.py:1140 _RequestContextManager.__aenter__                                    2000      0.009856  1.970503  0.000985
/home/esbench/.local/lib/python3.11/site-packages/aiohttp/client.py:364 ClientSession._request                                                2000      0.147258  1.960647  0.000980
+ /home/esbench/.local/lib/python3.11/site-packages/elasticsearch/_async/client/__init__.py:486 RallyAsyncElasticsearch.options                 2000      0.026028  0.937524  0.000469
+ /home/esbench/rally/esrally/client/asynchronous.py:260 RallyAsyncElasticsearch.__init__                                                       2000      0.020966  0.863369  0.000432
+ /home/esbench/.local/lib/python3.11/site-packages/elasticsearch/_async/client/__init__.py:128 RallyAsyncElasticsearch.__init__                2000      0.127874  0.826059  0.000413
+ /home/esbench/.local/lib/python3.11/site-packages/elasticsearch/_async/client/_base.py:375 AsyncSearchClient.__init__                         68000     0.246066  0.694591  0.000010
/home/esbench/.local/lib/python3.11/site-packages/aiohttp/client_reqrep.py:247 StaticRequest.__init__                                         2000      0.055712  0.562107  0.000281
+ /home/esbench/.local/lib/python3.11/site-packages/elasticsearch/_async/client/_base.py:238 RallyAsyncElasticsearch.__init__                   70000     0.295971  0.414709  0.000006

Every time we instantiate a new client we also instantiate all of the namespaced clients too. In the above output we call 2000 search requests, which means we're calling <client>.__init__ 34 times per runner call:
https://github.com/elastic/elasticsearch-py/blob/9ecf7331794952e031f76f1c4b57972f31c81517/elasticsearch/_async/client/__init__.py#L428-L461

...
        # namespaced clients for compatibility with API names
        self.async_search = AsyncSearchClient(self)
        self.autoscaling = AutoscalingClient(self)
        self.cat = CatClient(self)
        self.cluster = ClusterClient(self)
        self.fleet = FleetClient(self)
        self.features = FeaturesClient(self)
        self.indices = IndicesClient(self)
        self.ingest = IngestClient(self)
        self.nodes = NodesClient(self)
        self.snapshot = SnapshotClient(self)
        self.tasks = TasksClient(self)
...

I don't think we'll want to allow changing these parameters per-each <runner>.__call__(), so perhaps we need to find a better way of setting the respective .options() once per runner, on initialisation, and not calling .options() again within <runner>.__call__().

danielmitterdorfer · 2023-04-28T09:34:57Z

@b-deam there is currently also another issue with the existing usage of es.options() that I stumbled upon. The current calls to es.options() (see https://github.com/elastic/rally/pull/1705/files) are ineffective because options returns a new client that needs to be used but we ignore the return value and continue to use the original client. All in all I think this API is not well-suited and this should be fixed directly in the client instead of working around a client API design issue in Rally.

To fix the issue that I mentioned you need to do that:

- es.options(**transport_params)
+ es = es.options(**transport_params)

(for all call sites)

b-deam added the tech debt label Feb 21, 2023

b-deam added this to the 2.7.1 milestone Feb 21, 2023

b-deam mentioned this issue Feb 21, 2023

Upgrade ES Client to 8.x #1669

Merged

19 tasks

pquentin modified the milestones: 2.7.1, 2.8.0 Mar 2, 2023

dliappis mentioned this issue Apr 15, 2023

raw-request task in nyc_taxis#esql fails with request [/_esql] contains unrecognized parameter: [request_timeout]" elastic/rally-tracks#393

Closed

b-deam mentioned this issue Apr 18, 2023

Refactor some runners to set transport level options #1705

Merged

b-deam mentioned this issue May 1, 2023

Actually use client returned from '.options()' #1710

Merged

b-deam self-assigned this May 2, 2023

pquentin modified the milestones: 2.8.0, 2.8.1 May 24, 2023

b-deam mentioned this issue Jun 21, 2023

Refactor version number checks #1738

Merged

gbanasiak modified the milestones: 2.8.1, 2.9.0 Jul 25, 2023

pquentin modified the milestones: 2.9.0, 2.x Aug 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Start using `.options()` for transport parameters #1673

Start using `.options()` for transport parameters #1673

b-deam commented Feb 21, 2023 •

edited

Loading

b-deam commented Apr 28, 2023 •

edited

Loading

danielmitterdorfer commented Apr 28, 2023

Start using .options() for transport parameters #1673

Start using .options() for transport parameters #1673

Comments

b-deam commented Feb 21, 2023 • edited Loading

b-deam commented Apr 28, 2023 • edited Loading

danielmitterdorfer commented Apr 28, 2023

Start using `.options()` for transport parameters #1673

Start using `.options()` for transport parameters #1673

b-deam commented Feb 21, 2023 •

edited

Loading

b-deam commented Apr 28, 2023 •

edited

Loading