
Evaluate sync vs async segments upload to remote store (in RemoteStoreRefreshListener) #9024

Closed
ashking94 opened this issue Aug 1, 2023 · 6 comments
Labels
enhancement Enhancement or improvement to existing feature or request Storage:Durability Issues and PRs related to the durability framework Storage Issues and PRs relating to data and metadata storage

Comments

@ashking94
Member

ashking94 commented Aug 1, 2023

Currently, the afterRefresh() method of RemoteStoreRefreshListener is invoked synchronously after each refresh. In that invocation, we upload the segment files that are part of the latest segment infos for the IndexShard. If the upload fails for any reason, we schedule retries asynchronously with exponential backoff, increasing the delay interval up to a fixed maximum, until a retry succeeds. Meanwhile, indexing and refreshes can continue, subject to segment upload backpressure being enabled and kicking in based on the configured backpressure settings. Based on the discussion in this PR, there are multiple combinations of performing the afterRefresh and the retry. The modes are listed below -

  1. sync afterRefresh, sync retry
  2. async afterRefresh, async retry
  3. sync afterRefresh, async retry (current approach)
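As a rough sketch of the retry scheduling common to these modes, the exponential backoff with a fixed maximum interval could be computed as below. All names and the base/maximum values are illustrative assumptions, not the actual RemoteStoreRefreshListener implementation.

```java
// Hypothetical sketch of the exponential backoff described above: the retry
// delay doubles on each failed attempt, capped at a fixed maximum interval.
public class UploadRetryBackoff {
    static final long BASE_DELAY_MS = 250;   // assumed initial retry interval
    static final long MAX_DELAY_MS = 10_000; // assumed fixed maximum interval

    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
    static long nextDelayMillis(int attempt) {
        long delay = BASE_DELAY_MS << Math.min(attempt, 20); // clamp shift to avoid overflow
        return Math.min(delay, MAX_DELAY_MS);
    }
}
```

In the current mode, a failed synchronous afterRefresh would hand off to an async scheduler that re-submits the upload after `nextDelayMillis(attempt)`.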

Approach 1 - sync afterRefresh, sync retry
In this approach, the afterRefresh method executes synchronously, including the retries on failure. Indexing continues while the remote upload runs, but no further refreshes are triggered. The backpressure configuration would need to change to support this, since the refresh lag would always be 1, which is currently ignored because it implies an upload is already in progress. This would require considerable change in the backpressure logic without much evident gain. The Generic thread would remain blocked until the upload succeeds, so we would also need to define when to stop retrying and how the IndexShard behaves once all retries are exhausted. A further side effect is that the indexing buffer would keep piling up while refreshes are blocked (before the updated backpressure kicks in), causing a steep increase in heap usage until the next refresh. In the happy case, however, sync afterRefresh means new segments become visible only after the previous segment upload completes, rather than at the default 1s refresh interval.

Approach 2 - async afterRefresh, async retry
In this approach, the segments upload job executes asynchronously and does not block the overall refresh flow, making both the afterRefresh flow and the retry async. Segments can be created at a faster rate than in the first approach, and the upload job's responsibility is to upload the most recent state. It is possible that the amortised upload bytes reduce considerably with this approach, since multiple refreshes can be covered by a single upload. The segment upload backpressure will need to be updated to account for a generally higher refresh lag due to more frequent segment creation. Since translog cleanup is hooked to successful uploads, it will not need to change. The Generic thread gets freed up earlier instead of being blocked on refresh. Overall this approach looks promising. We do, however, need to ensure that all ITs/UTs stay intact and that performance testing shows no degradation in latency, throughput, or perceived replication lag.
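The "upload the most recent state" behaviour can be sketched as a coalescing uploader: refreshes that arrive while an upload is running are folded into a single follow-up upload rather than queued individually. This is a minimal illustration of the idea, not the OpenSearch implementation; all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: async uploads coalesce intermediate refreshes, so the upload job
// always targets the latest local state rather than every refresh point.
public class CoalescingUploader {
    private final AtomicBoolean uploadRunning = new AtomicBoolean(false);
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    int uploadsStarted = 0;

    // Called after every local refresh.
    void onRefresh() {
        dirty.set(true);
        maybeStartUpload();
    }

    private void maybeStartUpload() {
        if (dirty.get() && uploadRunning.compareAndSet(false, true)) {
            dirty.set(false);
            uploadsStarted++; // in reality: submit an async upload of the latest segment infos
        }
    }

    // Called when an upload finishes; re-runs if more refreshes arrived meanwhile.
    void onUploadFinished() {
        uploadRunning.set(false);
        maybeStartUpload();
    }
}
```

Four refreshes arriving while the first upload is in flight would thus result in only two uploads in total, which is how the amortised upload bytes can drop.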

Distinction between user initiated refresh and interval refresh
As of today, when control reaches any RefreshListener, it is not possible to know whether the refresh was triggered externally or internally. If we could differentiate between the two, we could ensure a synchronous experience for user-initiated refreshes, so that the response is sent only after the upload has completed successfully. We will need to evaluate this further.
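One possible shape for such a distinction, purely as a sketch: the user-facing refresh path could set a marker before triggering the refresh, and the listener could block for the upload only when the marker is present. None of these names exist in OpenSearch today; this is only an illustration of the idea to be evaluated.

```java
// Hypothetical: a thread-local marker set by the external (REST) refresh path
// and consulted by the refresh listener to decide sync vs async upload.
public class RefreshOrigin {
    private static final ThreadLocal<Boolean> EXTERNAL =
        ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Called by the user-initiated refresh path before triggering the refresh.
    static void markExternal() { EXTERNAL.set(Boolean.TRUE); }

    // Called in a finally block after the refresh completes.
    static void clear() { EXTERNAL.remove(); }

    // Listener side: wait for the upload only for user-initiated refreshes.
    static boolean shouldWaitForUpload() { return EXTERNAL.get(); }
}
```

A thread-local only works if the listener runs on the triggering thread; an internally scheduled refresh would leave the marker unset and take the async path.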

In conclusion, we want to pivot to approach 2 with the appropriate changes and testing. Looking for thoughts and suggestions on the same.

@ashking94 ashking94 added enhancement Enhancement or improvement to existing feature or request Storage:Durability Issues and PRs related to the durability framework v2.1.1 Issues and PRs related to version 2.1.1 Storage Issues and PRs relating to data and metadata storage labels Aug 1, 2023
@ashking94
Member Author

Tagging @gbbafna @sachinpkale @Bukhtawar for comments/suggestions.

@ashking94 ashking94 removed untriaged v2.1.1 Issues and PRs related to version 2.1.1 labels Aug 1, 2023
@ankitkala
Member

I agree with approach 2 of keeping the segment upload and retry logic completely async. Apart from avoiding indexing buffer pileup, this ensures that the primary can still refresh even if the upload from a previous refresh is taking longer (e.g. after merges), so searches on the primary won't be stale.

Can you also add your thoughts on how far the primary upload can fall behind? If the primary hasn't uploaded data for the last X refreshes, we might want the backpressure mechanism to reject new requests.

@shwetathareja
Member

Also, would segment uploads be strictly in order, i.e. upload segment n+1 only after segment n has uploaded successfully, or can they be uploaded in parallel?

@ashking94
Member Author

I agree with approach 2 of keeping the segment upload and retry logic completely async. Apart from avoiding indexing buffer pileup, this ensures that the primary can still refresh even if the upload from a previous refresh is taking longer (e.g. after merges), so searches on the primary won't be stale.

True

Can you also add your thoughts on how far the primary upload can fall behind? If the primary hasn't uploaded data for the last X refreshes, we might want the backpressure mechanism to reject new requests.

This we can tune with the backpressure settings. We currently have 3 gating parameters - 1. time lag, 2. bytes lag & 3. consecutive failures. These can be used to have rejection kick in when any of the parameters is breached.
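The three gating parameters combine as a simple any-breach check, sketched below. The threshold values and names here are illustrative placeholders, not the actual remote store backpressure settings or their defaults.

```java
// Sketch of the gating described above: reject new indexing requests when any
// of time lag, bytes lag, or consecutive upload failures breaches its limit.
public class UploadBackpressure {
    static final long MAX_TIME_LAG_MS = 5 * 60 * 1000;     // assumed limit
    static final long MAX_BYTES_LAG = 512L * 1024 * 1024;  // assumed limit
    static final int MAX_CONSECUTIVE_FAILURES = 5;         // assumed limit

    static boolean shouldReject(long timeLagMs, long bytesLag, int consecutiveFailures) {
        return timeLagMs > MAX_TIME_LAG_MS
            || bytesLag > MAX_BYTES_LAG
            || consecutiveFailures > MAX_CONSECUTIVE_FAILURES;
    }
}
```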

@ashking94
Member Author

Also, would segment uploads be strictly in order, i.e. upload segment n+1 only after segment n has uploaded successfully, or can they be uploaded in parallel?

Segments created across multiple refreshes can get uploaded together, depending on how frequently the upload happens. Segment creation continues at the refresh interval; supposing x refreshes have occurred locally between 2 uploads, the upload would then cover all files that are part of the latest segment infos but not yet present in the remote store.
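The upload set described above is effectively a set difference between the latest local segment infos and the remote store contents. A minimal sketch, with a hypothetical helper name and file names:

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch: the upload covers every file referenced by the latest local segment
// infos that is not already present in the remote store, regardless of how
// many local refreshes produced those files.
public class SegmentDiff {
    static Set<String> filesToUpload(Set<String> latestSegmentInfosFiles,
                                     Set<String> remoteStoreFiles) {
        Set<String> toUpload = new TreeSet<>(latestSegmentInfosFiles);
        toUpload.removeAll(remoteStoreFiles);
        return toUpload;
    }
}
```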

@harishbhakuni
Contributor

@ashking94 for snapshot interop with remote store, we rely on the afterRefresh method to upload all the segments and metadata after a commit. If we make that async, shallow snapshots would not work. We need to make the afterRefresh call sync at least for the first refresh after a commit.
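One way this constraint could be accommodated alongside approach 2, sketched with hypothetical names: track a pending commit and force exactly the first refresh after it through the synchronous upload path, so shallow snapshots always see the committed segments in the remote store.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical: async uploads by default, but the first refresh after a
// commit takes the sync path so snapshot interop keeps working.
public class CommitAwareUploadPolicy {
    private final AtomicBoolean commitPending = new AtomicBoolean(false);

    // Called when a Lucene commit happens.
    void onCommit() { commitPending.set(true); }

    // Returns true exactly once per commit: that refresh uploads synchronously.
    boolean uploadSynchronously() { return commitPending.compareAndSet(true, false); }
}
```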
