
Evaluate sync vs async segments upload to remote store (in RemoteStoreRefreshListener) #9024

Closed
ashking94 opened this issue Aug 1, 2023 · 6 comments
Labels
enhancement Enhancement or improvement to existing feature or request Storage:Durability Issues and PRs related to the durability framework Storage Issues and PRs relating to data and metadata storage

Comments

@ashking94
Member

ashking94 commented Aug 1, 2023

Currently, the afterRefresh() method of RemoteStoreRefreshListener is invoked synchronously after each refresh. In that invocation, we upload the segment files that are part of the latest segment infos for the IndexShard. If the upload fails for any reason, we schedule retries asynchronously with exponential backoff, increasing the delay interval up to a fixed maximum, until a retry succeeds. Meanwhile, indexing and refreshes can continue, subject to segment upload backpressure being enabled and kicking in based on the configured backpressure settings. Based on the discussion in this PR, there are multiple combinations of performing the afterRefresh and the retry. The modes are listed below -

  1. sync afterRefresh, sync retry
  2. async afterRefresh, async retry
  3. sync afterRefresh, async retry (current approach)
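As a rough sketch of the retry scheduling common to these modes, the exponential backoff with a fixed maximum interval could be computed as below. All names and the base/maximum values are illustrative assumptions, not the actual RemoteStoreRefreshListener implementation.

```java
// Hypothetical sketch of the exponential backoff described above: the retry
// delay doubles on each failed attempt, capped at a fixed maximum interval.
public class UploadRetryBackoff {
    static final long BASE_DELAY_MS = 250;   // assumed initial retry interval
    static final long MAX_DELAY_MS = 10_000; // assumed fixed maximum interval

    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
    static long nextDelayMillis(int attempt) {
        long delay = BASE_DELAY_MS << Math.min(attempt, 20); // clamp shift to avoid overflow
        return Math.min(delay, MAX_DELAY_MS);
    }
}
```

In the current mode, a failed synchronous afterRefresh would hand off to an async scheduler that re-submits the upload after `nextDelayMillis(attempt)`.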

Approach 1 - sync afterRefresh, sync retry
In this approach, the afterRefresh method executes synchronously, including the retries on failure. Indexing continues while the remote upload runs, but no further refreshes are triggered. The backpressure configuration would need to change to support this, since the refresh lag would always be 1, which is currently ignored because it implies an upload is already in progress. This would require considerable change in the backpressure logic without much evident gain. The Generic thread would remain blocked until the upload succeeds, so we would also need to define when to stop retrying and how the IndexShard behaves once all retries are exhausted. A further side effect is that the indexing buffer would keep piling up while refreshes are blocked (before the updated backpressure kicks in), causing a steep increase in heap usage until the next refresh. In the happy case, however, sync afterRefresh means new segments become visible only after the previous segment upload completes, rather than at the default 1s refresh interval.

Approach 2 - async afterRefresh, async retry
In this approach, the segments upload job executes asynchronously and does not block the overall refresh flow, making both the afterRefresh flow and the retry async. Segments can be created at a faster rate than in the first approach, and the upload job's responsibility is to upload the most recent state. It is possible that the amortised upload bytes reduce considerably with this approach, since multiple refreshes can be covered by a single upload. The segment upload backpressure will need to be updated to account for a generally higher refresh lag due to more frequent segment creation. Since translog cleanup is hooked to successful uploads, it will not need to change. The Generic thread gets freed up earlier instead of being blocked on refresh. Overall this approach looks promising. We do, however, need to ensure that all ITs/UTs stay intact and that performance testing shows no degradation in latency, throughput, or perceived replication lag.
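The "upload the most recent state" behaviour can be sketched as a coalescing uploader: refreshes that arrive while an upload is running are folded into a single follow-up upload rather than queued individually. This is a minimal illustration of the idea, not the OpenSearch implementation; all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: async uploads coalesce intermediate refreshes, so the upload job
// always targets the latest local state rather than every refresh point.
public class CoalescingUploader {
    private final AtomicBoolean uploadRunning = new AtomicBoolean(false);
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    int uploadsStarted = 0;

    // Called after every local refresh.
    void onRefresh() {
        dirty.set(true);
        maybeStartUpload();
    }

    private void maybeStartUpload() {
        if (dirty.get() && uploadRunning.compareAndSet(false, true)) {
            dirty.set(false);
            uploadsStarted++; // in reality: submit an async upload of the latest segment infos
        }
    }

    // Called when an upload finishes; re-runs if more refreshes arrived meanwhile.
    void onUploadFinished() {
        uploadRunning.set(false);
        maybeStartUpload();
    }
}
```

Four refreshes arriving while the first upload is in flight would thus result in only two uploads in total, which is how the amortised upload bytes can drop.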

Distinction between user initiated refresh and interval refresh
As of today, when control reaches any RefreshListener, it is not possible to know whether the refresh was triggered externally or internally. If we could differentiate between the two, we could ensure a synchronous experience for user-initiated refreshes, so that the response is sent only after the upload has completed successfully. We will need to evaluate this further.
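One possible shape for such a distinction, purely as a sketch: the user-facing refresh path could set a marker before triggering the refresh, and the listener could block for the upload only when the marker is present. None of these names exist in OpenSearch today; this is only an illustration of the idea to be evaluated.

```java
// Hypothetical: a thread-local marker set by the external (REST) refresh path
// and consulted by the refresh listener to decide sync vs async upload.
public class RefreshOrigin {
    private static final ThreadLocal<Boolean> EXTERNAL =
        ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Called by the user-initiated refresh path before triggering the refresh.
    static void markExternal() { EXTERNAL.set(Boolean.TRUE); }

    // Called in a finally block after the refresh completes.
    static void clear() { EXTERNAL.remove(); }

    // Listener side: wait for the upload only for user-initiated refreshes.
    static boolean shouldWaitForUpload() { return EXTERNAL.get(); }
}
```

A thread-local only works if the listener runs on the triggering thread; an internally scheduled refresh would leave the marker unset and take the async path.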

In conclusion, we want to pivot to approach 2 with the appropriate changes and testing. Looking for thoughts and suggestions on the same.

@ashking94 ashking94 added enhancement Enhancement or improvement to existing feature or request Storage:Durability Issues and PRs related to the durability framework v2.1.1 Issues and PRs related to version 2.1.1 Storage Issues and PRs relating to data and metadata storage labels Aug 1, 2023
@ashking94
Member Author

Tagging @gbbafna @sachinpkale @Bukhtawar for comments/suggestions.

@ashking94 ashking94 removed untriaged v2.1.1 Issues and PRs related to version 2.1.1 labels Aug 1, 2023
@ankitkala
Member

I agree with approach 2 of keeping the segment upload and retry logic completely async. Apart from avoiding indexing buffer pileup, this ensures that the primary can still refresh even if the upload from a previous refresh is taking longer (e.g. after merges), so searches on the primary won't be stale.

Can you also add your thoughts on how far the primary upload can fall behind? If the primary hasn't uploaded data for the last X refreshes, we might want the backpressure mechanism to reject new requests.

@shwetathareja
Member

Also, would segment uploads be strictly in order, i.e. upload segment n+1 only after segment n has uploaded successfully, or can they be uploaded in parallel?

@ashking94
Member Author

I agree with approach 2 of keeping the segment upload and retry logic completely async. Apart from avoiding indexing buffer pileup, this ensures that the primary can still refresh even if the upload from a previous refresh is taking longer (e.g. after merges), so searches on the primary won't be stale.

True

Can you also add your thoughts on how far the primary upload can fall behind? If the primary hasn't uploaded data for the last X refreshes, we might want the backpressure mechanism to reject new requests.

This we can tune with the backpressure settings. We currently have 3 gating parameters - 1. time lag, 2. bytes lag & 3. consecutive failures. These can be used to have rejection kick in when any of the parameters is breached.
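The three gating parameters combine as a simple any-breach check, sketched below. The threshold values and names here are illustrative placeholders, not the actual remote store backpressure settings or their defaults.

```java
// Sketch of the gating described above: reject new indexing requests when any
// of time lag, bytes lag, or consecutive upload failures breaches its limit.
public class UploadBackpressure {
    static final long MAX_TIME_LAG_MS = 5 * 60 * 1000;     // assumed limit
    static final long MAX_BYTES_LAG = 512L * 1024 * 1024;  // assumed limit
    static final int MAX_CONSECUTIVE_FAILURES = 5;         // assumed limit

    static boolean shouldReject(long timeLagMs, long bytesLag, int consecutiveFailures) {
        return timeLagMs > MAX_TIME_LAG_MS
            || bytesLag > MAX_BYTES_LAG
            || consecutiveFailures > MAX_CONSECUTIVE_FAILURES;
    }
}
```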

@ashking94
Member Author

Also, would segment uploads be strictly in order, i.e. upload segment n+1 only after segment n has uploaded successfully, or can they be uploaded in parallel?

Segments created across multiple refreshes can get uploaded together, depending on how frequently the upload happens. Segment creation continues at the refresh interval; supposing x refreshes have occurred locally between 2 uploads, the upload would then cover all files that are part of the latest segment infos but not yet present in the remote store.
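The upload set described above is effectively a set difference between the latest local segment infos and the remote store contents. A minimal sketch, with a hypothetical helper name and file names:

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch: the upload covers every file referenced by the latest local segment
// infos that is not already present in the remote store, regardless of how
// many local refreshes produced those files.
public class SegmentDiff {
    static Set<String> filesToUpload(Set<String> latestSegmentInfosFiles,
                                     Set<String> remoteStoreFiles) {
        Set<String> toUpload = new TreeSet<>(latestSegmentInfosFiles);
        toUpload.removeAll(remoteStoreFiles);
        return toUpload;
    }
}
```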

@harishbhakuni
Contributor

@ashking94 for snapshot interop with remote store, we rely on the afterRefresh method to upload all the segments and metadata after a commit. If we make that async, shallow snapshots would not work. We need to make the afterRefresh call sync at least for the first refresh after a commit.
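One way this constraint could be accommodated alongside approach 2, sketched with hypothetical names: track a pending commit and force exactly the first refresh after it through the synchronous upload path, so shallow snapshots always see the committed segments in the remote store.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical: async uploads by default, but the first refresh after a
// commit takes the sync path so snapshot interop keeps working.
public class CommitAwareUploadPolicy {
    private final AtomicBoolean commitPending = new AtomicBoolean(false);

    // Called when a Lucene commit happens.
    void onCommit() { commitPending.set(true); }

    // Returns true exactly once per commit: that refresh uploads synchronously.
    boolean uploadSynchronously() { return commitPending.compareAndSet(true, false); }
}
```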
