Try to avoid etcd.Get as part of Delete operation #89828
/retest
@wojtek-t: The following test failed, say
None of these errors could have been caused by problems with delete.
@liggitt - I have analyzed in depth everything that happened with GuaranteedUpdate and added tests upfront to ensure that the issues it caused are explicitly covered (where that made sense). I think it's ready for a review pass; PTAL once you're out of your 1.20 work.
@@ -227,8 +269,13 @@ func (s *store) conditionalDelete(ctx context.Context, key string, out runtime.O
return err
Add a unit test to make sure the right thing happens if the suggestion is stale and the key no longer exists (i.e. it was already deleted). I think we would get a NotFound here and return, but please verify that the right thing happens.
Added "TestDeleteWithSuggestionOfDeletedObject" test.
That said, it doesn't exercise this path. What happens is that the transaction fails, so we enter the
if !txnResp.Succeeded {
branch, and this is handled in getState (getResp contains an empty KV field, which is handled by returning a "not-found" error).
Does txnResp.Responses[0] exist for a NotFound response?
Yes, it exists: it is a GetResponse containing an empty set of KV pairs.
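To make that answer concrete, here is a sketch of how a caller might interpret a failed transaction, assuming simplified types loosely shaped after etcd's `TxnResponse`, `RangeResponse` and `KeyValue`. The lowercase types and `stateFromFailedTxn` are hypothetical and are not the real getState. The sketch relies on the invariant stated in the reply: `Responses[0]` is always present and is the Else-branch Get, and an empty KV list means the object does not exist.

```go
package main

import "fmt"

// keyValue, rangeResponse and txnResponse are hypothetical, simplified
// stand-ins loosely shaped after etcd's KeyValue, RangeResponse and TxnResponse.
type keyValue struct {
	Key         string
	Value       []byte
	ModRevision int64
}

type rangeResponse struct {
	Kvs []keyValue
}

type txnResponse struct {
	Succeeded bool
	// One entry per operation of the branch that ran; when the compare
	// fails, Responses[0] is the Else branch's Get.
	Responses []rangeResponse
}

// objState is a hypothetical summary of what the caller learns about a key.
type objState struct {
	exists bool
	rev    int64
	data   []byte
}

// stateFromFailedTxn mirrors the getState idea: an empty Kvs slice in the Get
// response means the key does not exist (the caller maps this to NotFound).
func stateFromFailedTxn(resp txnResponse) objState {
	get := resp.Responses[0]
	if len(get.Kvs) == 0 {
		return objState{exists: false}
	}
	kv := get.Kvs[0]
	return objState{exists: true, rev: kv.ModRevision, data: kv.Value}
}

func main() {
	deleted := txnResponse{Succeeded: false, Responses: []rangeResponse{{}}}
	fmt.Println(stateFromFailedTxn(deleted).exists) // false

	stale := txnResponse{Succeeded: false, Responses: []rangeResponse{{
		Kvs: []keyValue{{Key: "pods/a", Value: []byte("v3"), ModRevision: 12}},
	}}}
	st := stateFromFailedTxn(stale)
	fmt.Println(st.exists, st.rev) // true 12
}
```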
@liggitt - comments addressed, PTAL.
/hold cancel
OK, I'm satisfied this is functionally correct. I still haven't seen the performance numbers demonstrating that the benefit is worth the additional complexity. Did I miss where those were provided?
Sorry, I forgot to add them in the meantime. I have now added them to the PR description. @liggitt - PTAL
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: liggitt, wojtek-t
The same optimization was done earlier for GuaranteedUpdate in #35415, but there were later multiple (direct or indirect) bug fixes to that logic:
#40664 : Allow values to be wrapped prior to serialization in etcd
#47703 : Do not persist SelfLink into etcd storage
#48394 : GuaranteedUpdate must write if stored data is not canonical
#43152 : etcd3 store: retry with live object on conflict if there was a suggestion
#54780 : partial fix crd patch failing
#58375 : Recheck if transformed data is stale when doing live lookup during update
#77619 : In GuaranteedUpdate, retry on any error if we are working with cached data
#78713 : Set expected in-memory version when decoding unstructured objects from etcd
#82303 : In GuaranteedUpdate, retry on a precondition check failure if we are working with cached data
This PR tries to address that upfront with extensive testing based on all the issues listed above.
Depending on the run and scale, we've seen a 10% to 60% reduction in 99th-percentile latency for Delete API calls.