-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't pick tried endpoint & count the latest in ewma balancer #6639
Don't pick tried endpoint & count the latest in ewma balancer #6639
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Welcome @spacewander! |
Hi @spacewander. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
/assign @ElvinEfendi |
@@ -18,6 +18,12 @@ function _M.get_first_value(var) | |||
return t[1] | |||
end | |||
|
|||
function _M.get_last_value(var) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you write a unit test for this function?
this PR addresses only one part of that issue, the algorithm would still potentially try a failed endpoint |
@ElvinEfendi |
local lowest_score | ||
for i = 1, k do | ||
local key = get_upstream_name(peers[i]) | ||
if chosen_endpoints[key] then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assume peers
has 10 elements, out of which 2 are already tried. With the approach you implemented there's still a chance that we pick already tried upstream. But alternatively we can introduce a new list i.e filtered_peers
, which is a copy of peers
without the 2 already tried peers. That way we guarantee that we don't pick an already tried peer. I'm curious why you did not take this approach instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ElvinEfendi
Aha, I didn't notice there is a PICK_SET_SIZE limitation.
Just curiously, why we set a PICK_SET_SIZE limitation and its value is 2? The original implementation (https://github.com/twitter/finagle/blob/1bc837c4feafc0096e43c0e98516a8e1c50c4421/finagle-core/src/main/scala/com/twitter/finagle/loadbalancer/PeakEwma.scala) seems don't have the same limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That thing seems to be specifically about calculating peak ewma. I don't see shuffling or picking nodes in that code.
I think value 2 is based on "Michael Mitzenmacher. 2001. The Power of Two Choices in Randomized Load Balancing. IEEE Trans. Parallel Distrib. Syst. 12, 10 (October 2001), 1094-1104."
@@ -170,20 +190,30 @@ function _M.balance(self) | |||
if #peers > 1 then | |||
local k = (#peers < PICK_SET_SIZE) and #peers or PICK_SET_SIZE | |||
local peer_copy = util.deepcopy(peers) | |||
endpoint, ewma_score = pick_and_score(peer_copy, k) | |||
|
|||
local chosen_endpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about tried_endpoints
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@spacewander please squash the commits and reword the commit message |
cab3c85
to
e118ebc
Compare
@aledbf |
@@ -170,20 +177,50 @@ function _M.balance(self) | |||
if #peers > 1 then | |||
local k = (#peers < PICK_SET_SIZE) and #peers or PICK_SET_SIZE | |||
local peer_copy = util.deepcopy(peers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unnecessary now, no? Because below you effectively do the same by copying peers
to filtered_peers
?
|
||
local peer = two_endpoints_instance:balance() | ||
assert.equal("10.10.10.3:8080", peer) | ||
end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the logic in these two tests, where are you marking the endpoints as tried? I'd have expected you modify ngx.ctx.balancer_ewma_tried_endpoints
to control these tests.
--
Also an additional test is needed to ensure ngx.ctx.balancer_ewma_tried_endpoints
is set in balance function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ngx.ctx
is shared between tests. I will change the test to use ngx.ctx
directly so the test will be more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ElvinEfendi
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yeah it is better to keep the tests independent of each other.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ElvinEfendi, spacewander The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
Types of changes
Which issue/s this PR fixes
fixes #6632
How Has This Been Tested?
Checklist: