-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance: Enable node assign policy on resource group #36968
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: weiliu1031 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@weiliu1031 Please associate the related issue to the body of your Pull Request. (eg. “issue: #”) |
@weiliu1031 E2e jenkins job failed, comment |
78cf4dc
to
cc5ad70
Compare
@weiliu1031 E2e jenkins job failed, comment |
/run-cpu-e2e |
cc5ad70
to
dd7df02
Compare
@weiliu1031 go-sdk check failed, comment |
rerun go-sdk |
dd7df02
to
f9e1b06
Compare
@weiliu1031 go-sdk check failed, comment |
@weiliu1031 E2e jenkins job failed, comment |
f9e1b06
to
bbf9b69
Compare
@weiliu1031 cpp-unit-test check failed, comment |
@weiliu1031 E2e jenkins job failed, comment |
/run-cpu-e2e |
/run-cpu-e2e |
rerun go-sdk |
@weiliu1031 go-sdk check failed, comment |
@weiliu1031 E2e jenkins job failed, comment |
155d22f
to
8c6e534
Compare
@weiliu1031 E2e jenkins job failed, comment |
@weiliu1031 go-sdk check failed, comment |
8c6e534
to
a5321f3
Compare
@weiliu1031 cpp-unit-test check failed, comment |
@weiliu1031 E2e jenkins job failed, comment |
rerun cpp-unit-test |
/run-cpu-e2e |
@weiliu1031 cpp-unit-test check failed, comment |
@weiliu1031 E2e jenkins job failed, comment |
a5321f3
to
1bb198f
Compare
@weiliu1031 go-sdk check failed, comment |
@weiliu1031 cpp-unit-test check failed, comment |
@weiliu1031 E2e jenkins job failed, comment |
/run-cpu-e2e |
rerun go-sdk |
rerun cpp-unit-test |
ret := make(map[string]string) | ||
switch role { | ||
case "querynode": | ||
supportedLabelPrefix := paramtable.Get().QueryNodeCfg.MilvusServerLabelPrefix.GetValue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be better that hard code MILVUS_SERVER_LABEL_QUERYNODE_ directly?
We can conveniently add label like MILVUS_SERVER_LABEL_STREAMINGNODE_ for streamingnode, MILVUS_SERVER_LABEL_PROXY for proxy...
It is not easy to achieve compatibility and maintenance using environment variables that can be configured by another configuration.
|
||
if len(dirtyNodes) > 0 { | ||
return len(dirtyNodes) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be len(dirtyNodes) + min(rg.NodeNum() - int(rg.cfg.Limits.NodeNum), 0)?
name: rg.name, | ||
nodes: rg.nodes.Clone(), | ||
cfg: rg.GetConfigCloned(), | ||
nodeMgr: rg.nodeMgr, | ||
} | ||
} | ||
|
||
// MeetRequirement return whether resource group meet requirement. | ||
// Return error with reason if not meet requirement. | ||
func (rg *ResourceGroup) MeetRequirement() error { | ||
// if len(node) is less than requests, new node need to be assigned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use RedundantNumOfNodes() > 0 || MissingNumOfNodes() > 0 to keep the consistency between api.
@@ -576,7 +577,7 @@ func (rm *ResourceManager) selectMissingRecoverSourceRG(rg *ResourceGroup) *Reso | |||
// First, Transfer node from most redundant resource group first. `len(nodes) > limits` | |||
if redundantRG := rm.findMaxRGWithGivenFilter( | |||
func(sourceRG *ResourceGroup) bool { | |||
return rg.GetName() != sourceRG.GetName() && sourceRG.RedundantNumOfNodes() > 0 | |||
return rg.GetName() != sourceRG.GetName() && sourceRG.RedundantNumOfNodes() > 0 && rg.AcceptNode(sourceRG.PreferRemovedNode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here could be a failure logic.
In current implementation, If the rg1's PerferRemoveNode()
return a node that rg2 don't accept, the rg2 can never get node from the rg1.
So the PreferRemoveNode should be a logic that passing rg1
as a argument to determine the best and stable node that move from rg2
into rg1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that the selectMissingRecoverSourceRG
return the sourceRG and the target node id together to avoid unstable node selection
@@ -633,7 +634,7 @@ func (rm *ResourceManager) selectRedundantRecoverTargetRG(rg *ResourceGroup) *Re | |||
// First, Transfer node to most missing resource group first. | |||
if missingRG := rm.findMaxRGWithGivenFilter( | |||
func(targetRG *ResourceGroup) bool { | |||
return rg.GetName() != targetRG.GetName() && targetRG.MissingNumOfNodes() > 0 | |||
return rg.GetName() != targetRG.GetName() && targetRG.MissingNumOfNodes() > 0 && targetRG.AcceptNode(rg.PreferRemovedNode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with selectMissingRecoverSourceRG
.
1bb198f
to
7cf414c
Compare
Signed-off-by: Wei Liu <[email protected]>
7cf414c
to
cf0e01b
Compare
@weiliu1031 E2e jenkins job failed, comment |
/run-cpu-e2e |
issue: #36977
with node_label_filter on resource group, user can add label on querynode with env
MILVUS_COMPONENT_LABEL
, then resource group will prefer to accept node which match it's node_label_filter.then querynode's can't be group by labels, and put querynodes with same label to same resource groups.