fix: Search/Query may fail while updating the delegator cache. #37116
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #37116 +/- ##
==========================================
- Coverage 83.21% 81.02% -2.20%
==========================================
Files 1015 1305 +290
Lines 157418 182849 +25431
==========================================
+ Hits 131001 148147 +17146
- Misses 21218 29515 +8297
+ Partials 5199 5187 -12
issue: #37115 pr: #37116 Because initializing the query node client is expensive, updateShardClient was moved out from under the leader mutex, which introduced many more concurrency corner cases. This PR delays the query node client's initialization until `getClient` is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues. --------- Signed-off-by: Wei Liu <[email protected]>
/approve |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: czs007, weiliu1031
Because PR milvus-io#37116 introduced a retry when getting the shard leader, search no longer fails while a query node is down. Signed-off-by: Wei Liu <[email protected]>
…37480) issue: #37289 Because PR #37116 introduced a retry when getting the shard leader, search no longer fails while a query node is down. Signed-off-by: Wei Liu <[email protected]>
issue: #37115
Because initializing the query node client is expensive, updateShardClient was moved out from under the leader mutex, which introduced many more concurrency corner cases.
This PR delays the query node client's initialization until `getClient` is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues.
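The approach described above (defer the expensive client construction to `getClient`, and keep the cheap bookkeeping under a single mutex) can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `shardClientMgr`, `queryNodeClient`, `updateShardLeaders`, and the `creates` counter are hypothetical, and the real client setup is replaced by a plain struct allocation.

```go
package main

import (
	"fmt"
	"sync"
)

// queryNodeClient stands in for the real, expensive-to-build client.
type queryNodeClient struct {
	nodeID int64
}

// shardClientMgr is a hypothetical manager; mu plays the role of the
// leader mutex described in the PR.
type shardClientMgr struct {
	mu      sync.Mutex
	clients map[int64]*queryNodeClient // nodeID -> lazily created client
	creates int                        // how many clients were built
}

func newShardClientMgr() *shardClientMgr {
	return &shardClientMgr{clients: make(map[int64]*queryNodeClient)}
}

// getClient builds the client on first use, so the cost is paid only for
// nodes that are actually queried.
func (m *shardClientMgr) getClient(nodeID int64) *queryNodeClient {
	m.mu.Lock()
	defer m.mu.Unlock()
	if c, ok := m.clients[nodeID]; ok {
		return c
	}
	c := &queryNodeClient{nodeID: nodeID} // expensive init would happen here
	m.clients[nodeID] = c
	m.creates++
	return c
}

// updateShardLeaders drops clients for nodes that are no longer leaders.
// It builds nothing, so holding the mutex here stays cheap.
func (m *shardClientMgr) updateShardLeaders(nodeIDs []int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	alive := make(map[int64]struct{}, len(nodeIDs))
	for _, id := range nodeIDs {
		alive[id] = struct{}{}
	}
	for id := range m.clients {
		if _, ok := alive[id]; !ok {
			delete(m.clients, id)
		}
	}
}

func main() {
	mgr := newShardClientMgr()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mgr.getClient(1) // concurrent callers share one client
		}()
	}
	wg.Wait()
	fmt.Println("clients created:", mgr.creates)
}
```

Because both the cache update and the lazy creation take the same mutex, a reader never observes a half-updated client map, and the update path no longer pays for client construction.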