fix: Search/Query may fail while updating delegator cache. #37116

Merged: 5 commits, Nov 5, 2024

weiliu1031
Contributor

@weiliu1031 weiliu1031 commented Oct 24, 2024

issue: #37115

Because initializing the query node client is too heavy, we moved updateShardClient out of the leader mutex, which introduced many more concurrent corner cases.

This PR delays the query node client's initialization until getClient is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues.
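
The idea described above can be illustrated with a minimal Go sketch. It is not the code from this PR: the names `lazyShardClient`, `shardClientMgr`, `clientCreator`, and `queryNodeClient` are hypothetical, and the real change lives in internal/proxy/shard_client.go. The sketch assumes that registering a node only records its address under the leader mutex, and that the expensive client is built on the first `getClient` call.

```go
package main

import (
	"errors"
	"sync"
)

// queryNodeClient stands in for the real query node client, which is
// expensive to build. Hypothetical type for illustration only.
type queryNodeClient struct {
	addr string
}

// clientCreator is a hypothetical factory; a real one would dial the node.
type clientCreator func(addr string) (*queryNodeClient, error)

// lazyShardClient remembers the node address and defers building the
// heavy client until getClient is first called.
type lazyShardClient struct {
	mu      sync.Mutex
	addr    string
	creator clientCreator
	client  *queryNodeClient
}

func (c *lazyShardClient) getClient() (*queryNodeClient, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.client == nil {
		cli, err := c.creator(c.addr)
		if err != nil {
			return nil, err
		}
		c.client = cli
	}
	return c.client, nil
}

// shardClientMgr keeps one lazy client per node ID.
type shardClientMgr struct {
	leaderMut sync.Mutex
	clients   map[int64]*lazyShardClient
	creator   clientCreator
}

func newShardClientMgr(creator clientCreator) *shardClientMgr {
	return &shardClientMgr{
		clients: make(map[int64]*lazyShardClient),
		creator: creator,
	}
}

// updateShardLeaders runs entirely under the leader mutex. It is cheap
// because it only records addresses and never creates a client.
func (m *shardClientMgr) updateShardLeaders(leaders map[int64]string) {
	m.leaderMut.Lock()
	defer m.leaderMut.Unlock()
	for id, addr := range leaders {
		if _, ok := m.clients[id]; !ok {
			m.clients[id] = &lazyShardClient{addr: addr, creator: m.creator}
		}
	}
	// Drop nodes that are no longer leaders.
	for id := range m.clients {
		if _, ok := leaders[id]; !ok {
			delete(m.clients, id)
		}
	}
}

// getClient looks up the entry under the leader mutex, then builds the
// client outside of it so slow initialization does not block updates.
func (m *shardClientMgr) getClient(nodeID int64) (*queryNodeClient, error) {
	m.leaderMut.Lock()
	sc, ok := m.clients[nodeID]
	m.leaderMut.Unlock()
	if !ok {
		return nil, errors.New("query node client not found")
	}
	return sc.getClient()
}
```

With this shape, a call such as `mgr.updateShardLeaders(map[int64]string{1: "node-1:21123"})` does not invoke the creator at all; the first `mgr.getClient(1)` does, and later calls reuse the same client.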

@sre-ci-robot sre-ci-robot added the size/S Denotes a PR that changes 10-29 lines. label Oct 24, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/bug Issues or changes related a bug labels Oct 24, 2024
Contributor

mergify bot commented Oct 24, 2024

@weiliu1031 cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Oct 24, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

codecov bot commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 77.77778% with 10 lines in your changes missing coverage. Please review.

Project coverage is 81.02%. Comparing base (be71b98) to head (0add1cf).
Report is 15 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/proxy/shard_client.go | 67.74% | 8 Missing and 2 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #37116      +/-   ##
==========================================
- Coverage   83.21%   81.02%   -2.20%     
==========================================
  Files        1015     1305     +290     
  Lines      157418   182849   +25431     
==========================================
+ Hits       131001   148147   +17146     
- Misses      21218    29515    +8297     
+ Partials     5199     5187      -12     
| Components | Coverage Δ |
| --- | --- |
| Client | ∅ <ø> (∅) |
| Core | 67.17% <ø> (∅) |
| Go | 83.28% <77.77%> (+0.03%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| internal/proxy/lb_policy.go | 97.90% <100.00%> (+0.17%) ⬆️ |
| internal/proxy/meta_cache.go | 91.39% <100.00%> (ø) |
| internal/proxy/shard_client.go | 82.60% <67.74%> (+4.12%) ⬆️ |

... and 323 files with indirect coverage changes

@weiliu1031
Contributor Author

/hold

@sre-ci-robot sre-ci-robot added size/M Denotes a PR that changes 30-99 lines. and removed size/S Denotes a PR that changes 10-29 lines. labels Oct 28, 2024
@weiliu1031
Contributor Author

/unhold

Contributor

mergify bot commented Oct 28, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Oct 28, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Oct 28, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Oct 28, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Oct 28, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

sre-ci-robot pushed a commit that referenced this pull request Oct 28, 2024
issue: #37115
pr: #37116
Because initializing the query node client is too heavy, we moved
updateShardClient out of the leader mutex, which introduced many more
concurrent corner cases.

This PR delays the query node client's initialization until `getClient`
is called, then uses the leader mutex to protect the shard client update
and avoid concurrency issues.

---------

Signed-off-by: Wei Liu <[email protected]>
@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Oct 29, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 1, 2024

@weiliu1031 go-sdk check failed, comment rerun go-sdk can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

@weiliu1031
Contributor Author

rerun go-sdk

@weiliu1031
Contributor Author

rerun ut

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Nov 3, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Nov 4, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Nov 4, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Nov 4, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Nov 4, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

@mergify mergify bot added the ci-passed label Nov 4, 2024
@czs007
Collaborator

czs007 commented Nov 5, 2024

/approve
/lgtm

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czs007, weiliu1031

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit b83b376 into milvus-io:master Nov 5, 2024
19 of 20 checks passed
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Nov 6, 2024
Because PR milvus-io#37116 introduced a retry on getting the shard leader,
search no longer fails while a query node is down.

Signed-off-by: Wei Liu <[email protected]>
xiaofan-luan pushed a commit that referenced this pull request Nov 7, 2024
…37480)

issue: #37289
Because PR #37116 introduced a retry on getting the shard leader, search
no longer fails while a query node is down.

Signed-off-by: Wei Liu <[email protected]>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Nov 7, 2024
…ilvus-io#37480)

issue: milvus-io#37289
Because PR milvus-io#37116 introduced a retry on getting the shard leader,
search no longer fails while a query node is down.

Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this pull request Nov 7, 2024
…37480) (#37499)

issue: #37289
pr: #37480

Because PR #37116 introduced a retry on getting the shard leader, search
no longer fails while a query node is down.

Signed-off-by: Wei Liu <[email protected]>
Labels
approved ci-passed dco-passed DCO check passed. kind/bug Issues or changes related a bug lgtm size/M Denotes a PR that changes 30-99 lines.
4 participants