fix: Search/Query may failed during updating delegator cache. #37116

weiliu1031 · 2024-10-24T09:50:57Z

Because initializing a query node client is too heavy, we previously moved updateShardClient out of the leader mutex, which introduced many more concurrency corner cases.

This PR delays the query node client's initialization until getClient is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues.
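
The lazy-initialization idea in this description can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the code in `internal/proxy/shard_client.go`: the types `shardClientMgr`, `shardClient`, `queryNodeClient`, and `clientCreator`, and the signatures of `updateShardClient` and `getClient`, are hypothetical and chosen only for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// queryNodeClient stands in for a real, expensive-to-create query node client.
type queryNodeClient struct {
	addr string
}

// clientCreator is a hypothetical factory that dials a query node.
type clientCreator func(addr string) (*queryNodeClient, error)

// shardClient records a node address and builds its client on first use.
type shardClient struct {
	mu      sync.Mutex
	addr    string
	client  *queryNodeClient
	creator clientCreator
}

// getClient creates the underlying client lazily, at most once on success.
func (s *shardClient) getClient() (*queryNodeClient, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.client == nil {
		c, err := s.creator(s.addr)
		if err != nil {
			return nil, err
		}
		s.client = c
	}
	return s.client, nil
}

// shardClientMgr maps node IDs to lazily initialized shard clients.
type shardClientMgr struct {
	mu      sync.RWMutex
	clients map[int64]*shardClient
	creator clientCreator
}

func newShardClientMgr(creator clientCreator) *shardClientMgr {
	return &shardClientMgr{clients: make(map[int64]*shardClient), creator: creator}
}

// updateShardClient syncs the registry with the current leaders. It is cheap
// because no connection is opened here, so it can safely run under a mutex.
func (m *shardClientMgr) updateShardClient(leaders map[int64]string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for id, addr := range leaders {
		if _, ok := m.clients[id]; !ok {
			m.clients[id] = &shardClient{addr: addr, creator: m.creator}
		}
	}
	for id := range m.clients {
		if _, ok := leaders[id]; !ok {
			delete(m.clients, id)
		}
	}
}

// getClient looks up the entry under a read lock, then initializes it lazily.
func (m *shardClientMgr) getClient(nodeID int64) (*queryNodeClient, error) {
	m.mu.RLock()
	sc, ok := m.clients[nodeID]
	m.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no shard client for node %d", nodeID)
	}
	return sc.getClient()
}

func main() {
	mgr := newShardClientMgr(func(addr string) (*queryNodeClient, error) {
		fmt.Println("creating client for", addr)
		return &queryNodeClient{addr: addr}, nil
	})
	mgr.updateShardClient(map[int64]string{1: "node-1:21123"})
	if c, err := mgr.getClient(1); err == nil {
		fmt.Println("got client for", c.addr)
	}
}
```

In this sketch the expensive step runs only inside `getClient`, so the update path holds its lock briefly and a failed creation is simply retried on the next call.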

mergify · 2024-10-24T10:17:46Z

@weiliu1031 cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

mergify · 2024-10-24T10:33:22Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-10-24T11:04:39Z

/run-cpu-e2e

codecov · 2024-10-24T12:25:30Z

Codecov Report

Attention: Patch coverage is 77.77778% with 10 lines in your changes missing coverage. Please review.

Project coverage is 81.02%. Comparing base (be71b98) to head (0add1cf).
Report is 15 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/proxy/shard_client.go | 67.74% | 8 Missing and 2 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #37116      +/-   ##
==========================================
- Coverage   83.21%   81.02%   -2.20%     
==========================================
  Files        1015     1305     +290     
  Lines      157418   182849   +25431     
==========================================
+ Hits       131001   148147   +17146     
- Misses      21218    29515    +8297     
+ Partials     5199     5187      -12

| Components | Coverage Δ | |
|---|---|---|
| Client | `∅ <ø> (∅)` | |
| Core | `67.17% <ø> (∅)` | |
| Go | `83.28% <77.77%> (+0.03%)` | ⬆️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| internal/proxy/lb_policy.go | `97.90% <100.00%> (+0.17%)` | ⬆️ |
| internal/proxy/meta_cache.go | `91.39% <100.00%> (ø)` | |
| internal/proxy/shard_client.go | `82.60% <67.74%> (+4.12%)` | ⬆️ |

... and 323 files with indirect coverage changes

weiliu1031 · 2024-10-28T02:28:57Z

/hold

weiliu1031 · 2024-10-28T03:46:48Z

/unhold

mergify · 2024-10-28T06:47:15Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-10-28T06:55:45Z

/run-cpu-e2e

mergify · 2024-10-28T07:09:30Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify · 2024-10-28T08:30:42Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-10-28T09:35:43Z

/run-cpu-e2e

mergify · 2024-10-28T10:25:06Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-10-28T11:21:13Z

/run-cpu-e2e

mergify · 2024-10-28T11:23:32Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

issue: #37115
pr: #37116

Because initializing a query node client is too heavy, we previously moved updateShardClient out of the leader mutex, which introduced many more concurrency corner cases.

This PR delays the query node client's initialization until `getClient` is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues.

---------

Signed-off-by: Wei Liu <[email protected]>

weiliu1031 · 2024-10-29T04:22:17Z

/run-cpu-e2e

mergify · 2024-10-29T04:34:36Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify · 2024-11-01T07:34:29Z

@weiliu1031 go-sdk check failed, comment rerun go-sdk can trigger the job again.

weiliu1031 · 2024-11-01T08:26:48Z

/run-cpu-e2e

weiliu1031 · 2024-11-01T08:26:55Z

rerun go-sdk

weiliu1031 · 2024-11-01T16:09:24Z

rerun ut

weiliu1031 · 2024-11-03T15:30:57Z

rerun ut

weiliu1031 · 2024-11-03T15:31:07Z

/run-cpu-e2e

mergify · 2024-11-03T16:14:09Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-11-04T00:50:05Z

/run-cpu-e2e

mergify · 2024-11-04T01:34:15Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-11-04T02:20:36Z

/run-cpu-e2e

mergify · 2024-11-04T03:32:53Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-11-04T03:46:12Z

/run-cpu-e2e

mergify · 2024-11-04T04:03:20Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-11-04T06:01:26Z

/run-cpu-e2e

mergify · 2024-11-04T06:43:40Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-11-04T09:46:08Z

/run-cpu-e2e

czs007 · 2024-11-05T02:51:25Z

/approve
/lgtm

sre-ci-robot · 2024-11-05T02:51:34Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czs007, weiliu1031

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~internal/proxy/OWNERS~~ [czs007]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

PR milvus-io#37116 introduced a retry on getting the shard leader, so search no longer fails while a query node is down. Signed-off-by: Wei Liu <[email protected]>

…37480) issue: #37289 PR #37116 introduced a retry on getting the shard leader, so search no longer fails while a query node is down. Signed-off-by: Wei Liu <[email protected]>

…ilvus-io#37480) issue: milvus-io#37289 PR milvus-io#37116 introduced a retry on getting the shard leader, so search no longer fails while a query node is down. Signed-off-by: Wei Liu <[email protected]>

…37480) (#37499) issue: #37289 pr: #37480 PR #37116 introduced a retry on getting the shard leader, so search no longer fails while a query node is down. Signed-off-by: Wei Liu <[email protected]>

sre-ci-robot added the size/S label Oct 24, 2024

sre-ci-robot requested review from godchen0212 and xiaocai2333 October 24, 2024 09:51

mergify bot added the dco-passed and kind/bug labels Oct 24, 2024

xiaofan-luan reviewed Oct 25, 2024

internal/proxy/meta_cache.go (resolved)

sre-ci-robot added the do-not-merge/hold label Oct 28, 2024

weiliu1031 force-pushed the fix_lb_get_client branch from d81c067 to d5ba8ae Compare October 28, 2024 03:45

sre-ci-robot added the size/M label and removed the size/S label Oct 28, 2024

sre-ci-robot removed the do-not-merge/hold label Oct 28, 2024

weiliu1031 mentioned this pull request Oct 28, 2024

fix: Search/Query may failed during updating delegator cache #37174

Merged

weiliu1031 force-pushed the fix_lb_get_client branch from 3b777da to e18e96c Compare October 28, 2024 08:11

mergify bot added the ci-passed label Nov 4, 2024

sre-ci-robot assigned czs007 Nov 5, 2024

sre-ci-robot added the lgtm label Nov 5, 2024

sre-ci-robot added the approved label Nov 5, 2024

sre-ci-robot merged commit b83b376 into milvus-io:master Nov 5, 2024
19 of 20 checks passed

weiliu1031 mentioned this pull request Nov 6, 2024

fix: [skip e2e]unstable integration test TestNodeDownOnSingleReplica #37480

Merged

weiliu1031 mentioned this pull request Nov 7, 2024

fix: [skip e2e]unstable integration test TestNodeDownOnSingleReplica(#37480) #37499

Merged
