checker(dm): check the number of connections before starting #5185
Will add unit tests later.
Codecov Report
@@               Coverage Diff                @@
##             master      #5185        +/-   ##
================================================
+ Coverage   59.4214%   59.4633%   +0.0419%
================================================
  Files           763        767         +4
  Lines         87418      87760       +342
================================================
+ Hits          51945      52185       +240
- Misses        30894      30969        +75
- Partials       4579       4606        +27
rest LGTM
dm/checker/checker.go
Outdated
if len(c.stCfgs) > 0 && len(c.instances) > 0 {
	switch c.stCfgs[0].Mode {
	case config.ModeAll:
		c.checkList = append(c.checkList, checker.NewLoaderConnAmountCheker(c.instances[0].targetDB, c.stCfgs))
As this check is about the connection count, do we need to put this item at the head of the check queue?
Will fix it.
The checkers in the list run in parallel, so the order doesn't matter.
tiflow/dm/pkg/checker/real_checker.go
Lines 131 to 139 in 536e182
for i, checker := range checkers {
	wg.Add(1)
	go func(i int, checker RealChecker) {
		defer wg.Done()
		result := checker.Check(ctx)
		result.ID = uint64(i)
		resultCh <- result
	}(i, checker)
}
dm/pkg/checker/conn_checker.go
Outdated
	return result
}
var rows *sql.Rows
rows, err = baseConn.QuerySQL(tcontext.NewContext(ctx, log.L()), "SHOW VARIABLES LIKE '%max_connections%'")
You can reuse func GetMaxConnections.
Yes, but GetMaxConnections has no retry strategy: users have to retry by themselves if connection errors occur (#5015).
Actually, there are a bunch of different packages used for querying SQL, which vary in implementation and retry strategy. BaseConn has a decent retry strategy as well as good encapsulation. Should we refactor and integrate them into a single package?
lgtm
Currently, @lance6716 is planning to refactor the syncer. I don't know if this could be part of that plan, because it seems like a big task.
LGTM
Do we need a checker for dumpling?
dm/pkg/checker/conn_checker.go
Outdated
func NewLoaderConnAmountCheker(targetDB *conn.BaseDB, stCfgs []*config.SubTaskConfig) RealChecker {
	return &LoaderConnAmountChecker{
		connAmountChecker: newConnAmountChecker(targetDB, stCfgs, func(stCfgs []*config.SubTaskConfig) int {
			loaderConn := 0
The loader's checkpoint DB will use some connections. Does this part need to be counted?
I will have a look at it.
Every unit maintains 1 db connection for checkpoint. I added it and left a comment. PTAL again😀
/run-verify
/run-dm-integration-test
dm/pkg/checker/conn_checker.go
Outdated
syncerConn := 0
for _, stCfg := range stCfgs {
	// syncer's worker and checkpoint (always keeps one db connection)
	syncerConn += stCfg.SyncerConfig.WorkerCount + 1
Maybe the syncer also has a DDL connection?
dm/tests/dmctl_command/run.sh
Outdated
@@ -142,6 +145,39 @@ function run() {
	fi
}

function test_full_mode_conn() {
Please check with SHOW PROCESSLIST that the actual number of connections is the same as you expected.
Did you mean I should run start-task and see if the connections are as expected?
Yes. If you do this check, you can find the forgotten checkpoint / DDL / dump control connections on your own.
Wrong action.
Please open an issue for them. I'm afraid we will forget them.
shardddl3-1 /run-dm-integration-test
similar to this: #3461
/run-dm-integration-test
shardddl3-1 again /run-dm-integration-test
/run-dm-integration-test
/run-leak-test
checkpoint_transaction
This is an acceptable error, not the root cause. Please open an issue.
@buchuitoudegou: Your PR was out of date, I have automatically updated it for you. At the same time I will also trigger all tests for you:
/run-all-tests
If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.
/run-verify
In response to a cherrypick label: cannot checkout
In response to a cherrypick label: cannot checkout
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created: #6535.
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created: #6536.
In response to a cherrypick label: new pull request created: #6537.
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: cannot checkout
What problem does this PR solve?
Issue Number: close #5005
What is changed and how it works?
Check List
Tests
Code changes
Side effects
Related changes
Release note