mysql (ticdc): Improve the performance of the mysql sink by refining the transaction event batching logic #10466

Merged: 14 commits, Jan 16, 2024

hongyunyan
Collaborator

@hongyunyan hongyunyan commented Jan 15, 2024

What problem does this PR solve?

Issue Number: close #11241

What is changed and how it works?

  1. When many txns are queued in the worker's txnCh, keep batching them until the batch reaches maxTxnRows before flushing, instead of flushing tiny batches every time ticker.C fires.
  2. Adjust the calculation of worker-busy-ratio to make it more precise and clearer.
    The original calculation of worker-busy-ratio can deviate when a flush takes a long time (for example, longer than a few hundred milliseconds), and the longer the flush, the larger the deviation. When the flush time is long, the interval between successive increases of the worker-busy-ratio value clearly exceeds 1s, so the per-second growth rate of worker-busy-ratio is lower than the actual rate.

Performance Test Result

We run a heavy workload on the upstream to keep worker-busy-ratio at 100%, and compare sink throughput before and after the optimization.

  1. Simulated network latency between the upstream and downstream: 2ms.
    After optimization: 33339 rows/s (+14%). Before optimization: 29075 rows/s.
  2. Simulated network latency between the upstream and downstream: 5ms.
    After optimization: 20528 rows/s (+13%). Before optimization: 18129 rows/s.
  3. Simulated network latency between the upstream and downstream: 20ms.
    After optimization: 7273 rows/s (+20%). Before optimization: 6048 rows/s.

New panel of worker busy ratio

[image: new worker busy ratio panel]

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 15, 2024
@ti-chi-bot ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 15, 2024
@hongyunyan hongyunyan changed the title cdc: Fix mysql sink batch processing to improvment sink performance WIP: cdc: Fix mysql sink batch processing to improvment sink performance Jan 15, 2024
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 15, 2024
@hongyunyan hongyunyan changed the title WIP: cdc: Fix mysql sink batch processing to improvment sink performance cdc: Fix mysql sink batch processing to improvment sink performance Jan 15, 2024
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 15, 2024
@asddongmen asddongmen changed the title cdc: Fix mysql sink batch processing to improvment sink performance mysql (ticdc): Improve the performance of the mysql sink by refining the transaction event batching logic Jan 15, 2024
if txn.txnEvent != nil {
    needFlush = w.onEvent(txn)
    if !needFlush {
Contributor

@CharlesCheung96 CharlesCheung96 Jan 15, 2024

It seems that the core idea here is to ensure that the flush interval is greater than 10ms. Maybe we could record lastFlushTime at the end of each flush and check it on each tick? For example:

case <-ticker.C:
    if time.Since(lastFlushTime) >= w.flushInterval {
        needFlush = true
    }

Collaborator Author

I believe both methods are feasible, but I wonder whether the current code is a bit more straightforward and easier to understand.

Contributor

In general, the nested logic is more complex.

Collaborator Author

I think it's ok here orz

codecov bot commented Jan 15, 2024

Codecov Report

Merging #10466 (7d8bd0b) into master (3e9c9de) will decrease coverage by 5.9918%.
Report is 296 commits behind head on master.
The diff coverage is 53.8765%.

Additional details and impacted files
Components Coverage Δ
cdc 61.6267% <53.8765%> (-1.7802%) ⬇️
dm 51.2640% <ø> (∅)
engine 63.4141% <ø> (∅)
Flag Coverage Δ
unit 57.4151% <53.8765%> (-5.9918%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master     #10466        +/-   ##
================================================
- Coverage   63.4069%   57.4151%   -5.9918%     
================================================
  Files           392        849       +457     
  Lines         51067     125864     +74797     
================================================
+ Hits          32380      72265     +39885     
- Misses        16385      48192     +31807     
- Partials       2302       5407      +3105     

@hongyunyan
Collaborator Author

/test verify

@hicqu
Contributor

hicqu commented Jan 16, 2024

/retest

@ti-chi-bot ti-chi-bot bot merged commit 50d96a6 into pingcap:master Jan 16, 2024
24 of 28 checks passed
CharlesCheung96 pushed a commit to ti-chi-bot/tiflow that referenced this pull request Apr 26, 2024
CharlesCheung96 pushed a commit to ti-chi-bot/tiflow that referenced this pull request Apr 26, 2024
@hongyunyan
Collaborator Author

/cherry-pick release-6.5

@ti-chi-bot
Member

@hongyunyan: new pull request created to branch release-6.5: #11242.

In response to this:

/cherry-pick release-6.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@hongyunyan
Collaborator Author

/cherry-pick 7.1

@hongyunyan
Collaborator Author

/cherry-pick release-7.1

@ti-chi-bot ti-chi-bot added the needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. label Jun 4, 2024
@ti-chi-bot
Member

@hongyunyan: cannot checkout 7.1: error checking out 7.1: exit status 1. output: error: pathspec '7.1' did not match any file(s) known to git

In response to this:

/cherry-pick 7.1

@ti-chi-bot
Member

@hongyunyan: new pull request created to branch release-7.1: #11244.

In response to this:

/cherry-pick release-7.1

@ti-chi-bot
Member

In response to a cherrypick label: new pull request could not be created: failed to create pull request against pingcap/tiflow#release-7.1 from head ti-chi-bot:cherry-pick-10466-to-release-7.1: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"A pull request already exists for ti-chi-bot:cherry-pick-10466-to-release-7.1."}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#create-a-pull-request"}

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 4, 2024
@ti-chi-bot ti-chi-bot added needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. and removed needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. labels Jun 4, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 4, 2024
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #11248.

ti-chi-bot bot pushed a commit that referenced this pull request Jun 5, 2024
ti-chi-bot bot pushed a commit that referenced this pull request Jun 5, 2024
hicqu added a commit to ti-chi-bot/tiflow that referenced this pull request Jun 12, 2024
commit c092599
Author: Ti Chi Robot <[email protected]>
Date:   Wed Jun 12 00:26:59 2024 +0800

    pkg/config, sink(ticdc): support output raw change event for mq and cloud storage sink (pingcap#11226) (pingcap#11290)

    close pingcap#11211

commit 3426e46
Author: Ti Chi Robot <[email protected]>
Date:   Tue Jun 11 19:40:29 2024 +0800

    puller(ticdc): fix wrong update splitting behavior after table scheduling (pingcap#11269) (pingcap#11282)

    close pingcap#11219

commit 2a28078
Author: Ti Chi Robot <[email protected]>
Date:   Tue Jun 11 16:40:37 2024 +0800

    mysql(ticdc): remove error filter when check isTiDB in backend init (pingcap#11214) (pingcap#11261)

    close pingcap#11213

commit 2425d54
Author: Ti Chi Robot <[email protected]>
Date:   Tue Jun 11 16:40:30 2024 +0800

    log(ticdc): Add more error query information to the returned error to facilitate users to know the cause of the failure (pingcap#10945) (pingcap#11257)

    close pingcap#11254

commit 053cdaf
Author: Ti Chi Robot <[email protected]>
Date:   Tue Jun 11 15:34:30 2024 +0800

    cdc: log slow conflict detect every 60s (pingcap#11251) (pingcap#11287)

    close pingcap#11271

commit 327ba7b
Author: Ti Chi Robot <[email protected]>
Date:   Tue Jun 11 11:42:00 2024 +0800

    redo(ticdc): return internal error in redo writer (pingcap#11011) (pingcap#11091)

    close pingcap#10124

commit d82ae89
Author: Ti Chi Robot <[email protected]>
Date:   Mon Jun 10 22:28:29 2024 +0800

    ddl_puller (ticdc): handle dorp pk/uk ddl correctly (pingcap#10965) (pingcap#10981)

    close pingcap#10890

commit f15bec9
Author: Ti Chi Robot <[email protected]>
Date:   Fri Jun 7 16:16:28 2024 +0800

    redo(ticdc): enable pprof and set memory limit for redo applier (pingcap#10904) (pingcap#10996)

    close pingcap#10900

commit ba50a0e
Author: Ti Chi Robot <[email protected]>
Date:   Wed Jun 5 19:58:26 2024 +0800

    test(ticdc): enable sequence test (pingcap#11023) (pingcap#11037)

    close pingcap#11015

commit 94b9897
Author: Ti Chi Robot <[email protected]>
Date:   Wed Jun 5 17:08:56 2024 +0800

    mounter(ticdc): timezone fill default value should also consider tz. (pingcap#10932) (pingcap#10946)

    close pingcap#10931

commit a912d33
Author: Ti Chi Robot <[email protected]>
Date:   Wed Jun 5 10:49:25 2024 +0800

    mysql (ticdc): Improve the performance of the mysql sink by refining the transaction event batching logic (pingcap#10466) (pingcap#11242)

    close pingcap#11241

commit 6277d9a
Author: dongmen <[email protected]>
Date:   Wed May 29 20:13:22 2024 +0800

    kvClient (ticdc): revert e5999e3 to remove useless metrics (pingcap#11184)

    close pingcap#11073

commit 54e93ed
Author: dongmen <[email protected]>
Date:   Wed May 29 17:43:22 2024 +0800

    syncpoint (ticdc): make syncpoint support base64 encoded password (pingcap#11162)

    close pingcap#10516

commit 0ba9329
Author: Ti Chi Robot <[email protected]>
Date:   Wed May 29 09:07:21 2024 +0800

    (redo)ticdc: fix the event orderliness in redo log (pingcap#11117) (pingcap#11180)

    close pingcap#11096

Signed-off-by: qupeng <[email protected]>
ti-chi-bot bot pushed a commit that referenced this pull request Jun 13, 2024
Labels
approved lgtm needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ticdc: Enhance the batching capability of txnSink to improve the mysql sink performance
5 participants