PageStorage: Fix peak memory usage when running GC on PageDirectory #6168
/run-all-tests
Unit tests are added, but this still needs some full-stack tests, and the metrics need polishing.
89c6da3 to 1d0fcfd
/run-all-tests
LGTM
Metrics (Grafana panels with this PR: #6175)
/merge
This pull request has been accepted and is ready to merge. Commit hash: a8d677d
/run-unit-test
In response to a cherrypick label: new pull request created: #6184.
Signed-off-by: ti-chi-bot <[email protected]>
What problem does this PR solve?
Issue Number: close #6163
Problem Summary:
The snapshot file keeps growing, which causes high peak memory usage after TiFlash has been running for a long time.
Introduced by #5357
#5357 was meant to resolve a bug caused by concurrency between deleting a page and full GC. However, that fix makes the entries in the PageDirectory snapshot file grow without bound, because with a `FileSnapshot` it cannot be sure whether more edits on those pages will follow.
What is changed and how it works?
When compacting log files, we first freeze the log file that is currently being written. Because log-file compaction and full GC never run concurrently, once the current log file is frozen we can see every `upsert` that may have happened after a `del` for the same page id. The new compacted log files can therefore correctly remove that page id.
Another approach that does not work
Another approach is to avoid persisting an "upsert" after a "del". However, because PageStorage supports RefPages, this behavior is hard to avoid.
For example, to avoid persisting the wrong "upsert", we could check whether the page has been deleted when the upsert entry edit is committed to PageDirectory, and roll back the entry edit for that page id if it has.
However, a page that is itself marked as deleted may still be referenced by another page id. In that case we still need to commit the upsert entry, or full GC cannot reduce space amplification as expected.
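The RefPages problem can be illustrated with a small toy model. This is only a sketch under stated assumptions, not TiFlash code: `ToyDirectory`, its methods, the `skip_if_deleted` flag, and the blob names are all hypothetical, and an "upsert" is modeled simply as full GC pointing a page id at a new blob file.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDirectory:
    """Hypothetical stand-in for a page directory; not the real PageDirectory."""
    location: dict = field(default_factory=dict)  # page id -> blob file holding its data
    refs: dict = field(default_factory=dict)      # ref page id -> original page id
    deleted: set = field(default_factory=set)     # page ids marked as deleted

    def put(self, page_id, blob):
        self.location[page_id] = blob

    def ref(self, ref_id, origin_id):
        self.refs[ref_id] = origin_id

    def delete(self, page_id):
        self.deleted.add(page_id)

    def upsert(self, page_id, new_blob, skip_if_deleted):
        # The rejected approach: roll back the upsert when the page is deleted.
        if skip_if_deleted and page_id in self.deleted:
            return False
        self.location[page_id] = new_blob
        return True

    def blobs_in_use(self):
        """Blob files that still hold data reachable through some live page id."""
        in_use = set()
        for page_id, blob in self.location.items():
            referenced = any(
                origin == page_id and ref_id not in self.deleted
                for ref_id, origin in self.refs.items()
            )
            if page_id not in self.deleted or referenced:
                in_use.add(blob)
        return in_use


def after_full_gc(skip_if_deleted):
    d = ToyDirectory()
    d.put(1, "blob_old")
    d.ref(2, 1)    # page 2 references page 1
    d.delete(1)    # page 1 is deleted, but page 2 still needs its data
    # Full GC has copied the data of page 1 into "blob_new" and commits the upsert.
    d.upsert(1, "blob_new", skip_if_deleted=skip_if_deleted)
    return d.blobs_in_use()
```

In this model, `after_full_gc(True)` leaves the data pinned to `blob_old`, so the old blob file can never be reclaimed, while `after_full_gc(False)` moves it to `blob_new`. That is the space-amplification concern described above.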
Check List
Tests
Run the `HeavySkewWriteRead` page workload test (based on #6178). Before this PR, the WAL snapshot file keeps growing because deleted pages are not removed from it. After this PR, the WAL snapshot file only keeps the pages that are still available.
Before this PR
After this PR
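The difference in snapshot contents can be modeled with a toy replay of log files, following the description under "What is changed and how it works?". This is a minimal sketch, not the TiFlash implementation: the record tuples, `replay`, `compact`, and the `keep_tombstones` flag are hypothetical names, and it assumes that replaying an "upsert" for a page id with no remembered "del" would bring that page back.

```python
def replay(log_files):
    """Replay log files in order; return (live page ids, deleted page ids)."""
    live, tombstones = set(), set()
    for log in log_files:
        for op, page_id in log:
            if op == "put":
                live.add(page_id)
                tombstones.discard(page_id)
            elif op == "del":
                live.discard(page_id)
                tombstones.add(page_id)
            elif op == "upsert" and page_id not in tombstones:
                # Assumption: without a remembered "del", an upsert revives the page.
                live.add(page_id)
    return live, tombstones


def compact(log_files, keep_tombstones):
    """Build the snapshot records that replace the given log files."""
    live, tombstones = replay(log_files)
    snapshot = [("put", p) for p in sorted(live)]
    if keep_tombstones:
        snapshot += [("del", p) for p in sorted(tombstones)]
    return snapshot


frozen = [[("put", 1), ("put", 2), ("del", 1)]]
writing = [("upsert", 1)]  # full GC moved page 1 after it was deleted

# Before: the writing log file is not part of the compaction, so the "del"
# has to stay in the snapshot to guard against the later upsert.
snap_before = compact(frozen, keep_tombstones=True)
restored_before = replay([snap_before, writing])[0]

# Dropping the "del" without freezing would revive page 1 on restore.
snap_naive = compact(frozen, keep_tombstones=False)
restored_naive = replay([snap_naive, writing])[0]

# After: the writing log file is frozen and compacted too, so no upsert for
# page 1 can follow and the page id can be dropped from the snapshot.
snap_after = compact(frozen + [writing], keep_tombstones=False)
restored_after = replay([snap_after, []])[0]
```

In this model the "before" snapshot carries one extra record for every page ever deleted, which is why it keeps growing under a delete-heavy workload, while the "after" snapshot holds only the live pages.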
Side effects
Documentation
Release note