Memory usage raise about 4 GB when there are stale snapshot #5084

Open
JaySon-Huang opened this issue Jun 7, 2022 · 8 comments


JaySon-Huang commented Jun 7, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

# build and patch tiflash with failpoint enabled release version
> cmake -DCMAKE_BUILD_TYPE=RELWITHDBGINFO -DENABLE_FAILPOINTS=ON -GNinja ..
# use br to load chbenchmark 1500 from internal minio service

# enable failpoint for holding the snapshot for a while
> LD_LIBRARY_PATH=. ./tiflash client --host ${tiflash_ip} --port ${tiflash_tcp_port} --query 'DBGInvoke enable_fail_point(force_slow_page_storage_snapshot_release)'
> LD_LIBRARY_PATH=. ./tiflash client --host ${tiflash_ip} --port ${tiflash_tcp_port} --query 'selraw count(*) from db_69.t_79'
471415523

fiu_do_on(FailPoints::force_slow_page_storage_snapshot_release, {
    std::thread thread_hold_snapshots([tasks]() {
        std::this_thread::sleep_for(std::chrono::seconds(5 * 60));
        (void)tasks;
    });
    thread_hold_snapshots.detach();
});
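
The failpoint above works because the lambda captures `tasks` by value, so the background thread holds its own reference until the sleep ends. A minimal standalone sketch of that lifetime effect follows. It assumes the captured object is reference-counted like a `std::shared_ptr`; `Snapshot` and `holdSnapshot` are hypothetical names for illustration, not TiFlash code, and the thread is returned for joining instead of being detached so the effect can be observed.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for the snapshot held by the read tasks.
// It counts live instances so the lifetime can be observed from outside.
struct Snapshot
{
    explicit Snapshot(std::atomic<int> & live_) : live(live_) { ++live; }
    ~Snapshot() { --live; }
    std::atomic<int> & live;
};

// Move a shared_ptr copy into a background thread so the snapshot outlives
// the caller's own reference. The failpoint in the issue detaches the thread;
// this sketch returns it so the caller can join.
std::thread holdSnapshot(std::shared_ptr<Snapshot> snap, std::chrono::milliseconds hold_time)
{
    return std::thread([snap = std::move(snap), hold_time]() {
        std::this_thread::sleep_for(hold_time);
        (void)snap; // the captured reference is dropped when the closure is destroyed
    });
}
```

If the caller resets its own pointer right after calling `holdSnapshot`, the snapshot stays alive until the thread finishes, which mirrors how the failpoint keeps the snapshot stale for five minutes.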

2. What did you expect to see? (Required)

After the snapshot is created, memory usage does not change significantly.

3. What did you see instead (Required)

After the snapshot is created, memory usage rises by about 4 GB, without any write operations.
After the snapshot is released, memory usage drops back to the normal level.

http://172.16.5.81:21510/d/SVbh2xUWk/jayson-tiflash-summary?orgId=1&from=1654597201960&to=1654599606104
image

4. What is your TiFlash version? (Required)

https://github.com/JaySon-Huang/tiflash/commits/debug_ps, based on e3a4412


JaySon-Huang commented Jun 7, 2022

The size of the memory increase depends on the number of segments (or the amount of data) held by the snapshot.

mysql> select segment_id from information_schema.tiflash_segments  where tidb_database = 'chbenchmark' and tidb_table='order_line' and tiflash_instance='172.16.5.82:9512' order by segment_id;

> LD_LIBRARY_PATH=. ./tiflash client --host 172.16.5.82 --port 5012 --query 'selraw count(*) from db_69.t_79 segment (1)'
> LD_LIBRARY_PATH=. ./tiflash client --host 172.16.5.82 --port 5012 --query 'selraw count(*) from db_69.t_79 segment (1,125,128,131,134,137,140,143,146,150)'
17663853
> LD_LIBRARY_PATH=. ./tiflash client --host 172.16.5.82 --port 5012 --query 'selraw count(*) from db_69.t_79 segment (1,125,128,131,134,137,140,143,146,150,152,155,158,161,164,167,170,173,176,179,182,185,188,191,194,197,200,204,206,209,212,215,218,221,224,227,230,233,236,239,242,245,248,251,254,257,261,263,266,269,272,275,278,281,284,287,290,293,296,299,302,305,308,311,314,318,320,323,326,329,332,335,338,341,344,347,350,353,356,360,362,365,368,371,374,377,380,383,386,389,392,395,398,401,404,407,410,413,416,419)'
179308973
> LD_LIBRARY_PATH=. ./tiflash client --host 172.16.5.82 --port 5012 --query 'selraw count(*) from db_69.t_79 segment (1,125,128,131,134,137,140,143,146,150,152,155,158,161,164,167,170,173,176,179,182,185,188,191,194,197,200,204,206,209,212,215,218,221,224,227,230,233,236,239,242,245,248,251,254,257,261,263,266,269,272,275,278,281,284,287,290,293,296,299,302,305,308,311,314,318,320,323,326,329,332,335,338,341,344,347,350,353,356,360,362,365,368,371,374,377,380,383,386,389,392,395,398,401,404,407,410,413,416,419,422,425,428,431,434,437,440,443,446,449,452,455,458,461,464,468,470,473,476,479,482,485,488,491,494,497,500,504,506,509,512,515,518,521,525,527,530,533,536,539,542,545,548,551,554,557,560,563,567,569,572,575,578,581,584,587,590,593,596,599,602,605,608,611,614,617,620,623,626,629,632,635,638,641,647,650,653,656,659,662,665,668,671,674,677,680,683,686,689,692,695,698,701,704,707,710,713,717,719,722)'
370252319

image

@JaySon-Huang

Deployed a TiFlash node with storage_format=3, and the problem still exists, indicating that it is not related to PageStorageV3.

image
image

@JaySon-Huang

Also reproduced in v5.4.1.

image

@hehechen

The setting dt_enable_stable_column_cache is true by default, so the handle and version columns are cached in the stable snapshot. I set dt_enable_stable_column_cache to false and the memory usage didn't rise.
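
To illustrate why a cache owned by the snapshot makes memory grow with the amount of data read, here is a rough model. It is only a sketch under assumptions: `StableSnapshotModel`, `readHandleColumn`, and `cachedBytes` are hypothetical names, and the real TiFlash cache structure may differ. The point is that cached columns stay resident for as long as the snapshot object lives.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of a stable snapshot that may keep decoded columns per pack.
class StableSnapshotModel
{
public:
    explicit StableSnapshotModel(bool enable_column_cache_)
        : enable_column_cache(enable_column_cache_)
    {}

    // Simulate reading the handle column of one pack that has `rows` rows.
    std::vector<int64_t> readHandleColumn(size_t pack_id, size_t rows)
    {
        auto it = cache.find(pack_id);
        if (it != cache.end())
            return it->second;
        std::vector<int64_t> column(rows, 0); // stands in for data decoded from disk
        if (enable_column_cache)
            cache.emplace(pack_id, column); // retained until the snapshot is destroyed
        return column;
    }

    // Bytes of column data currently retained by this snapshot.
    size_t cachedBytes() const
    {
        size_t bytes = 0;
        for (const auto & kv : cache)
            bytes += kv.second.size() * sizeof(int64_t);
        return bytes;
    }

private:
    bool enable_column_cache;
    std::map<size_t, std::vector<int64_t>> cache;
};
```

With the cache enabled, every pack that is read adds to `cachedBytes()`; with it disabled, the value stays at zero, which matches the observation that turning the setting off stops the increase.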

@hehechen

There is no need to cache the handle and version columns in readraw, since readraw doesn't place the delta index, so I think we can disable dt_enable_stable_column_cache in readraw.


lidezhu commented Jun 21, 2022

There is no need to cache the handle and version columns in readraw, since readraw doesn't place the delta index, so I think we can disable dt_enable_stable_column_cache in readraw.

But readRaw should only be used through the tiflash client, so it seems unnecessary to optimize it.


lidezhu commented Jun 21, 2022

The setting dt_enable_stable_column_cache is true by default, so the handle and version columns are cached in the stable snapshot. I set dt_enable_stable_column_cache to false and the memory usage didn't rise.

It seems the memory increase is caused by dt_enable_stable_column_cache.

@hehechen hehechen self-assigned this Jul 18, 2022
@hehechen

This problem does not occur when reading for requests from TiDB, because the corresponding snapshot is released each time a segment has been read.
But there is one place that can be optimized: in scenarios where placing the delta index is not needed, the cache is not needed either.
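
The suggested optimization could be expressed as a small gating decision. The sketch below is only an illustration of the idea: `ReadSettings` and `shouldCacheStableColumns` are hypothetical names, and only `dt_enable_stable_column_cache` comes from the discussion above.

```cpp
// Hypothetical settings holder; only the setting name is taken from the thread.
struct ReadSettings
{
    bool dt_enable_stable_column_cache = true;
};

// Cache the handle and version columns only when the setting allows it and the
// read path actually needs to place the delta index (readraw does not).
bool shouldCacheStableColumns(const ReadSettings & settings, bool need_place_index)
{
    return settings.dt_enable_stable_column_cache && need_place_index;
}
```

Under this sketch a raw read would pass `need_place_index = false` and skip the cache even when the setting keeps its default value.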
