
[serve] prevent in memory metric store in handles from growing in memory #44877

Merged: 1 commit into ray-project:master on Apr 23, 2024

Conversation

zcin (Contributor) commented Apr 19, 2024

There are two potential sources of memory leak for the `InMemoryMetricsStore` in the handles, which is used to record and report autoscaling metrics:

  1. Old replica ID keys are never removed. We remove old replica keys from `num_queries_sent_to_replicas` when we get an updated list of running replicas from the long poll update, but we don't do any such cleanup for the in-memory metrics store. This means there is leftover, uncleaned data for replicas that are no longer running.
  2. We don't delete data points recorded more than `look_back_period_s` ago for a replica except during window-average queries. This should mostly be solved once (1) is solved, because it should only be a problem for replicas that are no longer running.

This PR addresses (1) and (2) by periodically

  • pruning keys that haven't had updated data points in the past `look_back_period_s`, and
  • compacting data points that are more than `look_back_period_s` old (a minimal sketch of both steps follows below).
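
For illustration only, here is a minimal sketch of what those two cleanup steps could look like on a store that maps replica keys to time-ordered datapoints. The names (`SimpleMetricsStore`, `TimeStampedValue`, `prune_keys`, `compact_datapoints`) are assumptions for this sketch, not necessarily the actual Serve implementation:

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List


@dataclass(order=True)
class TimeStampedValue:
    timestamp: float
    value: float = field(default=0.0, compare=False)


class SimpleMetricsStore:
    """Illustrative store: replica key -> datapoints in timestamp order."""

    def __init__(self) -> None:
        self.data: DefaultDict[str, List[TimeStampedValue]] = defaultdict(list)

    def add_metrics_point(self, key: str, value: float, timestamp_s: float) -> None:
        # Datapoints arrive in timestamp order, so appending keeps each list sorted.
        self.data[key].append(TimeStampedValue(timestamp_s, value))

    def prune_keys(self, start_timestamp_s: float) -> None:
        # (1) Drop replica keys whose newest datapoint is older than the cutoff.
        for key, points in list(self.data.items()):
            if not points or points[-1].timestamp < start_timestamp_s:
                del self.data[key]

    def compact_datapoints(self, start_timestamp_s: float) -> None:
        # (2) For the keys that remain, drop datapoints older than the cutoff.
        for key, points in self.data.items():
            cutoff = bisect.bisect_left(points, TimeStampedValue(start_timestamp_s))
            self.data[key] = points[cutoff:]
```

Calling both periodically with `start_timestamp_s = time.time() - look_back_period_s` bounds the store to roughly one look-back window of data, and only for replicas that are still reporting.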

Main benchmarks picked from the full microbenchmark run posted below in the comments:

| metric | master | current changes | % change |
| --- | --- | --- | --- |
| http_p50_latency | 11.082594282925129 | 11.626139283180237 | 4.9044924534731305 |
| http_1mb_p50_latency | 11.81719359010458 | 12.776304967701435 | 8.116236484439 |
| http_10mb_p50_latency | 17.57313683629036 | 18.03796272724867 | 2.6450934473940757 |
| http_avg_rps | 204.2 | 195.04 | -4.48579823702252 |
| grpc_p50_latency | 7.965719327330589 | 8.844093419611454 | 11.026927465 |
| grpc_1mb_p50_latency | 17.652496695518494 | 19.921275787055492 | 12.852454418603475 |
| grpc_10mb_p50_latency | 142.39510521292686 | 153.88561598956585 | 8.069456291673038 |
| grpc_avg_rps | 203.35 | 211.01 | 3.766904352102296 |
| handle_p50_latency | 4.890996962785721 | 4.082906059920788 | -16.522007864929765 |
| handle_1mb_p50_latency | 11.582874692976475 | 10.905216448009014 | -5.8505186573275525 |
| handle_10mb_p50_latency | 65.54202642291784 | 67.52330902963877 | 3.0229193615962657 |
| handle_avg_rps | 394.57 | 404.85 | 2.6053678688192194 |

There is no performance degradation in latencies or throughput. All benchmarks were run with autoscaling turned on (instead of `num_replicas=1`, I just set `autoscaling_config={"min_replicas": 1, "max_replicas": 1}`).
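
For context, a benchmark deployment configured this way might look roughly like the following (the deployment class and body are placeholders; `autoscaling_config` with `min_replicas`/`max_replicas` is the standard way to enable Serve autoscaling):

```python
from ray import serve


# Pinning min_replicas == max_replicas keeps exactly one replica while still
# exercising the handle-side autoscaling metrics path touched by this PR.
@serve.deployment(autoscaling_config={"min_replicas": 1, "max_replicas": 1})
class Echo:
    async def __call__(self, request) -> bytes:
        return b"ok"


app = Echo.bind()
```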

Closes #44870.

Signed-off-by: Cindy Zhang [email protected]

@zcin force-pushed the pr44877 branch 2 times, most recently from 74e4301 to e5acb10 on April 20, 2024 01:04
@zcin marked this pull request as ready for review April 20, 2024 01:04
@zcin requested a review from edoakes April 20, 2024 01:04
@zcin force-pushed the pr44877 branch 4 times, most recently from 96a4a86 to 2f0a091 on April 22, 2024 18:52
Comment on lines 139 to 149
def prune_data(self, start_timestamp_s: float):
"""Prune keys that haven't had new data recorded after start_timestamp_s."""
for key, datapoints in list(self.data.items()):
if len(datapoints) == 0 or datapoints[-1].timestamp < start_timestamp_s:
del self.data[key]

Contributor:
There's probably a better data structure we could be using for this, e.g., numpy arrays instead of lists.

But for now this is very unlikely to be a bottleneck.
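
As a rough sketch of that suggestion (an illustration of the idea, not something this PR implements), keeping each key's timestamps in a sorted NumPy array would turn the window cut into a vectorized slice:

```python
import numpy as np


def compact_window(timestamps: np.ndarray, values: np.ndarray, cutoff_s: float):
    """Drop datapoints older than cutoff_s.

    Assumes `timestamps` is sorted ascending, which holds when datapoints
    are appended in arrival order.
    """
    start = np.searchsorted(timestamps, cutoff_s, side="left")
    return timestamps[start:], values[start:]
```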

Contributor:
nit: name it "prune_keys" to make it clear that this does not prune datapoints that are outside of the window if a key is still active

Comment on lines -234 to +238
def process_finished_request(self, replica_id: ReplicaID, *args):
def dec_num_running_requests_for_replica(self, replica_id: ReplicaID, *args):

Contributor:
nice!

edoakes (Contributor) left a comment:

[non blocking] It's a little worrisome to me that we now have two separate GC paths, one to prune outdated datapoints and one to prune keys with no datapoints. Is it possible to combine them into the same path w/o any performance degradation?

@zcin force-pushed the pr44877 branch 3 times, most recently from 505f57e to 56834d0 on April 22, 2024 22:43

zcin (Contributor, Author) commented Apr 22, 2024

Results from running `python workloads/microbenchmarks.py` on a Linux devbox:

| metric | master | current changes | % change |
| --- | --- | --- | --- |
| http_p50_latency | 11.689163744449615 | 12.697989121079445 | 8.630432413172851 |
| http_p90_latency | 17.667977139353752 | 20.538362674415115 | 16.246260182598093 |
| http_p95_latency | 20.711670164018855 | 23.694146424531922 | 14.399979513454909 |
| http_p99_latency | 29.65034229680895 | 29.68140149489045 | 0.10475156667868468 |
| http_1mb_p50_latency | 12.98057846724987 | 12.922706082463264 | -0.4458382569976993 |
| http_1mb_p90_latency | 21.109187789261345 | 20.95024958252907 | -0.7529337856055696 |
| http_1mb_p95_latency | 24.55082470551133 | 23.634034581482396 | -3.7342538795575697 |
| http_1mb_p99_latency | 32.392665147781365 | 29.604270085692402 | -8.60810633940673 |
| http_10mb_p50_latency | 17.51419436186552 | 18.64500530064106 | 6.456539852256671 |
| http_10mb_p90_latency | 26.162455044686794 | 28.389364294707775 | 8.51185122427276 |
| http_10mb_p95_latency | 29.105186089873314 | 31.291070766746994 | 7.510292736572555 |
| http_10mb_p99_latency | 35.415677595883594 | 38.92970275133845 | 9.922230475305938 |
| http_avg_rps | 189.59 | 193.57 | 2.0992668389683056 |
| http_throughput_std | 10.87 | 12.29 | 13.063477460901574 |
| grpc_p50_latency | 11.873401701450348 | 8.261814713478088 | -30.417458103275496 |
| grpc_p90_latency | 18.791097030043602 | 13.284517452120783 | -29.304194263479044 |
| grpc_p95_latency | 21.959066577255722 | 15.328398626297712 | -30.195581982641183 |
| grpc_p99_latency | 29.742918573319905 | 21.216103378683307 | -28.66838764870019 |
| grpc_1mb_p50_latency | 21.94363623857498 | 19.259278662502766 | -12.232966072201613 |
| grpc_1mb_p90_latency | 36.9620030745864 | 30.142956227064136 | -18.448802230122553 |
| grpc_1mb_p95_latency | 42.1659248881042 | 32.7460279688239 | -22.340069485675684 |
| grpc_1mb_p99_latency | 53.054883759468794 | 41.461732257157564 | -21.851242865539554 |
| grpc_10mb_p50_latency | 143.66386830806732 | 141.6999576613307 | -1.3670177963782004 |
| grpc_10mb_p90_latency | 179.3738016858697 | 175.6876302883029 | -2.0550221732057894 |
| grpc_10mb_p95_latency | 194.27295429632062 | 190.28351670131084 | -2.0535218653877862 |
| grpc_10mb_p99_latency | 216.17326522246 | 214.93708996102214 | -0.5718446544098477 |
| grpc_avg_rps | 187.63 | 205.5 | 9.52406331610085 |
| grpc_throughput_std | 17.96 | 12.56 | -30.066815144766146 |
| handle_p50_latency | 4.922645166516304 | 4.540378227829933 | -7.765478228789268 |
| handle_p90_latency | 9.407674893736841 | 8.274185843765737 | -12.048556766409135 |
| handle_p95_latency | 11.222601961344473 | 9.853483550250528 | -12.199652235816483 |
| handle_p99_latency | 15.578116625547406 | 13.329956158995623 | -14.43152930865148 |
| handle_1mb_p50_latency | 11.975042521953583 | 9.734918363392353 | -18.70660713274679 |
| handle_1mb_p90_latency | 17.783071100711826 | 15.391851216554642 | -13.446608128679571 |
| handle_1mb_p95_latency | 20.919007528573268 | 18.27068757265806 | -12.659873812358779 |
| handle_1mb_p99_latency | 28.804545234888774 | 23.77186611294746 | -17.47182286997404 |
| handle_10mb_p50_latency | 68.21819301694632 | 68.3914739638567 | 0.25400987514772044 |
| handle_10mb_p90_latency | 88.46968635916711 | 90.04715215414762 | 1.783057971491342 |
| handle_10mb_p95_latency | 98.49190469831224 | 95.55206522345537 | -2.984853916534369 |
| handle_10mb_p99_latency | 116.04135598987341 | 117.39470710977912 | 1.1662662060100581 |
| handle_avg_rps | 348.2 | 377.13 | 8.308443423319932 |
| handle_throughput_std | 35.27 | 23.24 | -34.1083073433513 |

zcin (Contributor, Author) commented Apr 22, 2024

Comparing results on master vs. the basic fix plus combining the GC into a single path:

| metric | master | basic fix + combining GC | % change |
| --- | --- | --- | --- |
| http_p50_latency | 11.082594282925129 | 11.626139283180237 | 4.9044924534731305 |
| http_p90_latency | 17.237211205065254 | 17.71053411066532 | 2.7459366829651666 |
| http_p95_latency | 19.190100394189354 | 20.124413166195147 | 4.868722689375282 |
| http_p99_latency | 22.697068899869915 | 25.29432920739053 | 11.44315294181224 |
| http_1mb_p50_latency | 11.81719359010458 | 12.776304967701435 | 8.116236484439 |
| http_1mb_p90_latency | 18.78219097852707 | 19.593118876218796 | 4.3175362161891995 |
| http_1mb_p95_latency | 21.54928250238299 | 22.788667865097523 | 5.751399669930901 |
| http_1mb_p99_latency | 32.356874477118254 | 26.531913559883822 | -18.002236035973308 |
| http_10mb_p50_latency | 17.57313683629036 | 18.03796272724867 | 2.6450934473940757 |
| http_10mb_p90_latency | 25.324736163020134 | 26.30617953836918 | 3.875433761802327 |
| http_10mb_p95_latency | 28.40709909796714 | 30.150835309177623 | 6.138381836163154 |
| http_10mb_p99_latency | 35.19434265792368 | 37.684448473155484 | 7.075301389870359 |
| http_avg_rps | 204.2 | 195.04 | -4.48579823702252 |
| http_throughput_std | 10.59 | 10.05 | -5.099150141643049 |
| grpc_p50_latency | 7.965719327330589 | 9.226929396390915 | 11.026927465 |
| grpc_p90_latency | 12.888458557426931 | 16.000223346054558 | 24.143808778703658 |
| grpc_p95_latency | 15.0979402475059 | 19.054750353097894 | 26.207615348362758 |
| grpc_p99_latency | 21.112139746546735 | 25.18679138273 | 19.300041043209504 |
| grpc_1mb_p50_latency | 17.652496695518494 | 19.921275787055492 | 12.852454418603475 |
| grpc_1mb_p90_latency | 24.58920944482088 | 28.302832320332527 | 15.102652583626796 |
| grpc_1mb_p95_latency | 27.62148445472119 | 32.11428858339785 | 16.26561431208209 |
| grpc_1mb_p99_latency | 34.32560721412301 | 40.012291837483644 | 16.566887186837278 |
| grpc_10mb_p50_latency | 142.39510521292686 | 153.88561598956585 | 8.069456291673038 |
| grpc_10mb_p90_latency | 171.29450254142284 | 196.54369726777077 | 14.740224789316937 |
| grpc_10mb_p95_latency | 180.62871610745782 | 208.2955297082662 | 15.316951920506995 |
| grpc_10mb_p99_latency | 203.08018665760756 | 228.67932429537174 | 12.605433380324914 |
| grpc_avg_rps | 203.35 | 211.01 | 3.766904352102296 |
| grpc_throughput_std | 11.9 | 18.37 | 54.36974789915967 |
| handle_p50_latency | 4.890996962785721 | 4.082906059920788 | -16.522007864929765 |
| handle_p90_latency | 8.330053836107256 | 7.066779211163522 | -15.165263632126525 |
| handle_p95_latency | 10.292354132980105 | 8.812354505062096 | -14.379602652571 |
| handle_p99_latency | 14.295727964490647 | 11.079099290072916 | -22.500628736134033 |
| handle_1mb_p50_latency | 11.582874692976475 | 10.905216448009014 | -5.8505186573275525 |
| handle_1mb_p90_latency | 17.04296506941319 | 16.72532092779875 | -1.8637845018206867 |
| handle_1mb_p95_latency | 19.410823285579664 | 18.967161420732737 | -2.2856416666083623 |
| handle_1mb_p99_latency | 22.920265067368735 | 26.537567153573033 | 15.782112796567095 |
| handle_10mb_p50_latency | 65.54202642291784 | 67.52330902963877 | 3.0229193615962657 |
| handle_10mb_p90_latency | 83.52929335087538 | 84.07336305826902 | 0.6513519815236624 |
| handle_10mb_p95_latency | 90.69845974445342 | 93.42181514948605 | 3.0026479090227154 |
| handle_10mb_p99_latency | 106.92971816286443 | 104.65163111686707 | -2.1304526797009005 |
| handle_avg_rps | 394.57 | 404.85 | 2.6053678688192194 |
| handle_throughput_std | 22.09 | 23.86 | 8.01267541874151 |

zcin (Contributor, Author) commented Apr 23, 2024

> [non blocking] It's a little worrisome to me that we now have two separate GC paths, one to prune outdated datapoints and one to prune keys with no datapoints. Is it possible to combine them into the same path w/o any performance degradation?

@edoakes I've combined them. It doesn't seem like there's any noticeable performance degradation.
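
For reference, a single combined pass over a `key -> time-ordered datapoints` mapping (the same shape as the sketch in the PR description above) could look like this; the function name and the per-datapoint `timestamp` attribute are assumptions for illustration, not the PR's actual code:

```python
from typing import Dict


def garbage_collect(data: Dict[str, list], cutoff_s: float) -> None:
    """One GC pass: remove stale keys and old datapoints together.

    A key is dropped entirely if its newest datapoint is older than cutoff_s;
    otherwise only its datapoints older than cutoff_s are dropped.
    """
    for key, points in list(data.items()):
        if not points or points[-1].timestamp < cutoff_s:
            del data[key]
        else:
            data[key] = [p for p in points if p.timestamp >= cutoff_s]
```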

@zcin self-assigned this Apr 23, 2024
@zcin force-pushed the pr44877 branch 4 times, most recently from d9fb461 to 6ff3878 on April 23, 2024 17:58
@edoakes merged commit 9835610 into ray-project:master on Apr 23, 2024
6 checks passed
Linked issue closed by this pull request: [serve] InMemoryMetricsStore leaks memory with handle-side autoscaling metrics enabled (#44870)