[serve] remove target_num_ongoing_requests_per_replica
`target_num_ongoing_requests_per_replica` was deprecated in Ray 2.10, and `target_ongoing_requests` was introduced at the same time. There have been many releases since then, so we can remove `target_num_ongoing_requests_per_replica` now.


Signed-off-by: Cindy Zhang <[email protected]>
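
For callers that still pass the removed key, migrating is a rename. The sketch below is illustrative only and is not part of this commit: `migrate_autoscaling_config` is a hypothetical helper that operates on a plain dict before it is handed to Serve. It assumes the new key should win when both are present, which matches the removed test's comment that the old setting "should get ignored".

```python
OLD_KEY = "target_num_ongoing_requests_per_replica"
NEW_KEY = "target_ongoing_requests"


def migrate_autoscaling_config(config: dict) -> dict:
    """Return a copy of `config` with the removed key renamed (hypothetical helper)."""
    migrated = dict(config)  # never mutate the caller's dict
    old_value = migrated.pop(OLD_KEY, None)
    # Keep an explicit new-style value if both keys were supplied.
    if old_value is not None and NEW_KEY not in migrated:
        migrated[NEW_KEY] = old_value
    return migrated


legacy = {"min_replicas": 1, "max_replicas": 10, OLD_KEY: 5}
print(migrate_autoscaling_config(legacy))
```

Returning a copy rather than editing in place keeps the helper safe to call on config dicts that are shared between deployments.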
zcin committed Jul 2, 2024
1 parent db2288c commit fffa335
Showing 16 changed files with 58 additions and 273 deletions.
5 changes: 1 addition & 4 deletions dashboard/modules/serve/serve_rest_api_impl.py
@@ -240,10 +240,7 @@ def log_config_change_default_warning(self, config):
 else:
 continue

-if (
-"target_num_ongoing_requests_per_replica" not in autoscaling_config
-and "target_ongoing_requests" not in autoscaling_config
-):
+if "target_ongoing_requests" not in autoscaling_config:
 logger.warning(
 "The default value for `target_ongoing_requests` has changed "
 "from 1.0 to 2.0 in Ray 2.32.0."
3 changes: 0 additions & 3 deletions doc/source/serve/advanced-guides/advanced-autoscaling.md
@@ -14,9 +14,6 @@ In this section, we go into more detail about Serve autoscaling concepts as well

 To define what the steady state of your deployments should be, set values for `target_ongoing_requests` and `max_ongoing_requests`.

-#### **target_num_ongoing_requests_per_replica [default=2]**
-This parameter is renamed to `target_ongoing_requests`. `target_num_ongoing_requests_per_replica` will be removed in a future release.
-
 #### **target_ongoing_requests [default=2]**
 :::{note}
 The default for `target_ongoing_requests` changed from 1.0 to 2.0 in Ray 2.32.0. You can continue to set it manually to override the default.
6 changes: 3 additions & 3 deletions doc/source/serve/autoscaling-guide.md
@@ -45,8 +45,8 @@ You can set `num_replicas="auto"` and override its default values (shown above)

 Let's dive into what each of these parameters do.

-* **target_ongoing_requests** (replaces the deprecated `target_num_ongoing_requests_per_replica`) is the average number of ongoing requests per replica that the Serve autoscaler tries to ensure. You can adjust it based on your request processing length (the longer the requests, the smaller this number should be) as well as your latency objective (the shorter you want your latency to be, the smaller this number should be).
-* **max_ongoing_requests** (replaces the deprecated `max_concurrent_queries`) is the maximum number of ongoing requests allowed for a replica. Note this parameter is not part of the autoscaling config because it's relevant to all deployments, but it's important to set it relative to the target value if you turn on autoscaling for your deployment.
+* **target_ongoing_requests** is the average number of ongoing requests per replica that the Serve autoscaler tries to ensure. You can adjust it based on your request processing length (the longer the requests, the smaller this number should be) as well as your latency objective (the shorter you want your latency to be, the smaller this number should be).
+* **max_ongoing_requests** is the maximum number of ongoing requests allowed for a replica. Note this parameter is not part of the autoscaling config because it's relevant to all deployments, but it's important to set it relative to the target value if you turn on autoscaling for your deployment.
 * **min_replicas** is the minimum number of replicas for the deployment. Set this to 0 if there are long periods of no traffic and some extra tail latency during upscale is acceptable. Otherwise, set this to what you think you need for low traffic.
 * **max_replicas** is the maximum number of replicas for the deployment. Set this to ~20% higher than what you think you need for peak traffic.
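
To make the interaction between `target_ongoing_requests`, `min_replicas`, and `max_replicas` concrete, here is a simplified sketch. It is not Serve's actual autoscaling policy, which also accounts for settings such as `upscale_delay_s`, `look_back_period_s`, and `smoothing_factor`. `desired_replicas` is a hypothetical function that assumes only that the autoscaler aims for `target_ongoing_requests` requests per replica on average and clamps the result to the configured bounds. The defaults mirror the values set by `num_replicas="auto"` in this commit's tests.

```python
import math


def desired_replicas(
    total_ongoing_requests: float,
    target_ongoing_requests: float = 2.0,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Simplified estimate of the replica count (hypothetical, not Serve's policy)."""
    if target_ongoing_requests <= 0:
        raise ValueError("target_ongoing_requests must be positive")
    # Enough replicas that each one averages at most the target.
    needed = math.ceil(total_ongoing_requests / target_ongoing_requests)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, needed))


for load in (0, 7, 1000):
    print(load, desired_replicas(load))
```

Under these assumptions, lowering `target_ongoing_requests` raises the replica count for the same load, which is why the guidance above ties a smaller target to longer requests and tighter latency goals.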

@@ -104,4 +104,4 @@ The Ray Serve Autoscaler is an application-level autoscaler that sits on top of
 Concretely, this means that the Ray Serve autoscaler asks Ray to start a number of replica actors based on the request demand.
 If the Ray Autoscaler determines there aren't enough available resources (e.g. CPUs, GPUs, etc.) to place these actors, it responds by requesting more Ray nodes.
 The underlying cloud provider then responds by adding more nodes.
-Similarly, when Ray Serve scales down and terminates replica Actors, it attempts to make as many nodes idle as possible so the Ray Autoscaler can remove them. To learn more about the architecture underlying Ray Serve Autoscaling, see [Ray Serve Autoscaling Architecture](serve-autoscaling-architecture).
+Similarly, when Ray Serve scales down and terminates replica Actors, it attempts to make as many nodes idle as possible so the Ray Autoscaler can remove them. To learn more about the architecture underlying Ray Serve Autoscaling, see [Ray Serve Autoscaling Architecture](serve-autoscaling-architecture).
@@ -6,7 +6,6 @@ public class AutoscalingConfig implements Serializable {
 private static final long serialVersionUID = 9135422781025005216L;
 private int minReplicas = 1;
 private int maxReplicas = 1;
-private int targetNumOngoingRequestsPerReplica = 1;
 private int targetOngoingRequests = 1;
 /** How often to scrape for metrics */
 private double metricsIntervalS = 10.0;
@@ -35,14 +34,6 @@ public void setMaxReplicas(int maxReplicas) {
 this.maxReplicas = maxReplicas;
 }

-public int getTargetNumOngoingRequestsPerReplica() {
-return targetNumOngoingRequestsPerReplica;
-}
-
-public void setTargetNumOngoingRequestsPerReplica(int targetNumOngoingRequestsPerReplica) {
-this.targetNumOngoingRequestsPerReplica = targetNumOngoingRequestsPerReplica;
-}
-
 public int getTargetOngoingRequests() {
 return targetOngoingRequests;
 }
@@ -95,7 +86,6 @@ public io.ray.serve.generated.AutoscalingConfig toProto() {
 return io.ray.serve.generated.AutoscalingConfig.newBuilder()
 .setMinReplicas(minReplicas)
 .setMaxReplicas(maxReplicas)
-.setTargetNumOngoingRequestsPerReplica(targetNumOngoingRequestsPerReplica)
 .setTargetOngoingRequests(targetOngoingRequests)
 .setMetricsIntervalS(metricsIntervalS)
 .setLookBackPeriodS(lookBackPeriodS)
19 changes: 0 additions & 19 deletions python/ray/serve/api.py
@@ -336,28 +336,9 @@ class MyDeployment:
 if autoscaling_config not in [DEFAULT.VALUE, None]:
 if (
 isinstance(autoscaling_config, dict)
-and "target_num_ongoing_requests_per_replica" in autoscaling_config
-) or (
-isinstance(autoscaling_config, AutoscalingConfig)
-and "target_num_ongoing_requests_per_replica"
-in autoscaling_config.dict(exclude_unset=True)
-):
-logger.warning(
-"DeprecationWarning: `target_num_ongoing_requests_per_replica` in "
-"`autoscaling_config` has been deprecated and replaced by "
-"`target_ongoing_requests`. "
-"`target_num_ongoing_requests_per_replica` will be removed in a future "
-"version."
-)
-
-if (
-isinstance(autoscaling_config, dict)
-and "target_num_ongoing_requests_per_replica" not in autoscaling_config
 and "target_ongoing_requests" not in autoscaling_config
 ) or (
 isinstance(autoscaling_config, AutoscalingConfig)
-and "target_num_ongoing_requests_per_replica"
-not in autoscaling_config.dict(exclude_unset=True)
 and "target_ongoing_requests"
 not in autoscaling_config.dict(exclude_unset=True)
 ):
13 changes: 2 additions & 11 deletions python/ray/serve/config.py
@@ -41,13 +41,7 @@ class AutoscalingConfig(BaseModel):
 initial_replicas: Optional[NonNegativeInt] = None
 max_replicas: PositiveInt = 1

-# DEPRECATED: replaced by target_ongoing_requests
-target_num_ongoing_requests_per_replica: PositiveFloat = Field(
-default=DEFAULT_TARGET_ONGOING_REQUESTS,
-description="[DEPRECATED] Please use `target_ongoing_requests` instead.",
-)
-# Will default to 1.0 in the future.
-target_ongoing_requests: Optional[PositiveFloat] = None
+target_ongoing_requests: PositiveFloat = DEFAULT_TARGET_ONGOING_REQUESTS

 # How often to scrape for metrics
 metrics_interval_s: PositiveFloat = 10.0
@@ -135,7 +129,6 @@ def serialize_policy(self) -> None:
 @classmethod
 def default(cls):
 return cls(
-target_num_ongoing_requests_per_replica=DEFAULT_TARGET_ONGOING_REQUESTS,
 target_ongoing_requests=DEFAULT_TARGET_ONGOING_REQUESTS,
 min_replicas=1,
 max_replicas=100,
@@ -158,9 +151,7 @@ def get_downscaling_factor(self) -> PositiveFloat:
 return self.downscale_smoothing_factor or self.smoothing_factor

 def get_target_ongoing_requests(self) -> PositiveFloat:
-return (
-self.target_ongoing_requests or self.target_num_ongoing_requests_per_replica
-)
+return self.target_ongoing_requests


 # Keep in sync with ServeDeploymentMode in dashboard/client/src/type/serve.ts
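
The effect of the `get_target_ongoing_requests` change can be illustrated without Ray or pydantic. The two dataclasses below are hypothetical stand-ins for the config before and after this commit and reproduce only the fields involved. Validation (`PositiveFloat`) is omitted, and `DEFAULT_TARGET_ONGOING_REQUESTS` is assumed to be 2.0, matching the default stated in the docs and tests in this commit.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_TARGET_ONGOING_REQUESTS = 2.0  # assumed value, per the documented default


@dataclass
class OldConfig:
    """Stand-in for the pre-commit config: two fields plus a fallback."""

    target_num_ongoing_requests_per_replica: float = DEFAULT_TARGET_ONGOING_REQUESTS
    target_ongoing_requests: Optional[float] = None

    def get_target_ongoing_requests(self) -> float:
        # The new field wins whenever it is set; otherwise fall back.
        return (
            self.target_ongoing_requests
            or self.target_num_ongoing_requests_per_replica
        )


@dataclass
class NewConfig:
    """Stand-in for the post-commit config: a single field with a default."""

    target_ongoing_requests: float = DEFAULT_TARGET_ONGOING_REQUESTS

    def get_target_ongoing_requests(self) -> float:
        return self.target_ongoing_requests


old = OldConfig(
    target_ongoing_requests=10,
    target_num_ongoing_requests_per_replica=234,  # ignored, as in the removed test
)
print(old.get_target_ongoing_requests(), NewConfig().get_target_ongoing_requests())
```

In this sketch both versions resolve to the same value for every input the old API accepted through `target_ongoing_requests`; only configs that relied on the removed field need to change.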
12 changes: 0 additions & 12 deletions python/ray/serve/deployment.py
@@ -464,18 +464,6 @@ def options(

 if autoscaling_config is not DEFAULT.VALUE:
 new_deployment_config.autoscaling_config = autoscaling_config
-if (
-new_deployment_config.autoscaling_config
-and "target_num_ongoing_requests_per_replica"
-in new_deployment_config.autoscaling_config.dict(exclude_unset=True)
-):
-logger.warning(
-"DeprecationWarning: `target_num_ongoing_requests_per_replica` in "
-"`autoscaling_config` has been deprecated and replaced by "
-"`target_ongoing_requests`. Note that "
-"`target_num_ongoing_requests_per_replica` will be removed in a "
-"future version."
-)

 if graceful_shutdown_wait_loop_s is not DEFAULT.VALUE:
 new_deployment_config.graceful_shutdown_wait_loop_s = (
43 changes: 10 additions & 33 deletions python/ray/serve/tests/test_autoscaling_policy.py
@@ -122,45 +122,22 @@ def check_num_requests_ge(client, id: DeploymentID, expected: int):


 class TestAutoscalingMetrics:
-@pytest.mark.parametrize(
-"use_target_ongoing_requests,use_target_num_ongoing_requests_per_replica",
-[(True, True), (True, False), (False, True)],
-)
-def test_basic(
-self,
-serve_instance,
-use_target_num_ongoing_requests_per_replica,
-use_target_ongoing_requests,
-):
+def test_basic(self, serve_instance):
 """Test that request metrics are sent correctly to the controller."""

 client = serve_instance
 signal = SignalActor.remote()

-autoscaling_config = {
-"metrics_interval_s": 0.1,
-"min_replicas": 1,
-"max_replicas": 10,
-"upscale_delay_s": 0,
-"downscale_delay_s": 0,
-"look_back_period_s": 1,
-}
-if (
-use_target_ongoing_requests
-and not use_target_num_ongoing_requests_per_replica
-):
-autoscaling_config["target_ongoing_requests"] = 10
-elif (
-use_target_ongoing_requests and use_target_num_ongoing_requests_per_replica
-):
-autoscaling_config["target_ongoing_requests"] = 10
-# Random setting, should get ignored
-autoscaling_config["target_num_ongoing_requests_per_replica"] = 234
-else:
-autoscaling_config["target_num_ongoing_requests_per_replica"] = 10
-
 @serve.deployment(
-autoscaling_config=autoscaling_config,
+autoscaling_config={
+"metrics_interval_s": 0.1,
+"min_replicas": 1,
+"max_replicas": 10,
+"target_ongoing_requests": 10,
+"upscale_delay_s": 0,
+"downscale_delay_s": 0,
+"look_back_period_s": 1,
+},
 # We will send many requests. This will make sure replicas are
 # killed quickly during cleanup.
 graceful_shutdown_timeout_s=1,
3 changes: 1 addition & 2 deletions python/ray/serve/tests/test_controller.py
@@ -164,8 +164,7 @@ def autoscaling_app():
 "min_replicas": 1,
 "initial_replicas": None,
 "max_replicas": 10,
-"target_num_ongoing_requests_per_replica": 2.0,
-"target_ongoing_requests": None,
+"target_ongoing_requests": 2.0,
 "metrics_interval_s": 10.0,
 "look_back_period_s": 30.0,
 "smoothing_factor": 1.0,
2 changes: 0 additions & 2 deletions python/ray/serve/tests/test_deploy_2.py
@@ -320,7 +320,6 @@ async def __call__(self):
 assert deployment_config["autoscaling_config"] == {
 # Set by `num_replicas="auto"`
 "target_ongoing_requests": 2.0,
-"target_num_ongoing_requests_per_replica": 2.0,
 "min_replicas": 1,
 "max_replicas": 100,
 # Untouched defaults
@@ -373,7 +372,6 @@ async def __call__(self):
 assert deployment_config["autoscaling_config"] == {
 # Set by `num_replicas="auto"`
 "target_ongoing_requests": 2.0,
-"target_num_ongoing_requests_per_replica": 2.0,
 "min_replicas": 1,
 "max_replicas": 100,
 # Overrided by `autoscaling_config`
2 changes: 0 additions & 2 deletions python/ray/serve/tests/test_deploy_app.py
@@ -1440,7 +1440,6 @@ def test_num_replicas_auto_api(client: ServeControllerClient):
 assert deployment_config["autoscaling_config"] == {
 # Set by `num_replicas="auto"`
 "target_ongoing_requests": 2.0,
-"target_num_ongoing_requests_per_replica": 2.0,
 "min_replicas": 1,
 "max_replicas": 100,
 # Untouched defaults
@@ -1492,7 +1491,6 @@ def test_num_replicas_auto_basic(client: ServeControllerClient):
 assert deployment_config["autoscaling_config"] == {
 # Set by `num_replicas="auto"`
 "target_ongoing_requests": 2.0,
-"target_num_ongoing_requests_per_replica": 2.0,
 "min_replicas": 1,
 "max_replicas": 100,
 # Overrided by `autoscaling_config`