[serve] remove target_num_ongoing_requests_per_replica #46392

Merged
merged 1 commit into from
Jul 16, 2024
3 changes: 0 additions & 3 deletions doc/source/serve/advanced-guides/advanced-autoscaling.md
@@ -14,9 +14,6 @@ In this section, we go into more detail about Serve autoscaling concepts as well

To define what the steady state of your deployments should be, set values for `target_ongoing_requests` and `max_ongoing_requests`.

-#### **target_num_ongoing_requests_per_replica [default=2]**
-This parameter is renamed to `target_ongoing_requests`. `target_num_ongoing_requests_per_replica` will be removed in a future release.
-
#### **target_ongoing_requests [default=2]**
:::{note}
The default for `target_ongoing_requests` changed from 1.0 to 2.0 in Ray 2.32.0. You can continue to set it manually to override the default.
2 changes: 1 addition & 1 deletion doc/source/serve/autoscaling-guide.md
@@ -45,7 +45,7 @@ You can set `num_replicas="auto"` and override its default values (shown above)

Let's dive into what each of these parameters does.

-* **target_ongoing_requests** (replaces the deprecated `target_num_ongoing_requests_per_replica`) is the average number of ongoing requests per replica that the Serve autoscaler tries to ensure. You can adjust it based on your request processing length (the longer the requests, the smaller this number should be) as well as your latency objective (the shorter you want your latency to be, the smaller this number should be).
+* **target_ongoing_requests** is the average number of ongoing requests per replica that the Serve autoscaler tries to ensure. You can adjust it based on your request processing length (the longer the requests, the smaller this number should be) as well as your latency objective (the shorter you want your latency to be, the smaller this number should be).
* **max_ongoing_requests** is the maximum number of ongoing requests allowed for a replica. Note this parameter is not part of the autoscaling config because it's relevant to all deployments, but it's important to set it relative to the target value if you turn on autoscaling for your deployment.
* **min_replicas** is the minimum number of replicas for the deployment. Set this to 0 if there are long periods of no traffic and some extra tail latency during upscale is acceptable. Otherwise, set this to what you think you need for low traffic.
* **max_replicas** is the maximum number of replicas for the deployment. Set this to ~20% higher than what you think you need for peak traffic.
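
To make the interplay of these parameters concrete, here is a minimal sketch. It is not Serve's actual autoscaling policy, which also applies smoothing factors and upscale/downscale delays. It assumes the steady-state replica count is the total number of ongoing requests divided by `target_ongoing_requests`, rounded up and clamped to `[min_replicas, max_replicas]`. The helper name `desired_replicas` and the config values are hypothetical.

```python
import math

# Keys mirror the autoscaling parameters described above; values are illustrative.
autoscaling_config = {
    "target_ongoing_requests": 2,
    "min_replicas": 1,
    "max_replicas": 10,
}


def desired_replicas(total_ongoing_requests: int, config: dict) -> int:
    """Hypothetical helper: simplified steady-state replica estimate."""
    # Replicas needed so that each one handles roughly the target number of requests.
    raw = math.ceil(total_ongoing_requests / config["target_ongoing_requests"])
    # Clamp the estimate to the configured bounds.
    return max(config["min_replicas"], min(config["max_replicas"], raw))


if __name__ == "__main__":
    for load in (0, 7, 100):
        print(load, desired_replicas(load, autoscaling_config))
```

Under this assumption, lowering `target_ongoing_requests` yields more replicas for the same load, which matches the guidance above to use a smaller value for longer requests or tighter latency objectives.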
@@ -6,7 +6,6 @@ public class AutoscalingConfig implements Serializable {
private static final long serialVersionUID = 9135422781025005216L;
private int minReplicas = 1;
private int maxReplicas = 1;
-private int targetNumOngoingRequestsPerReplica = 1;
private int targetOngoingRequests = 1;
/** How often to scrape for metrics */
private double metricsIntervalS = 10.0;
@@ -35,14 +34,6 @@ public void setMaxReplicas(int maxReplicas) {
this.maxReplicas = maxReplicas;
}

-public int getTargetNumOngoingRequestsPerReplica() {
-return targetNumOngoingRequestsPerReplica;
-}
-
-public void setTargetNumOngoingRequestsPerReplica(int targetNumOngoingRequestsPerReplica) {
-this.targetNumOngoingRequestsPerReplica = targetNumOngoingRequestsPerReplica;
-}
-
public int getTargetOngoingRequests() {
return targetOngoingRequests;
}
@@ -95,7 +86,6 @@ public io.ray.serve.generated.AutoscalingConfig toProto() {
return io.ray.serve.generated.AutoscalingConfig.newBuilder()
.setMinReplicas(minReplicas)
.setMaxReplicas(maxReplicas)
-.setTargetNumOngoingRequestsPerReplica(targetNumOngoingRequestsPerReplica)
.setTargetOngoingRequests(targetOngoingRequests)
.setMetricsIntervalS(metricsIntervalS)
.setLookBackPeriodS(lookBackPeriodS)
5 changes: 1 addition & 4 deletions python/ray/dashboard/modules/serve/serve_rest_api_impl.py
@@ -238,10 +238,7 @@ def log_config_change_default_warning(self, config):
else:
continue

-if (
-"target_num_ongoing_requests_per_replica" not in autoscaling_config
-and "target_ongoing_requests" not in autoscaling_config
-):
+if "target_ongoing_requests" not in autoscaling_config:
logger.warning(
"The default value for `target_ongoing_requests` has changed "
"from 1.0 to 2.0 in Ray 2.32.0."
19 changes: 0 additions & 19 deletions python/ray/serve/api.py
@@ -329,28 +329,9 @@ class MyDeployment:
if autoscaling_config not in [DEFAULT.VALUE, None]:
if (
isinstance(autoscaling_config, dict)
-and "target_num_ongoing_requests_per_replica" in autoscaling_config
-) or (
-isinstance(autoscaling_config, AutoscalingConfig)
-and "target_num_ongoing_requests_per_replica"
-in autoscaling_config.dict(exclude_unset=True)
-):
-logger.warning(
-"DeprecationWarning: `target_num_ongoing_requests_per_replica` in "
-"`autoscaling_config` has been deprecated and replaced by "
-"`target_ongoing_requests`. "
-"`target_num_ongoing_requests_per_replica` will be removed in a future "
-"version."
-)
-
-if (
-isinstance(autoscaling_config, dict)
-and "target_num_ongoing_requests_per_replica" not in autoscaling_config
and "target_ongoing_requests" not in autoscaling_config
) or (
isinstance(autoscaling_config, AutoscalingConfig)
-and "target_num_ongoing_requests_per_replica"
-not in autoscaling_config.dict(exclude_unset=True)
and "target_ongoing_requests"
not in autoscaling_config.dict(exclude_unset=True)
):
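
Because the deprecated key is no longer handled after this change, callers that still pass `target_num_ongoing_requests_per_replica` in a dict-style `autoscaling_config` may want to rename it before deploying. The following is a minimal sketch of such a migration. `migrate_autoscaling_config` is a hypothetical helper, not part of Ray Serve. It assumes that when both keys are present the new key wins, mirroring the removed test case in which the old key "should get ignored".

```python
OLD_KEY = "target_num_ongoing_requests_per_replica"
NEW_KEY = "target_ongoing_requests"


def migrate_autoscaling_config(config: dict) -> dict:
    """Hypothetical helper: return a copy that uses only the new key."""
    migrated = dict(config)  # leave the caller's dict untouched
    old_value = migrated.pop(OLD_KEY, None)
    # Carry the old value over only if the new key was not set explicitly.
    if old_value is not None and NEW_KEY not in migrated:
        migrated[NEW_KEY] = old_value
    return migrated


if __name__ == "__main__":
    print(migrate_autoscaling_config({"min_replicas": 1, OLD_KEY: 10}))
```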
13 changes: 2 additions & 11 deletions python/ray/serve/config.py
@@ -41,13 +41,7 @@ class AutoscalingConfig(BaseModel):
initial_replicas: Optional[NonNegativeInt] = None
max_replicas: PositiveInt = 1

-# DEPRECATED: replaced by target_ongoing_requests
-target_num_ongoing_requests_per_replica: PositiveFloat = Field(
-default=DEFAULT_TARGET_ONGOING_REQUESTS,
-description="[DEPRECATED] Please use `target_ongoing_requests` instead.",
-)
-# Will default to 1.0 in the future.
-target_ongoing_requests: Optional[PositiveFloat] = None
+target_ongoing_requests: PositiveFloat = DEFAULT_TARGET_ONGOING_REQUESTS

# How often to scrape for metrics
metrics_interval_s: PositiveFloat = 10.0
@@ -135,7 +129,6 @@ def serialize_policy(self) -> None:
@classmethod
def default(cls):
return cls(
-target_num_ongoing_requests_per_replica=DEFAULT_TARGET_ONGOING_REQUESTS,
target_ongoing_requests=DEFAULT_TARGET_ONGOING_REQUESTS,
min_replicas=1,
max_replicas=100,
@@ -158,9 +151,7 @@ def get_downscaling_factor(self) -> PositiveFloat:
return self.downscale_smoothing_factor or self.smoothing_factor

def get_target_ongoing_requests(self) -> PositiveFloat:
-return (
-self.target_ongoing_requests or self.target_num_ongoing_requests_per_replica
-)
+return self.target_ongoing_requests


# Keep in sync with ServeDeploymentMode in dashboard/client/src/type/serve.ts
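
The change to `get_target_ongoing_requests` in this file can be illustrated with a minimal, self-contained sketch. The two dataclasses below are stand-ins for the old and new field layouts, not the real pydantic `AutoscalingConfig`, and the value `2.0` for `DEFAULT_TARGET_ONGOING_REQUESTS` is an assumption based on the default stated in the docs hunk.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value; the docs state the default is 2.0 as of Ray 2.32.0.
DEFAULT_TARGET_ONGOING_REQUESTS = 2.0


@dataclass
class OldStyleConfig:
    """Stand-in for the pre-change fields (hypothetical, for illustration)."""

    target_num_ongoing_requests_per_replica: float = DEFAULT_TARGET_ONGOING_REQUESTS
    target_ongoing_requests: Optional[float] = None

    def get_target_ongoing_requests(self) -> float:
        # The new name wins when set; otherwise fall back to the deprecated field.
        return (
            self.target_ongoing_requests
            or self.target_num_ongoing_requests_per_replica
        )


@dataclass
class NewStyleConfig:
    """Stand-in for the post-change field (hypothetical, for illustration)."""

    target_ongoing_requests: float = DEFAULT_TARGET_ONGOING_REQUESTS

    def get_target_ongoing_requests(self) -> float:
        # Single source of truth: no fallback is needed.
        return self.target_ongoing_requests
```

With a single field carrying the default, the getter no longer needs the `or` fallback, and serialized configs contain one target value instead of two.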
12 changes: 0 additions & 12 deletions python/ray/serve/deployment.py
@@ -442,18 +442,6 @@ def options(

if autoscaling_config is not DEFAULT.VALUE:
new_deployment_config.autoscaling_config = autoscaling_config
-if (
-new_deployment_config.autoscaling_config
-and "target_num_ongoing_requests_per_replica"
-in new_deployment_config.autoscaling_config.dict(exclude_unset=True)
-):
-logger.warning(
-"DeprecationWarning: `target_num_ongoing_requests_per_replica` in "
-"`autoscaling_config` has been deprecated and replaced by "
-"`target_ongoing_requests`. Note that "
-"`target_num_ongoing_requests_per_replica` will be removed in a "
-"future version."
-)

if graceful_shutdown_wait_loop_s is not DEFAULT.VALUE:
new_deployment_config.graceful_shutdown_wait_loop_s = (
43 changes: 10 additions & 33 deletions python/ray/serve/tests/test_autoscaling_policy.py
@@ -122,45 +122,22 @@ def check_num_requests_ge(client, id: DeploymentID, expected: int):


class TestAutoscalingMetrics:
-@pytest.mark.parametrize(
-"use_target_ongoing_requests,use_target_num_ongoing_requests_per_replica",
-[(True, True), (True, False), (False, True)],
-)
-def test_basic(
-self,
-serve_instance,
-use_target_num_ongoing_requests_per_replica,
-use_target_ongoing_requests,
-):
+def test_basic(self, serve_instance):
"""Test that request metrics are sent correctly to the controller."""

client = serve_instance
signal = SignalActor.remote()

-autoscaling_config = {
-"metrics_interval_s": 0.1,
-"min_replicas": 1,
-"max_replicas": 10,
-"upscale_delay_s": 0,
-"downscale_delay_s": 0,
-"look_back_period_s": 1,
-}
-if (
-use_target_ongoing_requests
-and not use_target_num_ongoing_requests_per_replica
-):
-autoscaling_config["target_ongoing_requests"] = 10
-elif (
-use_target_ongoing_requests and use_target_num_ongoing_requests_per_replica
-):
-autoscaling_config["target_ongoing_requests"] = 10
-# Random setting, should get ignored
-autoscaling_config["target_num_ongoing_requests_per_replica"] = 234
-else:
-autoscaling_config["target_num_ongoing_requests_per_replica"] = 10
-
@serve.deployment(
-autoscaling_config=autoscaling_config,
+autoscaling_config={
+"metrics_interval_s": 0.1,
+"min_replicas": 1,
+"max_replicas": 10,
+"target_ongoing_requests": 10,
+"upscale_delay_s": 0,
+"downscale_delay_s": 0,
+"look_back_period_s": 1,
+},
# We will send many requests. This will make sure replicas are
# killed quickly during cleanup.
graceful_shutdown_timeout_s=1,
3 changes: 1 addition & 2 deletions python/ray/serve/tests/test_controller.py
@@ -163,8 +163,7 @@ def autoscaling_app():
"min_replicas": 1,
"initial_replicas": None,
"max_replicas": 10,
-"target_num_ongoing_requests_per_replica": 2.0,
-"target_ongoing_requests": None,
+"target_ongoing_requests": 2.0,
"metrics_interval_s": 10.0,
"look_back_period_s": 30.0,
"smoothing_factor": 1.0,
2 changes: 0 additions & 2 deletions python/ray/serve/tests/test_deploy_2.py
@@ -320,7 +320,6 @@ async def __call__(self):
assert deployment_config["autoscaling_config"] == {
# Set by `num_replicas="auto"`
"target_ongoing_requests": 2.0,
-"target_num_ongoing_requests_per_replica": 2.0,
"min_replicas": 1,
"max_replicas": 100,
# Untouched defaults
@@ -373,7 +372,6 @@ async def __call__(self):
assert deployment_config["autoscaling_config"] == {
# Set by `num_replicas="auto"`
"target_ongoing_requests": 2.0,
-"target_num_ongoing_requests_per_replica": 2.0,
"min_replicas": 1,
"max_replicas": 100,
# Overridden by `autoscaling_config`
2 changes: 0 additions & 2 deletions python/ray/serve/tests/test_deploy_app.py
@@ -1432,7 +1432,6 @@ def test_num_replicas_auto_api(client: ServeControllerClient):
assert deployment_config["autoscaling_config"] == {
# Set by `num_replicas="auto"`
"target_ongoing_requests": 2.0,
-"target_num_ongoing_requests_per_replica": 2.0,
"min_replicas": 1,
"max_replicas": 100,
# Untouched defaults
@@ -1484,7 +1483,6 @@ def test_num_replicas_auto_basic(client: ServeControllerClient):
assert deployment_config["autoscaling_config"] == {
# Set by `num_replicas="auto"`
"target_ongoing_requests": 2.0,
-"target_num_ongoing_requests_per_replica": 2.0,
"min_replicas": 1,
"max_replicas": 100,
# Overridden by `autoscaling_config`