[ray client] enable ray.get with >2 sec timeout (#21883) (#22165)
Commit 2cf4c72 ("[ray client] Fix ctrl-c for ray.get() by setting a
short-server side timeout") introduced a short server-side timeout so
that later operations are not blocked.

However, that fix implicitly assumes that get() completes within
MAX_BLOCKING_OPERATION_TIME_S (two seconds). This becomes a problem
when apps use large objects or have limited network I/O bandwidth, so
that pushing all chunks takes more than two seconds. The current retry
logic re-pushes from the first chunk on every retry, so clients block
in an endless re-push loop.
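
The failure mode described above can be illustrated with a small simulation. This is a sketch, not Ray client code: `simulate_get`, `transfer_time_s`, `cap_s`, and `max_attempts` are hypothetical names, and the model assumes a transfer needs a fixed number of seconds and that every timed-out attempt restarts from the first chunk.

```python
def simulate_get(transfer_time_s: float, cap_s: float, max_attempts: int = 5) -> bool:
    """Return True if a transfer finishes within max_attempts capped attempts.

    Hypothetical model: each attempt restarts from the first chunk, so an
    attempt succeeds only if the whole transfer fits within cap_s.
    """
    for _ in range(max_attempts):
        if transfer_time_s <= cap_s:
            return True  # all chunks arrived before the per-attempt timeout
        # Timed out: progress is discarded and the next attempt starts over.
    return False


# A 3-second transfer never completes under a 2-second cap.
print(simulate_get(transfer_time_s=3.0, cap_s=2.0))   # False
# The same transfer completes on the first attempt once the cap is raised.
print(simulate_get(transfer_time_s=3.0, cap_s=10.0))  # True
```

Under this model, retrying more often does not help; only a larger per-attempt cap lets the transfer finish.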

I updated the logic to pass the timeout directly if it is explicitly
given. Without a timeout, it still uses MAX_BLOCKING_OPERATION_TIME_S
for polling with the short server-side timeout.
takeshi-yoshimura authored Apr 25, 2022
1 parent c73f02d commit e115545
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions python/ray/util/client/worker.py
@@ -421,14 +421,19 @@ def get(self, vals, *, timeout: Optional[float] = None) -> Any:
         else:
             deadline = time.monotonic() + timeout
 
+        max_blocking_operation_time = MAX_BLOCKING_OPERATION_TIME_S
+        if "RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S" in os.environ:
+            max_blocking_operation_time = float(
+                os.environ["RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S"]
+            )
         while True:
             if deadline:
                 op_timeout = min(
-                    MAX_BLOCKING_OPERATION_TIME_S,
+                    max_blocking_operation_time,
                     max(deadline - time.monotonic(), 0.001),
                 )
             else:
-                op_timeout = MAX_BLOCKING_OPERATION_TIME_S
+                op_timeout = max_blocking_operation_time
             try:
                 res = self._get(to_get, op_timeout)
                 break
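
The per-attempt timeout computed in the hunk above can be tried in isolation. The following is a minimal sketch rather than the actual client code: `compute_op_timeout` is a hypothetical helper name, and the default of 2.0 is an assumption taken from the "two seconds" stated in the commit message. Only the environment variable name and the `min`/`max` expression come from the diff.

```python
import os
import time
from typing import Optional

# Assumed default, based on the "two seconds" given in the commit message.
MAX_BLOCKING_OPERATION_TIME_S = 2.0


def compute_op_timeout(deadline: Optional[float]) -> float:
    """Hypothetical helper mirroring the per-attempt timeout logic in the diff."""
    cap = MAX_BLOCKING_OPERATION_TIME_S
    if "RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S" in os.environ:
        cap = float(os.environ["RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S"])
    if deadline:
        # Never wait past the caller's deadline, but keep a small positive floor.
        return min(cap, max(deadline - time.monotonic(), 0.001))
    return cap


# Raise the cap to 30 seconds through the environment variable.
os.environ["RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S"] = "30"
print(compute_op_timeout(None))                        # 30.0
print(compute_op_timeout(time.monotonic() + 5) <= 5)   # True
```

With the variable set, a call without a timeout polls in 30-second attempts, and a call with a timeout waits for the smaller of the cap and the time remaining until the deadline.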