Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][doc] ray.get with timeout=0 will warn behavior change in future ray releases. #31140

Merged
merged 9 commits into from
Dec 20, 2022

Conversation

rickyyx
Copy link
Contributor

@rickyyx rickyyx commented Dec 15, 2022

Why are these changes needed?

With ray. get(timeout=0) currently blocks until objects are ready, which is not the ideal behavior. Since there might be existing users depending on such behavior. We will fix this with a more gradual migration in future releases as we gather current usage.

Besides this PR, we will:

  1. Reach out on the slack channel about this behavior change
  2. Add telemetry for usage.
  3. Fix this in future ray release

Related issue number

#28465

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: rickyyx <[email protected]>
Signed-off-by: rickyyx <[email protected]>
Copy link
Contributor

@pcmoritz pcmoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, but looks good to me otherwise :)

@rickyyx rickyyx added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 15, 2022
python/ray/_private/worker.py Outdated Show resolved Hide resolved
Signed-off-by: rickyyx <[email protected]>
@rickyyx rickyyx removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 16, 2022
Signed-off-by: rickyyx <[email protected]>
.
Signed-off-by: rickyyx <[email protected]>
@rickyyx
Copy link
Contributor Author

rickyyx commented Dec 16, 2022

@pcmoritz @rkooo567 PTAL when u get the chance :D

Looks like there are test failures related.

@rickyyx rickyyx added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 16, 2022
Signed-off-by: rickyyx <[email protected]>
@rickyyx
Copy link
Contributor Author

rickyyx commented Dec 16, 2022

Moved the test to test_advanced since test_basic_xx will also be used for ray client tests, which have a different implementation of ray.get that actually raises the timeout. See this previous failure

Signed-off-by: rickyyx <[email protected]>
@rickyyx rickyyx added @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Dec 16, 2022
Signed-off-by: rickyyx <[email protected]>
@rickyyx rickyyx removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 16, 2022
@rkooo567
Copy link
Contributor

which have a different implementation of ray.get that actually raises the timeout. See this previous failure

In this case, is the behavior the same? We should make sure both prints the same warning for the same argument.

@rickyyx
Copy link
Contributor Author

rickyyx commented Dec 19, 2022

which have a different implementation of ray.get that actually raises the timeout. See this previous failure

In this case, is the behavior the same? We should make sure both prints the same warning for the same argument.

No - ray client's get has a different behavior: i.e. it doens't block with timeout=0. So I am not adding the warnings.

@rkooo567
Copy link
Contributor

what's the current behavior? Since it is not possible to do ray.get(timeout=0), it means ray client cannot do that as well?

@rickyyx
Copy link
Contributor Author

rickyyx commented Dec 20, 2022

what's the current behavior? Since it is not possible to do ray.get(timeout=0), it means ray client cannot do that as well?

For ray client, ray.get(timeout=0 will raise GetTimeoutError if the object's not ready, which is the correct behaviour we are changing to for non-client behaviour.

@rkooo567
Copy link
Contributor

@rickyyx gotcha. that's interesting... do you know how it is implemented? (they did custom implementation to make it work?)

@rkooo567 rkooo567 merged commit b963198 into ray-project:master Dec 20, 2022
@rickyyx
Copy link
Contributor Author

rickyyx commented Dec 20, 2022

@rickyyx gotcha. that's interesting... do you know how it is implemented? (they did custom implementation to make it work?)

Yep, here:

if timeout is None:
deadline = None
else:
deadline = time.monotonic() + timeout
max_blocking_operation_time = MAX_BLOCKING_OPERATION_TIME_S
if "RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S" in os.environ:
max_blocking_operation_time = float(
os.environ["RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S"]
)
while True:
if deadline:
op_timeout = min(
max_blocking_operation_time,
max(deadline - time.monotonic(), 0.001),
)
else:
op_timeout = max_blocking_operation_time
try:
res = self._get(to_get, op_timeout)
break
except GetTimeoutError:
if deadline and time.monotonic() > deadline:
raise
logger.debug("Internal retry for get {}".format(to_get))
if len(to_get) != len(res):
raise Exception(
"Mismatched number of items in request ({}) and response ({})".format(
len(to_get), len(res)
)
)

AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
…e ray releases. (#31140)

With ray. get(timeout=0) currently blocks until objects are ready, which is not the ideal behavior. Since there might be existing users depending on such behavior. We will fix this with a more gradual migration in future releases as we gather current usage.

Besides this PR, we will:

Reach out on the slack channel about this behavior change
Add telemetry for usage.
Fix this in future ray release
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
…e ray releases. (ray-project#31140)

With ray. get(timeout=0) currently blocks until objects are ready, which is not the ideal behavior. Since there might be existing users depending on such behavior. We will fix this with a more gradual migration in future releases as we gather current usage.

Besides this PR, we will:

Reach out on the slack channel about this behavior change
Add telemetry for usage.
Fix this in future ray release

Signed-off-by: tmynn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants