[Placement Group] Support RayPlacementGroupError #10508 #13140

Closed
wants to merge 15 commits into from

oliverhu
Member

Why are these changes needed?

Fix #10508

Also:

  • Refactored ProcessDisconnectClientMessage into a message-processing function and a utility function, since much of the message passing originates in the same file.
  • Removed intentional_disconnect_client and replaced it with a ClientDisconnectType to provide more information about the reason for a disconnect.
  • Renamed DestroyWorker to DisconnectAndKillWorker for clarity.
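
The first two bullets can be illustrated with a small sketch. The actual change lives in the raylet's C++ code and protobuf definitions, which are not shown on this page, so the Python below is only an analogy. The enum members, `NodeManagerSketch`, `handle_disconnect_message`, and `disconnect_client` are all hypothetical names chosen for illustration, not Ray APIs.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class ClientDisconnectType(Enum):
    """Hypothetical disconnect reasons; the PR's real enum values are not shown here."""

    INTENDED_EXIT = auto()
    UNEXPECTED_EXIT = auto()
    PLACEMENT_GROUP_REMOVED = auto()


@dataclass
class NodeManagerSketch:
    """Toy stand-in for a node manager; it only records what it was asked to do."""

    log: List[str] = field(default_factory=list)

    def handle_disconnect_message(self, client_id: str, message: dict) -> None:
        # Message-processing half: decode the message, then delegate.
        disconnect_type = ClientDisconnectType[message["disconnect_type"]]
        self.disconnect_client(client_id, disconnect_type)

    def disconnect_client(
        self, client_id: str, disconnect_type: ClientDisconnectType
    ) -> None:
        # Utility half: callable directly from code in the same file,
        # without building a message first.
        self.log.append(f"{client_id}:{disconnect_type.name}")
```

Compared with a boolean "intentional" flag, an enum lets callers distinguish more than two reasons, and splitting the handler from the utility lets internal callers skip the message round trip.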

Related issue number

#10508

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@oliverhu
Member Author

oliverhu commented Dec 31, 2020

Reverted the changes in https://github.com/ray-project/ray/pull/12821/files, which deleted the intention_disconnect message type and changed the message_handler signature, as those changes are fairly risky and not related to this specific issue. I will open another discussion thread on that.

@oliverhu
Member Author

oliverhu commented Jan 1, 2021

Test session starts (platform: darwin, Python 3.8.3, pytest 5.4.3, pytest-sugar 0.9.4)
rootdir: /Users/khu/ray/ray/python
plugins: sugar-0.9.4, rerunfailures-9.1, asyncio-0.14.0, timeout-1.4.2
collecting ...
 ray/tests/test_failure.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓sss✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 84% ████████▌
                           ✓✓✓✓✓                                                                          ✓✓✓✓✓✓                                                        ✓✓✓✓✓✓✓                                                                                            100% ██████████
============================================================= warnings summary ==============================================================
ray/tests/test_failure.py::test_failed_task
  /Users/khu/opt/anaconda3/lib/python3.8/site-packages/aiohttp/helpers.py:107: DeprecationWarning: "@coroutine" decorator is deprecated since Python 3.8, use "async def" instead
    def noop(*args, **kwargs):  # type: ignore

ray/tests/test_failure.py: 23 tests with warnings
  /Users/khu/ray/ray/python/ray/exceptions.py:87: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
    self.proctitle = setproctitle.getproctitle()

-- Docs: https://docs.pytest.org/en/latest/warnings.html

Results (247.39s):
      42 passed
       3 skipped

Outdated review threads:
  • src/ray/protobuf/common.proto (two threads)
  • src/ray/raylet/node_manager.h
Contributor

@ffbin ffbin left a comment

LGTM, just a small comment.

@rkooo567 rkooo567 self-assigned this Jan 5, 2021
Contributor

@rkooo567 rkooo567 left a comment

Can you add tests to test_placement_group?

@oliverhu oliverhu added the @author-action-required label Jan 7, 2021
@oliverhu
Member Author

@rkooo567 and I discussed this PR further. The current changes are not enough to cover all cases (direct tasks and actor tasks); we might have to subscribe to worker changes and modify ActorTableData to track actor exit status. We will park this PR for now.
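
As a rough illustration of the direction described in this comment (not code from the PR), the sketch below shows one way an actor table could subscribe to worker changes and record an exit status per actor. Everything here is an assumption made for illustration: `ExitStatus`, `WorkerEvents`, `ActorTableSketch`, and their methods are hypothetical Python stand-ins, not Ray's `ActorTableData` or any real Ray API.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class ExitStatus(Enum):
    """Hypothetical exit reasons an actor record could carry."""

    ALIVE = auto()
    WORKER_DIED = auto()
    PLACEMENT_GROUP_REMOVED = auto()


class WorkerEvents:
    """Minimal in-process publish/subscribe channel for worker changes."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, ExitStatus], None]] = []

    def subscribe(self, callback: Callable[[str, ExitStatus], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, worker_id: str, status: ExitStatus) -> None:
        for callback in self._subscribers:
            callback(worker_id, status)


class ActorTableSketch:
    """Tracks an exit status per actor by listening to worker changes."""

    def __init__(self, events: WorkerEvents) -> None:
        self._worker_to_actor: Dict[str, str] = {}
        self.exit_status: Dict[str, ExitStatus] = {}
        events.subscribe(self._on_worker_change)

    def register(self, actor_id: str, worker_id: str) -> None:
        self._worker_to_actor[worker_id] = actor_id
        self.exit_status[actor_id] = ExitStatus.ALIVE

    def _on_worker_change(self, worker_id: str, status: ExitStatus) -> None:
        # Ignore workers that do not host a registered actor.
        actor_id = self._worker_to_actor.get(worker_id)
        if actor_id is not None:
            self.exit_status[actor_id] = status
```

With a status recorded this way, a caller waiting on an actor task could in principle be given a placement-group-specific error rather than a generic worker failure; how that would map onto direct tasks is left open here, as it is in the comment.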

@bveeramani
Member

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

  1. Install Black:
       pip install -I black==21.12b0
  2. Format changed files with Black:
       curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
       chmod +x ./format-changed.sh
       ./format-changed.sh
       rm format-changed.sh
  3. Commit your changes:
       git add --all
       git commit -m "Format Python code with Black"
  4. Merge master into your branch:
       git pull upstream master
  5. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

@oliverhu oliverhu closed this Feb 2, 2022
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Development

Successfully merging this pull request may close these issues.

[Placement Group] Support RayPlacementGroupError
4 participants