[GCS]Fix gcs placement group manager worker dead ut #11103

clay4megtr
Contributor

@clay4megtr clay4megtr commented Sep 29, 2020

Why are these changes needed?

Update the unit test (UT) code so that CI stops failing.

Original UT flow

  1. Register the first placement group (PG) request.
  2. Start scheduling it and mark that a placement group is being scheduled.
  3. Register the second placement group request.
  4. The second PG request is not scheduled yet, because a PG is already being scheduled.
  5. Manually assign the node field of the first PG request's bundle to simulate that scheduling has finished.
  6. Invoke the placement group manager's OnNodeDead() method to simulate a node death.
  7. The placement group scheduler returns the bundles on the dead node; here the test simulates that the first bundle of the first PG is on the dead node.
  8. The placement group manager finds the failed placement group (the first one) and adds it to the head of the pending_placement_groups_ queue.
  9. Rescheduling starts.
  10. The manager takes the PG at the head of the queue and tries to reschedule it. (NOTICE: the first PG is still marked as being scheduled, so the call returns immediately.)
  11. Manually invoke OnPlacementGroupCreationSuccess() to simulate that scheduling of the first PG has finished (this is where the scheduling state is marked as done).
  12. OnPlacementGroupCreationSuccess() triggers scheduling of the next PG, but the next PG is again the first PG, because it was not scheduled in step 10. In effect, the first PG is rescheduled instead of the second one.

Updated UT flow

...
5. After step 5, mark that no placement group is being scheduled, so that the first placement group is actually scheduled in step 10 and removed from the pending placement group queue.
...
12. In step 12, when OnPlacementGroupCreationSuccess() invokes the reschedule method, it schedules the second placement group request.
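
The difference between the two flows can be sketched with a small toy model. This is only an illustration under assumptions, not Ray's actual C++ code: `ToyPlacementGroupManager`, `run_flow`, and all method names below are hypothetical stand-ins for the pending queue, the "scheduling in progress" mark, OnNodeDead(), and OnPlacementGroupCreationSuccess() described in the steps above.

```python
from collections import deque


class ToyPlacementGroupManager:
    """Hypothetical toy model of the flow described above; not Ray's code."""

    def __init__(self):
        self.pending = deque()               # stands in for pending_placement_groups_
        self.scheduling_in_progress = False  # "a placement group is being scheduled"
        self.scheduled = []                  # every PG handed to the scheduler, in order

    def register(self, pg):
        # Steps 1-4: enqueue the request and try to schedule it.
        self.pending.append(pg)
        self.schedule_pending()

    def schedule_pending(self):
        # Return immediately if a PG is already being scheduled (step 10).
        if self.scheduling_in_progress or not self.pending:
            return
        pg = self.pending.popleft()
        self.scheduling_in_progress = True
        self.scheduled.append(pg)

    def mark_scheduling_done(self):
        self.scheduling_in_progress = False

    def on_node_dead(self, failed_pg):
        # Steps 6-10: put the failed PG at the head of the queue and reschedule.
        self.pending.appendleft(failed_pg)
        self.schedule_pending()

    def on_creation_success(self):
        # Steps 11-12: mark scheduling as done, then schedule the next PG.
        self.mark_scheduling_done()
        self.schedule_pending()


def run_flow(mark_done_after_step_5):
    """Replay the test flow; return (scheduled PGs, PGs left pending)."""
    manager = ToyPlacementGroupManager()
    manager.register("pg1")
    manager.register("pg2")
    if mark_done_after_step_5:
        # The extra step added in the updated flow.
        manager.mark_scheduling_done()
    manager.on_node_dead("pg1")
    manager.on_creation_success()
    return manager.scheduled, list(manager.pending)


original = run_flow(mark_done_after_step_5=False)
updated = run_flow(mark_done_after_step_5=True)
print(original)
print(updated)
```

In this toy model, the original flow leaves `pg2` in the pending queue at the end of the test, while the updated flow schedules `pg1` again in step 10 and then `pg2` in step 12.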

Supplement

Apart from the issue above, this UT also fails occasionally in our internal project, but I have not been able to reproduce it locally.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@rkooo567
Contributor

Please update the description to explain

  1. the high level problem.
  2. Proposed solution.

@rkooo567 rkooo567 self-assigned this Sep 29, 2020
@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 30, 2020
@clay4megtr clay4megtr changed the title [gcs]Fix gcs placement group manager worker dead ut [GCS]Fix gcs placement group manager worker dead ut Sep 30, 2020
@clay4megtr
Contributor Author

Please update the description to explain

  1. the high level problem.
  2. Proposed solution.

Hi SangBin, I have updated the comment. Could you check it?

@rkooo567 rkooo567 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 1, 2020
@rkooo567 rkooo567 removed their assignment Nov 23, 2021
@bveeramani
Member

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

  1. Install Black
       pip install -I black==21.12b0
  2. Format changed files with Black
       curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
       chmod +x ./format-changed.sh
       ./format-changed.sh
       rm format-changed.sh
  3. Commit your changes.
       git add --all
       git commit -m "Format Python code with Black"
  4. Merge master into your branch.
       git pull upstream master
  5. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

@clay4megtr clay4megtr closed this Feb 8, 2022