Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Test] gcs_placement_group_manager_test flaky in the master #30573

Closed
rkooo567 opened this issue Nov 22, 2022 · 12 comments · Fixed by #30787
Closed

[Test] gcs_placement_group_manager_test flaky in the master #30573

rkooo567 opened this issue Nov 22, 2022 · 12 comments · Fixed by #30787
Assignees
Labels
bug Something that is supposed to be working; but isn't core Issues that should be addressed in Ray Core P1 Issue that should be fixed within a few weeks

Comments

@rkooo567
Copy link
Contributor

What happened + What you expected to happen

Screen Shot 2022-11-21 at 9 14 14 PM

After merging this PR, test_placement_group became flaky. We should investigate and fix it.

cc @xwjiang2010 I will mark it as a release blocker until we find the root cause.

cc @larrylian

Versions / Dependencies

master

Reproduction script

n/a

Issue Severity

No response

@rkooo567 rkooo567 added bug Something that is supposed to be working; but isn't release-blocker P0 Issue that blocks the release P0 Issues that should be fixed in short order core Issues that should be addressed in Ray Core labels Nov 22, 2022
@larrylian
Copy link
Contributor

larrylian commented Nov 22, 2022

I will look at this issue. It should be only a test issue introduced by #24875 PR. Now it may be related to multi-threaded execution.

@larrylian
Copy link
Contributor

larrylian commented Nov 22, 2022

#30577 @rkooo567 I have fix this case, I will see if this use case CI can run successfully tomorrow.

@cadedaniel
Copy link
Member

cadedaniel commented Nov 22, 2022

I don't understand if #30577 fixes the root cause in TestRemovingPendingPlacementGroup or not but FYI TestRemovingLeasingPlacementGroup recently failed due to threading issues, if they're related then we should fix the root problem (multithreaded usage on a non-thread-safe class). I'm happy to help with that if it's a release blocker.

The hacky fix for TestRemovingLeasingPlacementGroup was to sleep to avoid a race #29694

@rkooo567 rkooo567 removed their assignment Nov 22, 2022
@scv119 scv119 assigned scv119 and rkooo567 and unassigned scv119 Nov 23, 2022
@rkooo567
Copy link
Contributor Author

We merged the PR. Let's see if it becomes unflaky after that

@rkooo567
Copy link
Contributor Author

I will also add more tests as a follow up. So we shouldn't close this (but just downgrade the priority)

@xwjiang2010
Copy link
Contributor

xwjiang2010 commented Nov 29, 2022

@rkooo567
Copy link
Contributor Author

Hmm I guess this is not fixed by that PR... Let me take a look tomorrow.

@xwjiang2010
Copy link
Contributor

Just confirming is this still release blocking or we can drop it?

@rkooo567
Copy link
Contributor Author

Can you give me one more day for this? I will sort that out by today.

@rkooo567
Copy link
Contributor Author

(is this the last blocker?)

@larrylian
Copy link
Contributor

Hmm I guess this is not fixed by that PR... Let me take a look tomorrow.

I looked at the few flakey's, maybe the branch code is relatively old and didn't bring my fix PR.

@rkooo567
Copy link
Contributor Author

@xwjiang2010 this is also a test issue. @cadedaniel it is the same issue as you mentioned. I found if I add until io service is empty, it passes 300 times in a row without an issue. I will fix this separately, but it doesn't need to be a release blocker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that is supposed to be working; but isn't core Issues that should be addressed in Ray Core P1 Issue that should be fixed within a few weeks
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants