[cherry-pick][core] Sending ReportWorkerFailure after the process died. (#35320) #35420

Closed

Commits on May 17, 2023

  1. [core] Sending ReportWorkerFailure after the process died. (ray-proje…

    …ct#35320)
    
    ## Why are these changes needed?
    This change does not fix the root cause of ray-project#35247.
    
    In many_nodes_actor_tests_v2, the file descriptor error still appears.
    
    This fix monitors the process's liveness so that ReportWorkerFailure is sent only after the process has died. It also introduces a new utility function that retries a failed function up to a certain number of times.
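
The idea can be illustrated with a minimal Python sketch. It is not the code from this commit (Ray's core is written in C++), and the names `retry` and `report_after_exit` are hypothetical. It assumes the retry helper re-invokes a callable until it succeeds or the attempt limit is reached, and that the failure report is sent only once the process is confirmed dead.

```python
import subprocess
import sys
import time


def retry(fn, max_attempts, delay_s=0.0):
    """Call fn() until it succeeds or max_attempts is reached.

    Hypothetical helper: the last exception is re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            # Sleep only between attempts, not after the final one.
            if attempt < max_attempts:
                time.sleep(delay_s)
    raise last_error


def report_after_exit(proc, report, max_attempts=5, delay_s=0.1):
    """Call report(returncode) only after the child process has exited.

    Hypothetical stand-in for sending a failure report once the process died.
    """
    def check_dead():
        code = proc.poll()  # None means the process is still running
        if code is None:
            raise RuntimeError("process is still alive")
        return code

    code = retry(check_dead, max_attempts, delay_s)
    report(code)
    return code
```

For example, `report_after_exit(subprocess.Popen([sys.executable, "-c", "pass"]), print)` polls the child a bounded number of times and reports its exit code once it has terminated. If the child is still running after the last attempt, the error propagates and no report is sent.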
    
    Some tests are disabled because of a race condition in detecting node failures, which will be fixed later.
    fishbone committed May 17, 2023
    1f757d9