[Core] Fix in-container memory limit fetching for cgroups v2 #23922

Merged
merged 2 commits into from
Apr 15, 2022

clarkzinzow
Contributor

When fetching the in-container memory limit while running Ray in a container that uses cgroups v2, parsing fails if no limit is set: the limit file then contains the string "max" rather than a number, so the int() conversion raises and Ray cannot start. This currently happens when running Ray in a Google Colab notebook.

It looks like this memory limit fetching path might be untested? cc @DmitriGekhtman @wuisawesome

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -389,7 +389,12 @@ def get_system_memory():
             docker_limit = int(f.read())
     elif os.path.exists(memory_limit_filename_v2):
         with open(memory_limit_filename_v2, "r") as f:
-            docker_limit = int(f.read())
+            max_file = f.read()
+            if max_file.isnumeric():
Contributor

Huh, never knew this existed.

ericl added the @author-action-required label on Apr 14, 2022
Contributor

DmitriGekhtman left a comment

Thanks for the fix!
Looks like I neglected to test memory in #21688 and only covered CPU.
Would you mind adding a test for the fix in this PR? test_advanced_3 is as good a place as any to put it.

@richardliaw
Contributor

LGTM, adding a test would be great.

@clarkzinzow
Contributor Author

@DmitriGekhtman @richardliaw Tests added.

@richardliaw
Contributor

seems like test failures are unrelated, merging

Labels
@author-action-required

6 participants