Fix SchedulingClassInfo.running_tasks memory leak #21535

Merged (4 commits) on Jan 12, 2022

Collaborator

@jjyao jjyao commented Jan 11, 2022

Why are these changes needed?

In some cases, a task that is added to running_tasks is never removed, which introduces wait time for all subsequent tasks because of the worker cap. One such case is lease request cancellation: if the request is cancelled after PopWorker is called, the task is never removed from running_tasks.
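To make the failure mode concrete, here is a minimal, self-contained sketch of the bookkeeping involved. It is not the code in this PR: `MiniTaskManager`, `TryDispatch`, `ReleaseTask`, and the `TaskId` / `SchedulingClass` aliases are hypothetical stand-ins, and the cap is modelled as a hard limit that rejects dispatch, whereas the real worker cap only adds wait time. The point it illustrates is that every path that ends a task, including cancellation after the worker has been requested, has to erase the task from `running_tasks`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins for Ray's TaskID and SchedulingClass types.
using TaskId = std::uint64_t;
using SchedulingClass = int;

// Simplified per-class bookkeeping; only running_tasks is modelled.
struct SchedulingClassInfo {
  std::unordered_set<TaskId> running_tasks;
};

// Hypothetical, heavily simplified task manager (assumption, not Ray's API).
class MiniTaskManager {
 public:
  explicit MiniTaskManager(std::size_t cap) : cap_(cap) {}

  // Records the task as running unless the per-class cap is already reached.
  bool TryDispatch(SchedulingClass cls, TaskId id) {
    auto it = info_by_sched_cls_.find(cls);
    const std::size_t running =
        (it == info_by_sched_cls_.end()) ? 0 : it->second.running_tasks.size();
    if (running >= cap_) {
      return false;  // Simplification: the real cap delays instead of rejecting.
    }
    info_by_sched_cls_[cls].running_tasks.insert(id);
    return true;
  }

  // Must be called on every exit path: completion, failure, and cancellation.
  // Skipping it on the cancellation path is the kind of leak described above.
  void ReleaseTask(SchedulingClass cls, TaskId id) {
    auto it = info_by_sched_cls_.find(cls);
    if (it == info_by_sched_cls_.end()) {
      return;
    }
    it->second.running_tasks.erase(id);
    if (it->second.running_tasks.empty()) {
      info_by_sched_cls_.erase(it);  // Drop empty entries so the map can drain.
    }
  }

  // True when no scheduling class has any bookkeeping left.
  bool Empty() const { return info_by_sched_cls_.empty(); }

 private:
  std::size_t cap_;
  std::unordered_map<SchedulingClass, SchedulingClassInfo> info_by_sched_cls_;
};
```

With a cap of 1, dispatching task 1 and then cancelling it without calling `ReleaseTask` leaves the class permanently at its cap, so task 2 is never admitted. Calling `ReleaseTask` on the cancellation path drains the map, which is the condition a check like `info_by_sched_cls_.empty()` in a test would verify.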

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 12, 2022
@jjyao jjyao removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 12, 2022
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 12, 2022
@jjyao jjyao requested a review from ericl January 12, 2022 03:51
@jjyao jjyao removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 12, 2022
@@ -280,7 +291,6 @@ class ClusterTaskManagerTest : public ::testing::Test {
ASSERT_TRUE(task_manager_.pinned_task_arguments_.empty());
ASSERT_TRUE(task_manager_.info_by_sched_cls_.empty());
ASSERT_EQ(task_manager_.pinned_task_arguments_bytes_, 0);
Contributor

Hmm why remove this?

Collaborator Author

@jjyao jjyao Jan 12, 2022

It's a duplicate line. See line 281.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 12, 2022
@jjyao jjyao added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Jan 12, 2022
@ericl ericl merged commit 2503515 into ray-project:master Jan 12, 2022
@jjyao jjyao deleted the jjyao/cap branch January 12, 2022 07:44
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
3 participants