-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Placement Group] Fix placement group removal leak #19138
Conversation
@@ -456,8 +456,7 @@ void ClusterResourceScheduler::AddLocalResourceInstances( | |||
|
|||
for (size_t i = 0; i < instances.size(); i++) { | |||
node_instances->available[i] += instances[i]; | |||
node_instances->total[i] = | |||
std::max(node_instances->total[i], node_instances->available[i]); | |||
node_instances->total[i] += instances[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable, though I don't get what the previous code was trying to do. @wuisawesome any comment? It seems to blame to f52c855
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is already merged, but check #19138 (comment). (seems like he doesn't remember)
@@ -268,7 +268,6 @@ void ClusterTaskManager::DispatchScheduledTasksToWorkers( | |||
WorkerPoolInterface &worker_pool, | |||
std::unordered_map<WorkerID, std::shared_ptr<WorkerInterface>> &leased_workers) { | |||
if (num_tasks_waiting_for_dispatch_ == 0) { | |||
RAY_LOG(DEBUG) << "No new tasks since last call to dispatch, skipping"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed logs in this file are too noisy
@@ -138,7 +138,7 @@ class TaskResourceInstances { | |||
/// Check whether there are no resource instances. | |||
bool IsEmpty() const; | |||
/// Returns human-readable string for these resources. | |||
std::string DebugString() const; | |||
[[nodiscard]] std::string DebugString(const StringIdMap &string_id_map) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is nodiscard? Can we remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it because the lint told me to do this (https://buildkite.com/ray-project/ray-builders-pr/builds/15386#e43a5b55-796a-485c-b5a6-7f41d2ce57e0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @mwtian is this necessary or best practices? Should we keep including this in the linter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be useful in cases where a status code is returned, but it seems too pedantic here. Added you to reviewers of #19175 which disables this check. I can also disable this in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we want to call DebugString() and ignore the result?
It sounds like a bug.
(I like [[nodiscard]] here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I see. If that's what it means it makes sense to have it. I think it is up to what's the convention we'd like to use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sasha-s I agree [[nodiscard]]
can be useful in many cases, but for a function like DebugString()
the benefit seems to be small since ignoring its result is probably not a serious error. The cost here is making the code look too different from "normal" C++. In an alternative world, I think [[nodiscard]]
, noexcept
and const
should be the default requiring no annotations, and only the opposite behavior needs annotations. Not requiring these annotations and only tagging them strategically seem more realistic right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that if you have DebugString() in your code and do not check the output, then, most likely, you have some post-debugging artifacts.
Would you want those to be checked in?
Also, to me, "normal" c++ code has [[nodiscard]] almost everywhere, both on functions and on types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a stylistic decision. Given that we currently don't have [[nodiscard]], we should probably defer to the time-tested rule of being consistent with current style decisions in the codebase, which is to not have [[nodiscard]].
Otherwise, we should add it everywhere (definitely should be a separate PR, that would be pretty huge).
Why are these changes needed?
This is reproducible when
The issue was that when the new bundle is created,
AddLocalInstances
function doesn't properly update the total resources which causes the removal to fail (since we don't have enough total resources, freeing resources fails). This causes the resource leakage.Detailed scenario
Related issue number
Closes #19131
Checks
scripts/format.sh
to lint the changes in this PR.