Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log allocation failures #988

Merged
merged 4 commits into from
Mar 15, 2022
Merged

Conversation

rongou
Copy link
Contributor

@rongou rongou commented Mar 2, 2022

Right now when we run out of memory, the last allocation that causes the OOM is not being logged. Adding a new allocate failure line to the log to help with debugging. For now these lines are ignored by the replay utility.

@abellina

@rongou rongou added 3 - Ready for review Ready for review by team non-breaking Non-breaking change improvement Improvement / enhancement to an existing function cpp Pertains to C++ code labels Mar 2, 2022
@rongou rongou requested a review from a team as a code owner March 2, 2022 19:10
@rongou rongou self-assigned this Mar 2, 2022
@rongou rongou requested a review from harrism March 2, 2022 19:10
@rongou rongou requested a review from jrhemstad March 2, 2022 19:10
@@ -164,7 +164,8 @@ inline std::vector<event> parse_csv(std::string const& filename)

for (std::size_t i = 0; i < actions.size(); ++i) {
auto const& action = actions[i];
RMM_EXPECTS((action == "allocate") or (action == "free"), "Invalid action string.");
RMM_EXPECTS((action == "allocate") or (action == "allocate failure") or (action == "free"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break the parsing as it assumes an event can only be an action::ALLOCATE or action::FREE. But adding an allocate failure effectively adds a third kind of event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@jrhemstad jrhemstad Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a third event isn't quite what I meant. This parser is used for the replay benchmark as well and this will still break the replay benchmark.

Nevermind, I see the replay was already updated.

@rongou rongou requested a review from jrhemstad March 10, 2022 19:06
logger_->info("allocate,{},{},{}", ptr, bytes, fmt::ptr(stream.value()));
return ptr;
} catch (...) {
logger_->info("allocate failure,{},{},{}", nullptr, bytes, fmt::ptr(stream.value()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation needs to be updated that it will log a different message for failed allocations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rongou rongou requested a review from jrhemstad March 14, 2022 16:15
@harrism
Copy link
Member

harrism commented Mar 15, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6856d86 into rapidsai:branch-22.04 Mar 15, 2022
@rongou rongou deleted the log-alloc-failure branch June 10, 2024 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for review Ready for review by team cpp Pertains to C++ code improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants