[data] Warn on excessive driver memory usage during shuffle ops #42574
Signed-off-by: Stephanie Wang <[email protected]>
Thanks @stephanie-wang. Change on data directory looks good to me, except one comment.
# If driver memory exceeds this threshold, warn the user. For now, this
# only applies to shuffle ops because most other ops are unlikely to use as
# much driver memory.
warn_on_driver_memory_usage_bytes: Optional[int] = None
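For illustration only, here is a minimal sketch of how an optional threshold like this might be checked. The function name `maybe_warn_on_driver_memory_usage`, its parameters, and the log message are hypothetical and are not taken from the PR; the sketch only assumes that `None` disables the check and that a warning fires when usage strictly exceeds the threshold.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def maybe_warn_on_driver_memory_usage(
    used_bytes: int,
    warn_threshold_bytes: Optional[int],
    op_name: str = "shuffle",
) -> bool:
    """Hypothetical helper: warn if driver memory exceeds the threshold.

    Returns True if a warning was logged, False otherwise.
    """
    # A threshold of None means the check is disabled.
    if warn_threshold_bytes is None:
        return False
    # Only warn when usage strictly exceeds the threshold.
    if used_bytes <= warn_threshold_bytes:
        return False
    logger.warning(
        "Driver memory usage during %s op is %d bytes, which exceeds "
        "the warning threshold of %d bytes.",
        op_name,
        used_bytes,
        warn_threshold_bytes,
    )
    return True
```

How `used_bytes` is measured on the driver is left out of the sketch; the caller would supply it.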
Can we pass this argument through DataContext to ExchangeTaskScheduler.execute(warn_on_driver_memory_usage_bytes=...) directly? TaskContext should ideally contain only the information unique to each task, so I feel this does not need to go through TaskContext in this case.
Thanks, moved to ExchangeTaskScheduler.
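The shape of the suggestion above can be sketched roughly as follows. All classes here (`SketchDataContext`, `SketchTaskContext`, `SketchExchangeScheduler`) are hypothetical stand-ins written for illustration, not Ray's actual classes or signatures; the only detail taken from the thread is that the threshold is passed to `execute(warn_on_driver_memory_usage_bytes=...)` rather than stored on the per-task context.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchDataContext:
    """Hypothetical stand-in for a job-wide configuration object."""

    warn_on_driver_memory_usage_bytes: Optional[int] = None


@dataclass
class SketchTaskContext:
    """Hypothetical per-task context: holds only task-specific state."""

    task_idx: int


class SketchExchangeScheduler:
    """Hypothetical scheduler that receives the threshold as an argument."""

    def execute(
        self,
        task_contexts: List[SketchTaskContext],
        warn_on_driver_memory_usage_bytes: Optional[int] = None,
    ) -> Optional[int]:
        # A real scheduler would run the shuffle here; this sketch just
        # returns the threshold it was given so the plumbing is visible.
        return warn_on_driver_memory_usage_bytes


def run_shuffle(ctx: SketchDataContext, num_tasks: int) -> Optional[int]:
    # Per-task contexts carry no global settings.
    task_contexts = [SketchTaskContext(task_idx=i) for i in range(num_tasks)]
    scheduler = SketchExchangeScheduler()
    # The global setting is read once and passed directly to execute().
    return scheduler.execute(
        task_contexts,
        warn_on_driver_memory_usage_bytes=ctx.warn_on_driver_memory_usage_bytes,
    )
```

Keeping the setting out of the per-task object means each task context stays small and holds only what differs between tasks.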
Thanks @stephanie-wang! The code change on Data part LGTM.
Why are these changes needed?
Warns the user if too much driver memory is used for:
Related issue number
Closes #40861.