GH-40224: [C++] Fix: improve the backpressure handling in the dataset writer #40722
Conversation
Thanks @westonpace! I can confirm that when running this code on the dataset using the query reported in the original issue, everything now works perfectly! 🎉
@@ -277,6 +278,8 @@ class ARROW_EXPORT ThrottledAsyncTaskScheduler : public AsyncTaskScheduler {
  /// Allows task to be submitted again. If there is a max_concurrent_cost limit then
  /// it will still apply.
  virtual void Resume() = 0;
  /// Return the number of tasks queued but not yet submitted
  virtual std::size_t QueueSize() = 0;
Would this be better as std::size_t QueueSize() const?
The implementation uses a std::mutex, so I'd have to mark the mutex mutable, right? Which would you prefer: "mutable mutex" or "non-const accessor"? I don't have a strong preference.
Usually a mutex used this way is marked mutable, but this LGTM either way; I don't have a strong preference either.
paused_ = true;
return has_room.Then([this] { ResumeIfNeeded(); });
} else {
ResumeIfNeeded();
}
So this is because when has_room finishes while paused_ is set, resume_callback_ is not called?
If has_room is finished then we can unpause, provided there are no other tasks, because there is room for another batch.
May I ask a stupid question: why is has_room.Then not being called in this scenario? In ThrottledAsyncTaskSchedulerImpl::ContinueTasks(), wouldn't it trigger the callback?
Oh, because it's paused... now I understand. So we don't "resume" enough, which causes new tasks to not be consumed.
LGTM as a fix here, though it seems we don't currently understand why this happens?
Thanks for the review. I do understand why this was happening.
I now understand why "pause"/"resume" is important. But there is one point I still don't understand: when throttle and …
A task is "running" even when it is blocked on backpressure. Since max-running-tasks is 1, Release/ContinueTasks won't be called until the …
So, actually, this patch makes "resume" more strict in the dataset writer scenario.
Will merge this on Friday if there are no negative comments.
Yes, we want to resume less frequently 👍
Thanks
  }
}
if (needs_resume) {
  paused_ = false;
Should this be done with the mutex acquired, or are all accesses to paused_ done from the same thread?
Well, test-ubuntu-20.04-cpp-thread-sanitizer passed, at least.
All access to paused_ is done from a single "logical thread". write_tasks_ is a scheduler with a max capacity of 1, so the items submitted to it will never run in parallel (though they may run on different OS threads).
@github-actions crossbow submit -g cpp
Force-pushed from 0dc3a3d to fdb625b.
@github-actions crossbow submit -g cpp
I rebased for CI fixes.
Revision: fdb625b Submitted crossbow builds: ursacomputing/crossbow @ actions-af079e8fb6
@ursabot please benchmark
Benchmark runs are scheduled for commit fdb625b. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Thanks for your patience. Conbench analyzed the 7 benchmarking runs that have been run so far on PR commit fdb625b. There were 12 benchmark results indicating a performance regression:
The full Conbench report has more details.
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 640c101. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.
…ataset writer (apache#40722)
Authored-by: Weston Pace <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Rationale for this change
The dataset writer would fire the resume callback as soon as the underlying dataset writer's queues freed up, even if there were pending tasks. Backpressure is not applied immediately, so a few tasks will always trickle in. If backpressure is pausing and then resuming frequently, this can lead to a buildup of pending tasks and uncontrolled memory growth.
What changes are included in this PR?
The resume callback is not called until all pending write tasks have completed.
Are these changes tested?
There is already quite an extensive set of tests for the dataset writer, and they continue to pass. I ran them repeatedly, with and without stress, and did not see any issues.
However, the underlying problem (the dataset writer can have uncontrolled memory growth) is still not tested, as it is quite difficult to test. I was able to run the setup described in the issue to reproduce the problem. With this fix the repartitioning task completes for me.
Are there any user-facing changes?
No