-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41334: [C++][Acero] Use per-node basis temp vector stack to mitigate overflow #41335
Changes from all commits
936bef1
7d13d51
500c81e
19d0246
f8fc7f3
0e756d8
a48ed90
5ff1b29
8b1d9cd
ccc3366
a15b6de
7fe76d5
cf45ef9
7b397fb
2d6f053
a3622d9
94be92d
3b3c771
dcc3b7a
4821398
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -48,6 +48,16 @@ class ARROW_EXPORT Hashing32 { | |||||||||||||||||||||||||||||||
static void HashMultiColumn(const std::vector<KeyColumnArray>& cols, LightContext* ctx, | ||||||||||||||||||||||||||||||||
uint32_t* out_hash); | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
// Clarify the max temp stack usage for HashBatch, which might be necessary for the | ||||||||||||||||||||||||||||||||
// caller to be aware of at compile time to reserve enough stack size in advance. The | ||||||||||||||||||||||||||||||||
// HashBatch implementation uses one uint32 temp vector as a buffer for hash, one uint16 | ||||||||||||||||||||||||||||||||
// temp vector as a buffer for null indices and one uint32 temp vector as a buffer for | ||||||||||||||||||||||||||||||||
// null hash, all are of size kMiniBatchLength. Plus extra kMiniBatchLength to cope with | ||||||||||||||||||||||||||||||||
// stack padding and aligning. | ||||||||||||||||||||||||||||||||
static constexpr auto kHashBatchTempStackUsage = | ||||||||||||||||||||||||||||||||
(sizeof(uint32_t) + sizeof(uint16_t) + sizeof(uint32_t) + /*extra=*/1) * | ||||||||||||||||||||||||||||||||
util::MiniBatch::kMiniBatchLength; | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
static Status HashBatch(const ExecBatch& key_batch, uint32_t* hashes, | ||||||||||||||||||||||||||||||||
std::vector<KeyColumnArray>& column_arrays, | ||||||||||||||||||||||||||||||||
int64_t hardware_flags, util::TempVectorStack* temp_stack, | ||||||||||||||||||||||||||||||||
|
@@ -161,6 +171,15 @@ class ARROW_EXPORT Hashing64 { | |||||||||||||||||||||||||||||||
static void HashMultiColumn(const std::vector<KeyColumnArray>& cols, LightContext* ctx, | ||||||||||||||||||||||||||||||||
uint64_t* hashes); | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
// Clarify the max temp stack usage for HashBatch, which might be necessary for the | ||||||||||||||||||||||||||||||||
// caller to be aware of at compile time to reserve enough stack size in advance. The | ||||||||||||||||||||||||||||||||
// HashBatch implementation uses one uint16 temp vector as a buffer for null indices and | ||||||||||||||||||||||||||||||||
// one uint64 temp vector as a buffer for null hash, all are of size kMiniBatchLength. | ||||||||||||||||||||||||||||||||
// Plus extra kMiniBatchLength to cope with stack padding and aligning. | ||||||||||||||||||||||||||||||||
static constexpr auto kHashBatchTempStackUsage = | ||||||||||||||||||||||||||||||||
(sizeof(uint16_t) + sizeof(uint64_t) + /*extra=*/1) * | ||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Minor nit: but why does this constant have 2 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is something else I should've made the comment more verbose. Each arrow/cpp/src/arrow/compute/key_hash_internal.cc Lines 830 to 835 in a715ea0
The previous three- arrow/cpp/src/arrow/compute/key_hash_internal.cc Lines 387 to 395 in a715ea0
I'll update the comment later. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. More comments added. |
||||||||||||||||||||||||||||||||
util::MiniBatch::kMiniBatchLength; | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
static Status HashBatch(const ExecBatch& key_batch, uint64_t* hashes, | ||||||||||||||||||||||||||||||||
std::vector<KeyColumnArray>& column_arrays, | ||||||||||||||||||||||||||||||||
int64_t hardware_flags, util::TempVectorStack* temp_stack, | ||||||||||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this was the last place in a non-internal header file where the full definition of
TempVectorStack
is required, so, perhaps we can now moveTempVectorStack
to aarrow/compute/util_internal.h
header file?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very thoughtful suggestion. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed. And reduced some more (maybe unrelated) includings.