-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] fix illegal memory accesses for uneven rows where NITEMS > 1 #3469
Conversation
rerun tests (spdlog compilation errors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to other comments: do these changes affect performance?
for (int j = 0; j < NITEMS; ++j) curr[j] = 0; | ||
do { | ||
#pragma unroll | ||
for (int j = 0; j < NITEMS; ++j) { | ||
auto n = tree[curr[j]]; | ||
mask &= ~(n.is_leaf() << j); | ||
if (!n.is_leaf()) { | ||
if (mask & (1 << j)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (mask & (1 << j)) { | |
if ((mask & (1 << j)) != 0) { |
@@ -172,7 +172,8 @@ struct tree_aggregator_t { | |||
: tmp_storage(finalize_workspace) {} | |||
|
|||
__device__ __forceinline__ void accumulate( | |||
vec<NITEMS, float> single_tree_prediction, int tree) { | |||
vec<NITEMS, float> single_tree_prediction, int tree, int num_rows) { | |||
for (int i = num_rows; i < NITEMS; ++i) single_tree_prediction[i] = 0.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the compiler will be able to unroll this loop.
Besides, I think it would be easier to change the loop in infer_one_tree()
, this will also avoid a lot of changes:
vec<NITEMS, output_type> out; // initialized to 0
#pragma unroll
for (int j = 0; j < NITEMS; ++j) {
if (j < n_rows) out[j] = tree[curr[j]].template output<output_type>();
}
raft::myAtomicAdd(votes + single_tree_prediction[item] * NITEMS + item, | ||
1); | ||
for (int item = 0; item < NITEMS; ++item) { | ||
if (item < num_rows) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's always enough shared memory allocated in votes
for all NITEMS
, independent of how many rows are actually processed. So no need for a conditional here.
Should be superseded by #3480 |
Closed by #3480 |
the bug results in memory corruption, which sometimes overwrites useful memory and triggers a test. only compute-sanitizer catches the illegal access right now, so no point adjusting array size.