-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes a race in DeviceRunLengthEncode::NonTrivialRuns #399
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find 🥇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for finding the race!
@@ -468,6 +468,10 @@ struct AgentRle | |||
|
|||
tile_aggregate = scan_op(tile_aggregate, temp_storage.aliasable.scan_storage.warp_aggregates.Alias()[WARP]); | |||
} | |||
|
|||
// Ensure all threads have read warp aggregates before temp_storage is repurposed in the | |||
// subsequent scatter stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remark: the memory seems to be used by the prefix operation sooner than in the scatter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the prefix
and warp_aggregates
use different memory within the ScanStorage
compartment, whereas the ScatterAliasable
and the ScanStorage
alias one another?
// Aliasable storage layout
union Aliasable
{
struct ScanStorage
{
typename BlockDiscontinuityT::TempStorage discontinuity; // Smem needed for discontinuity detection
typename WarpScanPairs::TempStorage warp_scan[WARPS]; // Smem needed for warp-synchronous scans
Uninitialized<LengthOffsetPair[WARPS]> warp_aggregates; // Smem needed for sharing warp-wide aggregates
typename TilePrefixCallbackOpT::TempStorage prefix; // Smem needed for cooperative prefix callback
} scan_storage;
// Smem needed for input loading
typename BlockLoadT::TempStorage load;
// Aliasable layout needed for two-phase scatter
union ScatterAliasable
{
unsigned long long align;
WarpExchangePairsStorage exchange_pairs[ACTIVE_EXCHANGE_WARPS];
typename WarpExchangeOffsets::TempStorage exchange_offsets[ACTIVE_EXCHANGE_WARPS];
typename WarpExchangeLengths::TempStorage exchange_lengths[ACTIVE_EXCHANGE_WARPS];
} scatter_aliasable;
} aliasable;
Btw., the culprit really is in the code branch of the first tile:
if (tile_idx == 0)
{
// First tile
...
BlockLoadT().Load();
CTA_SYNC();
...
InitializeSelections(); // using scan_storage
WarpScanAllocations(); // using scan_storage
prefix_op(); // using scan_storage
Scatter(); // using scatter_aliasable
Description
This PR addresses #351.
WarpScanAllocations()
andScatter()
alias/repurpose the same shared memory allocation. Previously it could happen that some threads may have reached theScatter()
stage, overwriting shared memory that was yet-to-be-read by other threads still withinWarpScanAllocations()
stage. ACTA_SYNC
was added toWarpScanAllocations
to make sure other threads wouldn't race ahead to theScatter()
stage.Lines that were reading potentially corrupted data:
This fix resolves the race condition being reported by compute-sanitizer and has had more than 90k successful test runs (previously avg. failure rate was one in ~1300 runs).
Checklist