-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
races in ARM lockless data structure reads #2502
Comments
Fixes two races with shared ibt tables: + Adding a new table entry must write the start_pc before the tag. This is accomplished with a new ENTRY_SET_TO_ENTRY hashtablex.h optional specifier. For ARM #2502 a new MEMORY_STORE_BARRIER macro is added. + Resizing a table must not clear the tags in the old table to avoid losing the tag on the target_delete ibl path. Adds a test api.ibl-stress which uses the DR IR to synthetically construct thousands of basic blocks with indirect branches betweent them. To make the test work, relaxes several is-on-stack checks to support pre-building basic blocks (#2463) from generated code or other locations not known prior to starting the application. Issue: #3098, #2502, #2463 Fixes #3098
Fixes two races with shared ibt tables: + Adding a new table entry must write the start_pc before the tag. This is accomplished with a new ENTRY_SET_TO_ENTRY hashtablex.h optional specifier. For ARM #2502 a new MEMORY_STORE_BARRIER macro is added. + Resizing a table must not clear the tags in the old table to avoid losing the tag on the target_delete ibl path. Adds a test api.ibl-stress which uses the DR IR to synthetically construct thousands of basic blocks with indirect branches betweent them. To make the test work, relaxes several is-on-stack checks to support pre-building basic blocks (#2463) from generated code or other locations not known prior to starting the application. Issue: #3098, #2502, #2463 Fixes #3098
Adds 32-bit and 64-bit load and store atomics to core DR and exports them to the client API. Adds missing barriers to the existing store atomic macros on both ARM and AArch64. Adds a sanity test that at least exercises the functions. Adds the test sequence to client.thread and client.tls (only client.tls is currently enabled on AArchXX). Uses dr_atomic_load* in drwrap in place of a previously overkill dr_atomic_add*. These will also help with #2502. The next step is to add gencode versions of these, either to the core or drx. Issue: #4215, #2502
Adds 32-bit and 64-bit load and store atomics to core DR and exports them to the client API. Adds missing barriers to the existing store atomic macros on both ARM and AArch64. Adds a sanity test that at least exercises the functions. Adds the test sequence to client.thread and client.tls (only client.tls is currently enabled on AArchXX). Uses dr_atomic_load* in drwrap in place of a previously overkill dr_atomic_add*. These will also help with #2502. The next step is to add gencode versions of these, either to the core or drx. Issue: #4215, #2502
For update_lookuptable_tls: Adds a store-release to the lookuptable store and a load-acquire to the IBL generated code for both ARM and AArch64, as there is no lighter-weight way to ensure the mask and table stores are seen in the proper order. For flushtime_global: made all readers and the increment use store-release semantics. For safely_nullify_tables: leaving as weak stores since timing and ordering does not matter there. Issue: #2502
For update_lookuptable_tls: Adds a store-release to the lookuptable store and a load-acquire to the IBL generated code for both ARM and AArch64, as there is no lighter-weight way to ensure the mask and table stores are seen in the proper order. For flushtime_global: made all readers and the increment use store-release semantics. For safely_nullify_tables: leaving as weak stores since timing and ordering does not matter there. Issue: #2502
I'm adding a related action item here:
|
Probably best to leave anything that's generating explicit barriers as DMB ISH. On Armv8 (including all AArch64) DMB ISHLD should be sufficient as an acquire barrier, but this might not be true on pre-v8 systems. And on v8 it's best to use LDAR anyway. |
There are more variables that may need stronger stores and loads:
|
I caught and diagnosed a plain-DR hang here: #4928 (comment) Looking through our lock routines I see a number of other |
Adds a missing load-acquire in a loop in ATOMIC_MAX that causes hangs in RSTATS_ADD_PEAK. Tested on the client.annotation-concurrency test on AArch64 where without this fix it hangs under release-build plain DR every ~100 runs. With the fix, there is no hang in 20K runs for either plain DR or drcachesim -offline. Issue: #2502, #4928, #5366 Fixes: #5366
Adds load-acquire semantics to a number of bare loads (i.e., not inside a mutex) in DR's various lock routines. This includes a key missing load-acquire in the spinlock loop in d_r_mutex_lock_app() which might help explain some performance issues such as #4279. Since some of these are in the utils.h header, and we have ordering dependencies between utils.h and arch_exports.h, this required splitting the atomic defines out of arch/arch_exports.h into a new file arch/atomic_exports.h. Since those atomic defines use ASSERT which references dynamo_options, this further required splitting the option struct out of options.h into options_struct.h. Issue: #2502, #4928, #4279
Adds load-acquire semantics to a number of bare loads (i.e., not inside a mutex) in DR's various lock routines. This includes a key missing load-acquire in the spinlock loop in d_r_mutex_lock_app() which might help explain some performance issues such as #4279. Since some of these are in the utils.h header, and we have ordering dependencies between utils.h and arch_exports.h, this required splitting the atomic defines out of arch/arch_exports.h into a new file arch/atomic_exports.h. Since those atomic defines use ASSERT which references dynamo_options, this further required splitting the option struct out of options.h into options_struct.h. Issue: #2502, #4928, #4279
In DR we have some data structures we read without holding a lock, relying on the hardware write visibility. The design and code was created with x86 in mind and we did not do a thorough enough re-evaluation for ARM and AArch64. For ARM's memory model we need to add barriers in multiple places to ensure that writes are visible in other threads in the order we require.
Suspect data structures include:
The text was updated successfully, but these errors were encountered: