-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Mono] Race in init_method when using LLVM AOT. #75088
[Mono] Race in init_method when using LLVM AOT. #75088
Conversation
@vargaz I also looked a little at the placement of:
happening after we set:
is there a strict ordering that needs to occur between these two? Order is currently enforced due to |
So the problem was a race between setting got [...] and calling init_aotconst () ? |
I think register_jump_target_got_slot () could be moved earlier as well. |
Correct. |
OK, then I move that to happen just before the memory barrier as well, then the assignment of: got [got_slots [pindex]] = addr; will have release semantics for all the writes above the memory barrier (including work done in register_jump_target_got_slot) |
When using LLVM AOT codegen, init_method updates two GOT slots. These slots are initialized as part of init_method, but there is a race between initialization of the two slots. Current implementation can have two threads running init_method for the same method, but as soon as: [got_slots [pindex]] = addr store is visible, it will trigger other threads to return back from init_method, but since that could happen before the corresponding LLVM AOT const slot is set, second thread will return to method calling init_method, load the LLVM aot const, and crash when trying to use it (since its still NULL). This crash is very rare but have been identified on x86/x64 CPU's, when one thread is either preempted between updating regular GOT slot and LLVM GOT slot or store into LLVM GOT slot gets delayed in store buffer. I have also been able to emulate the scenario in debugger, triggering the issue and crashing in the method loading from LLVM aot const slot. Fix change order of updates and make sure the update of LLVM aot const slot happens before memory_barrier, since: got [got_slots [pindex]] = addr; have release semantics in relation to addr and update of LLVM aot const slot. Fix also add acquire/release semantics for ji->type in init_method since it is used to guard if a thread ignores a patch or not and it should not be re-ordered with previous stores, since it can cause similar race conditions with updated slots.
370e2bb
to
d541a32
Compare
WasmBuildTests CI lanes fails on other PR's as well, so unrelated. |
/backport to release/7.0 |
Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/6406322696 |
When using LLVM AOT codegen,
init_method
updates two GOT slots. These slots are initialized as part ofinit_method
,but there is a race between initialization of the two slots. Current implementation can have two threads running
init_method
for the same method, but as soon as:[got_slots [pindex]] = addr
store is visible, it will trigger other threads to return back from
init_method
, but since that could happen before the corresponding LLVM AOT const slot is set, second thread will return to method callinginit_method
, load the LLVM aot const, and crash when trying to use it (since its still NULL).This crash is very rare but have been identified on x86/x64 CPU's, when one thread is either preempted between updating regular GOT slot and LLVM aot const slot or store into LLVM aot const slot gets delayed in store buffer. I have also been able to emulate the scenario in debugger, triggering the issue and crashing in the method loading from LLVM aot const slot, like a vtable and then trying to allocate an object using a NULL vtable.
Fix change order of updates and make sure the update of LLVM aot const slot happens before
mono_memory_barrier
, since:got [got_slots [pindex]] = addr;
have release semantics in relation to
addr
and update ofamodule->info.llvm_init_aotconst
. Fix also add acquire/release semantics forji->type
ininit_method
since it is used to guard if a thread ignores a patch or not and it should not be re-ordered with previous stores (on weak memory architectures), since it can cause similar race conditions with updated slots.