fixed terminating stale threads on trap/proc_exit #1929
Conversation
/**
 * create a copy of wait_map with no lock
 * this allows invoking methods in the callback
 * which require acquiring lock on the original map
I don't see any lock in the notify_stale_threads callback. Should we rather take a lock on the shared_memory_list_lock mutex at the beginning of the function and release it at the end? Otherwise the map could be modified by another thread, corrupting the traversal.
The wasm_runtime_atomic_notify step in the callback calls acquire_wait_info, which tries to acquire shared_memory_list_lock, so I guess wrapping the traversal in that lock would cause issues (a self-deadlock)?
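For illustration, a minimal generic sketch of that concern (not WAMR code; names are made up): re-acquiring a non-recursive mutex that the traversal already holds blocks forever.

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* stand-in for acquire_wait_info being reached from the traversal callback */
static void
callback(void)
{
    pthread_mutex_lock(&lock); /* never returns: the caller already holds it */
    pthread_mutex_unlock(&lock);
}

int
main(void)
{
    pthread_mutex_lock(&lock); /* the lock proposed around the traversal */
    callback();                /* self-deadlock on a non-recursive mutex */
    pthread_mutex_unlock(&lock);
    return 0;
}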
HashMap doesn't duplicate the key/value pairs, so wait_map_without_lock will point to the same elements as wait_map.
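A tiny generic illustration of that point (hypothetical names): a shallow copy of pointer entries still aliases the originals.

int
main(void)
{
    int value = 1;
    int *orig_entries[1] = { &value };
    int *copy_entries[1];

    copy_entries[0] = orig_entries[0]; /* shallow copy: both slots alias the same int */
    *copy_entries[0] = 2;              /* the change is visible via orig_entries[0] too */
    return 0;
}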
I think you at least want to hold the lock while copying the hash map. Also, what happens if a new waiter is added to the map after the copy is made?
@loganek While copying the original map, a lock on it is already acquired during the traversal. Every insert into the copy locks the copy map, so the copy should be consistent. If you meant wrapping the copy with shared_memory_list_lock -> addressed that.
What could be concerning here is that the original map could be updated during the traversal of the copy. If I put a lock on the original map during that traversal, the execution gets stuck because the callback steps in notify_stale_threads try to lock the original map (this is exactly why I had to create a copy). This might leave some newly added entries in the original map unhandled at that point in time, but they would eventually be handled, right?
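A minimal sketch of that snapshot-then-notify pattern, with hypothetical names; a plain array stands in for the real hash map.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t wait_map_lock = PTHREAD_MUTEX_INITIALIZER;
static int wait_map[4] = { 10, 20, 30, 40 }; /* stand-in for the real wait_map */

/* like wasm_runtime_atomic_notify -> acquire_wait_info, this needs the lock itself */
static void
notify_one(int addr)
{
    pthread_mutex_lock(&wait_map_lock);
    printf("notify waiters at %d\n", addr);
    pthread_mutex_unlock(&wait_map_lock);
}

static void
notify_stale_threads_sketch(void)
{
    int snapshot[4];
    int i, n = 0;

    /* snapshot the addresses while holding the lock */
    pthread_mutex_lock(&wait_map_lock);
    for (i = 0; i < 4; i++)
        snapshot[n++] = wait_map[i];
    pthread_mutex_unlock(&wait_map_lock);

    /* notify outside the lock so notify_one can re-take it; entries added
       after the snapshot are missed here and handled in a later round */
    for (i = 0; i < n; i++)
        notify_one(snapshot[i]);
}

int
main(void)
{
    notify_stale_threads_sketch();
    return 0;
}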
Either way, I'm still not sure it would be safe, since you're not cloning the map but copying the pointers, IIUC.
closing and reopening to reinitiate the build
force-pushed from c5e57ca to 64abe09
force-pushed from 64abe09 to 22ba795
{
    AtomicWaitAddressArgs *data = (AtomicWaitAddressArgs *)user_data;
    if (data->len > 0) {
        if (!(data->addr = wasm_runtime_realloc(data->addr,
When is this memory freed? And why do nothing except LOG_ERROR when malloc/realloc fails?
memset(args, 0, sizeof(*args));
os_mutex_lock(&shared_memory_list_lock);
// create list of addresses
bh_hash_map_traverse(wait_map, create_list_of_waiter_addresses, args);
How about traversing two times: the first time to get the total element count of wait_map, using it to allocate memory for AtomicWaitAddressArgs *args with size == offsetof(AtomicWaitAddressArgs, addr) + sizeof(void *) * total_elem_count, and then traversing the second time without allocating memory?
static void
xxx_callback(void *key, void *value, void *p_total_elem_count)
{
    (void)key;
    (void)value;
    *(uint32 *)p_total_elem_count = *(uint32 *)p_total_elem_count + 1;
}

os_mutex_lock(&shared_memory_list_lock);
total_elem_count = 0;
bh_hash_map_traverse(wait_map, xxx_callback, (void *)&total_elem_count);
/* allocate memory for args using total_elem_count */
bh_hash_map_traverse(wait_map, ..., args); /* set each data->addr[i] */
os_mutex_unlock(&shared_memory_list_lock);
Just wondering if there is any specific advantage to traversing two times? Is it to ensure the final traversal of data->addr is safe, i.e. we don't go out of bounds?
The purpose of that is to avoid multiple reallocations - knowing the size in advance will let you make only one allocation.
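A self-contained sketch of that suggestion; the struct here is a simplified stand-in that models addr as an inline flexible array so the offsetof sizing applies, and plain loops stand in for the two traversals.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* simplified stand-in for the real AtomicWaitAddressArgs */
typedef struct {
    uint32_t len;
    void *addr[]; /* one slot per waiter address */
} AtomicWaitAddressArgs;

int
main(void)
{
    void *waiters[] = { (void *)0x10, (void *)0x20, (void *)0x30 };
    uint32_t total_elem_count = 0, i;

    /* pass 1: count the elements (the counting traverse) */
    for (i = 0; i < 3; i++)
        total_elem_count++;

    /* one allocation sized from the count, as suggested above */
    AtomicWaitAddressArgs *args =
        malloc(offsetof(AtomicWaitAddressArgs, addr)
               + sizeof(void *) * total_elem_count);
    if (!args)
        return 1;
    args->len = 0;

    /* pass 2: fill the preallocated array (the second traverse) */
    for (i = 0; i < total_elem_count; i++)
        args->addr[args->len++] = waiters[i];

    for (i = 0; i < args->len; i++)
        printf("waiter %u: %p\n", i, args->addr[i]);
    free(args);
    return 0;
}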
force-pushed from 22ba795 to aa481de
force-pushed from aa481de to bba90ce
force-pushed from bba90ce to ee62523
force-pushed from 8d79672 to 6119dec
force-pushed from 6119dec to 7cd2e0b
Does this PR work in AOT mode? Also, can you try it with the sample thread_termination.c after replacing …
And try with …
Just to make sure that the base cases are covered.
force-pushed from 7cd2e0b to 5424205
force-pushed from 5424205 to ec33f47
This is to terminate suspended threads in case an atomic wait occurs with a huge or indefinite (-1) timeout, followed by a proc exit or trap.
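For context, a guest-side sketch of that scenario, assuming a toolchain with wasm threads/atomics support (e.g. wasi-sdk with -pthread); __builtin_wasm_memory_atomic_wait32 is clang's wasm atomic-wait intrinsic, and the rest of the names are illustrative.

#include <pthread.h>
#include <stdlib.h>

static int wait_addr = 0;

static void *
waiter(void *arg)
{
    (void)arg;
    /* blocks indefinitely: the expected value matches and the timeout is -1 */
    __builtin_wasm_memory_atomic_wait32(&wait_addr, 0, -1);
    return NULL;
}

int
main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    /* proc_exit (or a trap) while the waiter is still suspended;
       the runtime must notify and terminate the stale waiter */
    exit(0);
}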