proc_exit and traps do not stop thread executing blocking instructions #1910

eloparco · 2023-01-23T09:56:48Z

With #1889, proc_exit and traps are properly propagated to all the threads in the process. Once a thread traps or calls proc_exit, all the other threads are stopped.

This is not true if the thread is executing a blocking operation (e.g. sleep, atomic.wait). In that case, the blocking operation is not interrupted by the exception propagation process.

Partial discussion here: #1869 (comment)


eloparco · 2023-01-24T14:01:41Z

@wenyongh @xujuntwt95329 @yamt after the initial discussion in #1869 (comment), any suggestion on the possible implementation of such a mechanism?
Initially I was thinking of using sigsetjmp/siglongjmp (to resume from here), but then I would need to keep track of a sigjmp_buf for each thread, and I don't know how portable that is to non-Linux platforms.
Otherwise, I'm not sure what we can do in the signal handlers to stop the threads gracefully (ideally we'd like to move to the next instruction after sleep/atomic_wait so that the exception flag can be read to stop the execution).
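
A minimal sketch of the sigsetjmp/siglongjmp idea described above, assuming a POSIX platform with C11 thread-local storage. All names (`tls_jmp_buf`, `interruptible_sleep`, `demo_interrupt`) and the choice of `SIGUSR1` are illustrative assumptions, not WAMR code. Each thread keeps its own `sigjmp_buf`, and the signal handler jumps back out of the blocking call so the caller can go on to check an exception flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-thread state; names are illustrative, not WAMR's. */
static _Thread_local sigjmp_buf tls_jmp_buf;
static _Thread_local volatile sig_atomic_t tls_jmp_armed;
static atomic_bool thread_done;

static void
interrupt_handler(int sig)
{
    (void)sig;
    if (tls_jmp_armed) {
        tls_jmp_armed = 0;
        siglongjmp(tls_jmp_buf, 1); /* abandon the blocking call */
    }
}

/* Returns true if the sleep was abandoned because of SIGUSR1. */
static bool
interruptible_sleep(unsigned seconds)
{
    if (sigsetjmp(tls_jmp_buf, 1) != 0)
        return true; /* control resumes here after siglongjmp */
    tls_jmp_armed = 1;
    sleep(seconds);
    tls_jmp_armed = 0;
    return false;
}

static void *
blocked_thread(void *arg)
{
    *(bool *)arg = interruptible_sleep(30);
    atomic_store(&thread_done, true);
    return NULL;
}

/* Starts a sleeping thread and signals it until it reports completion. */
static bool
demo_interrupt(void)
{
    struct sigaction sa;
    struct timespec tick = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    pthread_t tid;
    bool interrupted = false;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = interrupt_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    int rc = pthread_create(&tid, NULL, blocked_thread, &interrupted);
    assert(rc == 0);
    (void)rc;

    /* Resend: a signal that arrives before the jump buffer is armed is
       ignored by the handler, so one delivery may not be enough. */
    while (!atomic_load(&thread_done)) {
        pthread_kill(tid, SIGUSR1);
        nanosleep(&tick, NULL);
    }
    pthread_join(tid, NULL);
    return interrupted;
}
```

Calling `demo_interrupt()` should return `true` well before the 30 second sleep would have elapsed. Note that anything the interrupted code had allocated before the jump is not released, which is the resource cleanup concern raised later in this thread.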

loganek · 2023-01-24T14:48:50Z

I guess another approach would be to make all operations non-blocking in the host, but I'm not sure how much effort that would require (e.g. for WASI calls). Also, it might be problematic for user-defined native methods, so I think signals might be the way to go.

// edit
We just had a discussion with @eloparco about the approach, he'll update the ticket with the notes.

eloparco · 2023-01-24T18:21:26Z

Two approaches:

1. make all operations non-blocking in the runtime
   - [-] additional work every time a new blocking function is introduced
   - [-] it doesn't work with user-defined functions
   - [+] it integrates well with current exception handling mechanism
   - progress:
     - fixed terminating stale threads on trap/proc_exit #1929 (atomic.wait)
     - modified poll_oneoff to make it interruptible #1951 (poll_oneoff)
2. through signal handling: when propagating the exception to other threads, a signal is sent to those threads to stop their execution and return to the runtime; it requires the usage of setjmp and longjmp, similar to what is done with HW bound checks.
   - [-] portability problem because of setjmp/longjmp (e.g. not available on Windows)
   - [+] no additional work required if new blocking functions are introduced
   - [+] works well with user-defined functions
   - progress:
     - ~~feat: interrupt blocking instructions using signals #1946~~
     - ~~Refactor interrupting blocking threads #1948~~
     - ~~TODO: resource deallocation after exception~~
     - Implement async termination of blocking thread #2516

yamt · 2023-01-25T06:49:44Z

> it wouldn't work with user-defined functions

in either approach, i guess user-defined functions need to be adapted anyway.
eg. resource cleanup on termination.

maybe pthread cancellation is another alternative to consider for posix-like platforms?

eloparco · 2023-01-25T08:59:36Z

> in either approach, i guess user-defined functions need to be adapted anyway.
> eg. resource cleanup on termination.

Why is that? If we use signals, the thread running the user-defined function would be stopped.

> maybe pthread cancellation is another alternative to consider for posix-like platforms?

But then we wouldn't return to the runtime after stopping a (possibly blocking) instruction. It may even be fine for spawned threads, but what if it's the main thread?

loganek · 2023-01-25T09:25:34Z

I wonder what's the expected behavior of proc_exit? From what I see in the implementations, it just kills the process and doesn't allow for any resource cleanup (unlike C exit, which does a bit more before exiting, e.g. calling atexit callbacks). If that's the case, I don't think we have to worry about resource cleanup?

yamt · 2023-01-30T08:40:28Z

>> in either approach, i guess user-defined functions need to be adapted anyway.
>> eg. resource cleanup on termination.
>
> Why is that? If we use signals, the thread running the user-defined function would be stopped.

for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.

>> maybe pthread cancellation is another alternative to consider for posix-like platforms?
>
> But then we wouldn't return to the runtime after stopping a (possibly blocking) instruction. It may even be fine for spawned threads, but what if it's the main thread?

sure. probably you're right it's even trickier than signals to use for this purpose.

eloparco · 2023-01-31T09:28:33Z

> for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.

how would you address that? not sure if there's something we can do once the user-defined function is interrupted

yamt · 2023-01-31T11:12:04Z

>> for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
>
> how would you address that? not sure if there's something we can do once the user-defined function is interrupted

for example, we can make the user-defined functions deal with interruption by themselves.

loganek · 2023-01-31T13:28:28Z

>>> for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
>>
>> how would you address that? not sure if there's something we can do once the user-defined function is interrupted
>
> for example, we can make the user-defined functions deal with interruption by themselves.

I wonder if we actually have to provide that capability; whereas C allows registering a callback for exit() (using atexit), I don't think there's an equivalent for traps in either POSIX or C. If really needed, we can provide atexit-like functionality for both traps and proc_exit, for native user-defined functions, but I don't know how critical it is to have it right now.
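
To make the atexit-like idea concrete, here is a small sketch of a cleanup registry that a runtime could run on trap or proc_exit. Everything in it (`cleanup_registry`, `cleanup_register`, `cleanup_run_all`, the fixed capacity of 8) is a hypothetical illustration, not an existing WAMR API. Handlers run in reverse registration order, mirroring what C specifies for `atexit`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLEANUP_HANDLERS 8

typedef void (*cleanup_fn)(void *user_data);

/* Hypothetical registry, e.g. one per module instance. */
typedef struct {
    cleanup_fn fn[MAX_CLEANUP_HANDLERS];
    void *user_data[MAX_CLEANUP_HANDLERS];
    size_t count;
} cleanup_registry;

/* Returns 0 on success, -1 if the registry is full. */
static int
cleanup_register(cleanup_registry *reg, cleanup_fn fn, void *user_data)
{
    assert(reg != NULL && fn != NULL);
    if (reg->count == MAX_CLEANUP_HANDLERS)
        return -1;
    reg->fn[reg->count] = fn;
    reg->user_data[reg->count] = user_data;
    reg->count++;
    return 0;
}

/* Runs handlers last-registered-first and empties the registry. */
static void
cleanup_run_all(cleanup_registry *reg)
{
    assert(reg != NULL);
    while (reg->count > 0) {
        reg->count--;
        reg->fn[reg->count](reg->user_data[reg->count]);
    }
}

/* Demo handler: records the order in which handlers were invoked. */
static int order_log[MAX_CLEANUP_HANDLERS];
static size_t order_len;

static void
record_id(void *user_data)
{
    order_log[order_len++] = *(int *)user_data;
}

/* Registers handlers 1 then 2 and returns the call order as a number. */
static int
demo_cleanup_order(void)
{
    cleanup_registry reg;
    int first = 1, second = 2;

    reg.count = 0;
    order_len = 0;
    cleanup_register(&reg, record_id, &first);
    cleanup_register(&reg, record_id, &second);
    cleanup_run_all(&reg);
    return order_log[0] * 10 + order_log[1];
}
```

`demo_cleanup_order()` returns 21, showing that the handler registered second runs first. A real design would also have to decide which thread runs the handlers and when, which this sketch leaves open.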

yamt · 2023-01-31T16:06:02Z

>>>> for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
>>>
>>> how would you address that? not sure if there's something we can do once the user-defined function is interrupted
>>
>> for example, we can make the user-defined functions deal with interruption by themselves.
>
> I wonder if we actually have to provide that capability; whereas C allows registering a callback for exit() (using atexit), I don't think there's an equivalent for traps in either POSIX or C. If really needed, we can provide atexit-like functionality for both traps and proc_exit, for native user-defined functions, but I don't know how critical it is to have it right now.

for embedders like iwasm command, maybe it's enough to call host exit() and let the host os terminate threads.

on the other hand, wamr, as a library, should provide a way to clean up associated resources.
termination of wasm instances doesn't necessarily involve termination of the host process.

loganek · 2023-01-31T17:33:17Z

>>>>> for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
>>>>
>>>> how would you address that? not sure if there's something we can do once the user-defined function is interrupted
>>>
>>> for example, we can make the user-defined functions deal with interruption by themselves.
>>
>> I wonder if we actually have to provide that capability; whereas C allows registering a callback for exit() (using atexit), I don't think there's an equivalent for traps in either POSIX or C. If really needed, we can provide atexit-like functionality for both traps and proc_exit, for native user-defined functions, but I don't know how critical it is to have it right now.
>
> for embedders like iwasm command, maybe it's enough to call host exit() and let the host os terminate threads.
>
> on the other hand, wamr, as a library, should provide a way to clean up associated resources. termination of wasm instances doesn't necessarily involve termination of the host process.

I didn't mean to call exit() and close the host. What I meant is that maybe we don't have to worry about user-defined functions, and we can use signals and interruptions. Embedders, if needed, can handle resource cleanup within the process after the runtime finalizes.

yamt · 2023-02-01T06:05:48Z

>>>>>> for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
>>>>>
>>>>> how would you address that? not sure if there's something we can do once the user-defined function is interrupted
>>>>
>>>> for example, we can make the user-defined functions deal with interruption by themselves.
>>>
>>> I wonder if we actually have to provide that capability; whereas C allows registering a callback for exit() (using atexit), I don't think there's an equivalent for traps in either POSIX or C. If really needed, we can provide atexit-like functionality for both traps and proc_exit, for native user-defined functions, but I don't know how critical it is to have it right now.
>>
>> for embedders like iwasm command, maybe it's enough to call host exit() and let the host os terminate threads.
>> on the other hand, wamr, as a library, should provide a way to clean up associated resources. termination of wasm instances doesn't necessarily involve termination of the host process.
>
> I didn't mean to call exit() and close the host. What I meant is that maybe we don't have to worry about user-defined functions, and we can use signals and interruptions. Embedders, if needed, can handle resource cleanup within the process after the runtime finalizes.

it's certainly possible to code host functions that way.

however, it isn't how we code host functions right now. (including the ones within wamr tree like wasi)
so, most host functions need to be adapted anyway.
also, my honest impression is that the approach is too difficult for average programmers around me.

in other words, i prefer "if a host function doesn't implement async termination properly, it can't be async terminated (eg. possibly block forever)" over "if a host function doesn't implement async termination properly, it might cause problems like resource leak on async termination."

loganek · 2023-02-01T22:27:53Z

In that case I guess this could be a compilation/runtime flag. We worked in parallel on making the wait instruction interruptible: https://github.com/bytecodealliance/wasm-micro-runtime/pull/1929/files

yamt · 2023-02-03T05:51:26Z

> In that case I guess this could be a compilation/runtime flag. We worked in parallel on making the wait instruction interruptible: https://github.com/bytecodealliance/wasm-micro-runtime/pull/1929/files

i guess per host function flag makes more sense than a compilation/runtime flag.

loganek · 2023-02-03T10:17:00Z

> i guess per host function flag makes more sense than a compilation/runtime flag.

What do you mean by host function flag?

yamt · 2023-02-06T08:44:05Z

>> i guess per host function flag makes more sense than a compilation/runtime flag.
>
> What do you mean by host function flag?

eg. add a member to WASMFunctionImport to specify if it's safe to terminate the function that way.
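
A sketch of what such a per-function flag could look like. The struct below is a simplified stand-in and not the real `WASMFunctionImport` layout; `signal_termination_safe`, `find_import`, `choose_policy`, and the sample import names are all hypothetical. The runtime would consult the flag before deciding whether a thread blocked in that host function may be interrupted with a signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an import record (not WAMR's actual struct). */
typedef struct {
    const char *module_name;
    const char *field_name;
    /* Hypothetical flag: true if the function tolerates being
       interrupted by a signal at any point. Defaults to false. */
    bool signal_termination_safe;
} host_func_import;

typedef enum {
    TERMINATE_BY_SIGNAL, /* interrupt the blocked thread */
    TERMINATE_ON_RETURN  /* wait until the function returns by itself */
} terminate_policy;

static const host_func_import imports[] = {
    { "env", "busy_wait", true },
    { "env", "write_log", false },
};

/* Linear lookup by module and field name; returns NULL if not found. */
static const host_func_import *
find_import(const char *module, const char *field)
{
    assert(module != NULL && field != NULL);
    for (size_t i = 0; i < sizeof(imports) / sizeof(imports[0]); i++) {
        if (strcmp(imports[i].module_name, module) == 0
            && strcmp(imports[i].field_name, field) == 0)
            return &imports[i];
    }
    return NULL;
}

/* Unknown or unmarked functions fall back to the conservative policy. */
static terminate_policy
choose_policy(const host_func_import *import)
{
    if (import != NULL && import->signal_termination_safe)
        return TERMINATE_BY_SIGNAL;
    return TERMINATE_ON_RETURN;
}
```

With this shape, the safe choice is the default: a function that was never audited for async termination is simply never interrupted.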

loganek · 2023-02-07T15:24:06Z

From #1869 (comment):

> for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.

I wonder if that's something we could actually implement, at least for the poll_oneoff() syscall. That function is used for things like sleep in wasi-libc, so even though not all blocking operations are covered, we can already satisfy a lot of use cases on platforms where a signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.

yamt · 2023-02-07T15:51:23Z

> From #1869 (comment):
>
>> for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.
>
> I wonder if that's something we could actually implement, at least for the poll_oneoff() syscall. That function is used for things like sleep in wasi-libc, so even though not all blocking operations are covered, we can already satisfy a lot of use cases on platforms where a signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.

do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?

i suppose it's the only choice for platforms where signal-like functionalities are not available. (that's why i did it that way for another runtime, where one of its main targets doesn't have signals.)
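
The periodic wake-up idea can be sketched as a sleep that is split into short slices, with a termination flag checked between slices. This is an illustration only, assuming a POSIX `nanosleep`; `sliced_sleep`, the flag parameter, and the 10 ms slice length are assumptions rather than the code from #1951.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define SLICE_NS 10000000ULL /* wake up every 10 ms to check the flag */
#define NS_PER_SEC 1000000000ULL

/*
 * Sleeps for total_ns nanoseconds in short slices.
 * Returns true if the whole duration elapsed, false if *terminate was
 * observed set before the sleep finished.
 */
static bool
sliced_sleep(atomic_bool *terminate, uint64_t total_ns)
{
    assert(terminate != NULL);
    while (total_ns > 0) {
        if (atomic_load(terminate))
            return false;

        uint64_t step = total_ns < SLICE_NS ? total_ns : SLICE_NS;
        struct timespec ts = {
            .tv_sec = (time_t)(step / NS_PER_SEC),
            .tv_nsec = (long)(step % NS_PER_SEC),
        };
        /* An early wake-up only shortens one slice; ignored here. */
        nanosleep(&ts, NULL);
        total_ns -= step;
    }
    return true;
}
```

The slice length is a trade-off: shorter slices react to a termination request faster but cause more wake-ups on an otherwise idle thread.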

loganek · 2023-02-07T17:43:15Z

> do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?

Yes, I was a bit confused when you said it's not an appropriate approach for WAMR, but it looks like it was just a misunderstanding, and we're on the same page. We'll implement that in addition to signals then.

wenyongh · 2023-02-08T02:53:44Z

>> From #1869 (comment):
>>
>>> for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.
>>
>> I wonder if that's something we could actually implement, at least for the poll_oneoff() syscall. That function is used for things like sleep in wasi-libc, so even though not all blocking operations are covered, we can already satisfy a lot of use cases on platforms where a signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.
>
> do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?
>
> i suppose it's the only choice for platforms where signal-like functionalities are not available. (that's why i did it that way for another runtime, where one of its main targets doesn't have signals.)

Sounds reasonable, for libc-wasi and internal code, we might change the implementation.

But for user-developed native libraries, I'm a little unsure: if the runtime interrupts a thread that is running inside a native API using a signal-like mechanism, won't that be unfriendly to the developer? Should the developer be responsible for handling it?
Or should we just make this an option, document it well, and let the developer choose whether to enable it?

yamt · 2023-02-08T05:22:12Z

>> do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?
>
> Yes, I was a bit confused when you said it's not an appropriate approach for WAMR, but it looks like it was just a misunderstanding, and we're on the same page. We'll implement that in addition to signals then.

i said so about making "all i/o non-blocking".
but what you are talking about is something less extreme, right?

yamt · 2023-02-08T05:43:07Z

>>> From #1869 (comment):
>>>
>>>> for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.
>>>
>>> I wonder if that's something we could actually implement, at least for the poll_oneoff() syscall. That function is used for things like sleep in wasi-libc, so even though not all blocking operations are covered, we can already satisfy a lot of use cases on platforms where a signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.
>>
>> do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?
>> i suppose it's the only choice for platforms where signal-like functionalities are not available. (that's why i did it that way for another runtime, where one of its main targets doesn't have signals.)
>
> Sounds reasonable, for libc-wasi and internal code, we might change the implementation.
>
> But for user-developed native libraries, I'm a little unsure: if the runtime interrupts a thread that is running inside a native API using a signal-like mechanism, won't that be unfriendly to the developer? Should the developer be responsible for handling it? Or should we just make this an option, document it well, and let the developer choose whether to enable it?

it's reasonable to make it an option.

a developer can make his functions signal-termination-safe and mark them so in the corresponding NativeSymbols. the runtime can terminate them with a signal where it's available.
besides that, a developer can implement a graceful termination mechanism like periodic checks in his functions. we can provide an api for this. (eg. should_terminate())
otherwise, a function can't be terminated. (the default behavior)
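
As an illustration of the periodic-check option, the sketch below shows a host function that polls a `should_terminate()`-style call between units of work and releases its buffer on a single exit path. The `exec_env_stub` type, the `should_terminate` signature, and the chunk callbacks are hypothetical; the comment above only suggests the API name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the runtime's per-thread execution environment. */
typedef struct {
    atomic_bool terminate_requested;
} exec_env_stub;

/* Hypothetical API: has the runtime asked this thread to stop? */
static bool
should_terminate(exec_env_stub *env)
{
    return atomic_load(&env->terminate_requested);
}

typedef void (*chunk_fn)(exec_env_stub *env, int index, char *scratch);

/*
 * Host function that works in chunks and checks for termination between
 * them. Returns the number of chunks completed, or -1 if allocation fails.
 */
static int
host_process_chunks(exec_env_stub *env, int n_chunks, chunk_fn work)
{
    assert(env != NULL && work != NULL);
    char *scratch = malloc(64);
    int done = 0;

    if (scratch == NULL)
        return -1;
    for (int i = 0; i < n_chunks; i++) {
        if (should_terminate(env))
            break; /* stop early, but still fall through to free() */
        work(env, i, scratch);
        done++;
    }
    free(scratch); /* single exit path: the buffer is always released */
    return done;
}

/* Demo chunk that only touches the scratch buffer. */
static void
plain_chunk(exec_env_stub *env, int index, char *scratch)
{
    (void)env;
    scratch[0] = (char)index;
}

/* Demo chunk that simulates a termination request during chunk 2. */
static void
stop_at_third_chunk(exec_env_stub *env, int index, char *scratch)
{
    scratch[0] = (char)index;
    if (index == 2)
        atomic_store(&env->terminate_requested, true);
}
```

Because the function decides where it stops, the malloc()/free() pairing discussed earlier in the thread stays intact without any signal handling in user code.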

loganek · 2023-02-08T09:32:34Z

> but what you are talking about is something less extreme, right?

Yes, I'm only talking about the poll_oneoff function (at least for now)

> Or should we just make this an option, document it well, and let the developer choose whether to enable it?

Yeah, I think we discussed it a few comments above; making it optional sounds reasonable to me.

yamt · 2023-07-06T03:23:17Z

is anyone still working on this?

eloparco · 2023-07-06T13:18:42Z

> is anyone still working on this?

Not currently, on my side at least. We started the effort on this branch bytecodealliance:dev/interrupt_block_insn.
So, at the moment, the new tests from the proposal wouldn't pass.

yamt · 2023-07-07T04:44:50Z

>> is anyone still working on this?
>
> Not currently, on my side at least. We started the effort on this branch bytecodealliance:dev/interrupt_block_insn. So, at the moment, the new tests from the proposal wouldn't pass.

ok. thank you for an update.
are you (or someone) still willing/planning to work on this?

eloparco · 2023-07-07T08:14:23Z

> are you (or someone) still willing/planning to work on this?

Unfortunately, not in the short term; I don't have any time scheduled to work on it.
Feel free to pick it up if you want.

Send a signal whose handler is no-op to a blocking thread to wake up the blocking syscall with either EINTR equivalent or partial success.

Unlike the approach taken in the `dev/interrupt_block_insn` branch (that is, signal + longjmp similarly to `OS_ENABLE_HW_BOUND_CHECK`), this PR does not use longjmp because:

* longjmp from signal handler doesn't work on nuttx; refer to apache/nuttx#10326
* the signal+longjmp approach may be too difficult for average programmers who might implement host functions to deal with

See also #1910
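
A minimal sketch of the no-op handler technique described above, assuming a POSIX platform. It is not the code from the PR: the names (`noop_handler`, `blocking_sleep`, `demo_wakeup`), the use of `SIGUSR1`, and the resend loop are assumptions made for illustration. The handler does nothing; its only effect is that `nanosleep` returns early with `EINTR`, after which the thread checks a termination flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

static atomic_bool terminate_requested;
static atomic_bool worker_done;

/* Intentionally empty: delivery alone makes the syscall return EINTR. */
static void
noop_handler(int sig)
{
    (void)sig;
}

static int
install_wakeup_signal(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: let blocking calls fail with EINTR */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Returns true if the full duration elapsed, false if terminated early. */
static bool
blocking_sleep(time_t seconds)
{
    struct timespec req = { .tv_sec = seconds, .tv_nsec = 0 };
    struct timespec rem;

    for (;;) {
        if (atomic_load(&terminate_requested))
            return false;
        if (nanosleep(&req, &rem) == 0 || errno != EINTR)
            return true;
        req = rem; /* woken by a signal: re-check the flag, then resume */
    }
}

static void *
worker(void *arg)
{
    *(bool *)arg = blocking_sleep(30);
    atomic_store(&worker_done, true);
    return NULL;
}

/* Returns true if the worker stopped before its 30 s sleep completed. */
static bool
demo_wakeup(void)
{
    struct timespec tick = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    pthread_t tid;
    bool slept_fully = true;

    install_wakeup_signal();
    int rc = pthread_create(&tid, NULL, worker, &slept_fully);
    assert(rc == 0);
    (void)rc;

    atomic_store(&terminate_requested, true);
    /* Resend: a signal landing between the flag check and nanosleep()
       would otherwise be lost and the thread would sleep on. */
    while (!atomic_load(&worker_done)) {
        pthread_kill(tid, SIGUSR1);
        nanosleep(&tick, NULL);
    }
    pthread_join(tid, NULL);
    return !slept_fully;
}
```

Compared with the longjmp variant, control returns through the normal call path, so the interrupted function can release its own resources before it returns.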


loganek mentioned this issue Jan 23, 2023

WASI threads support #1790

Closed

19 tasks

eloparco mentioned this issue Jan 29, 2023

Interrupt blocking instructions #1921

Closed

eloparco mentioned this issue Feb 2, 2023

Allow interrupting blocking instructions #1930

Closed

yamt mentioned this issue Sep 15, 2023

Implement async termination of blocking thread #2516

Merged

eloparco closed this as completed Sep 20, 2023

wenyongh mentioned this issue Sep 21, 2023

Implement async termination of blocking thread (#2516) wenyongh/wasm-micro-runtime#815

Merged
