Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc_exit and traps do not stop thread executing blocking instructions #1910

Closed
Tracked by #1790
eloparco opened this issue Jan 23, 2023 · 28 comments
Closed
Tracked by #1790

Comments

@eloparco
Copy link
Contributor

With #1889, proc_exit and traps are properly propagated to all the threads in the process. Once a thread traps or calls proc_exit, all the other threads are stopped.

This is not true if the thread is executing a blocking operation (e.g. sleep, atomic.wait). In that case, the blocking operation is not interrupted by the exception propagation process.

Partial discussion here: #1869 (comment)

@loganek loganek mentioned this issue Jan 23, 2023
19 tasks
@eloparco
Copy link
Contributor Author

@wenyongh @xujuntwt95329 @yamt after the initial discussion in #1869 (comment), any suggestion on the possible implementation of such a mechanism?
Initially I was thinking of using sigsetjmp/siglongjmp (to resume from here), but then I would need to keep track of sigjmp_bufs for each thread and also I don't know how portable it is for other non-linux platforms.
Otherwise, I'm not sure on what we can do in the signal handlers to stop the threads gracefully (ideally we'd like move to the next instruction after sleep/atomic_wait so that the exception flag can be read to stop the execution).

@loganek
Copy link
Collaborator

loganek commented Jan 24, 2023

I guess another approach would be to make all operations non-blocking in the host, but I'm not sure how much effort would that require (e.g. for WASI calls). Also, it might be problematic for user-defined native methods, so I think signals might be the way to go.

// edit
We just had a discussion with @eloparco about the approach, he'll update the ticket with the notes.

@eloparco
Copy link
Contributor Author

eloparco commented Jan 24, 2023

Two approaches:

@yamt
Copy link
Collaborator

yamt commented Jan 25, 2023

it wouldn't work with user-defined functions

in either approaches, i guess user-defined functions need to be adapted anyway.
eg. resource cleanup on termination.

maybe pthread cancellation is another alternative to consider for posix-like platforms?

@eloparco
Copy link
Contributor Author

in either approaches, i guess user-defined functions need to be adapted anyway.
eg. resource cleanup on termination.

Why is that? If we use signals, the thread running the user-defined function would be stopped.

maybe pthread cancellation is another alternative to consider for posix-like platforms?

But then we wouldn't return to the runtime after stopping a (possibly blocking) instruction. It may even be fine for spawn threads, but what if it's the main thread?

@loganek
Copy link
Collaborator

loganek commented Jan 25, 2023

I wonder what's the expected behavior of proc_exit? From what I see in the implementations, it just kills the process and doesn't allow for any resource cleanup (unlike C exit, which does a bit more before exiting, e.g. calling atexit callbacks). If that's the case, I don't think we have to worry about resource cleanup?

@yamt
Copy link
Collaborator

yamt commented Jan 30, 2023

in either approaches, i guess user-defined functions need to be adapted anyway.
eg. resource cleanup on termination.

Why is that? If we use signals, the thread running the user-defined function would be stopped.

for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.

maybe pthread cancellation is another alternative to consider for posix-like platforms?

But then we wouldn't return to the runtime after stopping a (possibly blocking) instruction. It may even be fine for spawn threads, but what if it's the main thread?

sure. probably you're right it's even trickier than signals to use for this purpose.

@eloparco
Copy link
Contributor Author

for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.

how would you address that? not sure if there's something we can do once the user-defined function is interrupted

@yamt
Copy link
Collaborator

yamt commented Jan 31, 2023

for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.

how would you address that? not sure if there's something we can do once the user-defined function is interrupted

for example, we can make the user-defined functions deal with interruption by themselves.

@loganek
Copy link
Collaborator

loganek commented Jan 31, 2023

for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.

how would you address that? not sure if there's something we can do once the user-defined function is interrupted

for example, we can make the user-defined functions deal with interruption by themselves.

I wonder if we actually have to provide that capability; whereas C allows registering callback for exit() (using atexit), I don't think there's equivalent for traps in either posix or C. If really needed, we can provide atexit-like functionality for both traps and proc_exit, for native user-defined functions, but I don't know how critical it is to have it right now.

@yamt
Copy link
Collaborator

yamt commented Jan 31, 2023

for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.

how would you address that? not sure if there's something we can do once the user-defined function is interrupted

for example, we can make the user-defined functions deal with interruption by themselves.

I wonder if we actually have to provide that capability; whereas C allows registering callback for exit() (using atexit), I don't think there's equivalent for traps in either posix or C. If really needed, we can provide atexit-like functionality for both traps and proc_exit, for native user-defined functions, but I don't know how critical it is to have it right now.

for embedders like iwasm command, maybe it's enough to call host exit() and let the host os terminate threads.

on the other hand, wamr, as a library, should provide a way to clean up associated resources.
termination of wasm instances doesn't necessarily involve termination of the host process.

@loganek
Copy link
Collaborator

loganek commented Jan 31, 2023

for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.

how would you address that? not sure if there's something we can do once the user-defined function is interrupted

for example, we can make the user-defined functions deal with interruption by themselves.

I wonder if we actually have to provide that capability; whereas C allows registering callback for exit() (using atexit), I don't think there's equivalent for traps in either posix or C. If really needed, we can provide atexit-like functionality for both traps and proc_exit, for native user-defined functions, but I don't know how critical it is to have it right now.

for embedders like iwasm command, maybe it's enough to call host exit() and let the host os terminate threads.

on the other hand, wamr, as a library, should provide a way to clean up associated resources. termination of wasm instances doesn't necessarily involve termination of the host process.

I didn't mean to call exit() and close the host. What I meant is that maybe we don't have to worry about user-defined functions, and we can use signals and interruptions. Embedders, if needed, can handle resource cleanup within the process after the runtime finalizes.

@yamt
Copy link
Collaborator

yamt commented Feb 1, 2023

for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.

how would you address that? not sure if there's something we can do once the user-defined function is interrupted

for example, we can make the user-defined functions deal with interruption by themselves.

I wonder if we actually have to provide that capability; whereas C allows registering callback for exit() (using atexit), I don't think there's equivalent for traps in either posix or C. If really needed, we can provide atexit-like functionality for both traps and proc_exit, for native user-defined functions, but I don't know how critical it is to have it right now.

for embedders like iwasm command, maybe it's enough to call host exit() and let the host os terminate threads.
on the other hand, wamr, as a library, should provide a way to clean up associated resources. termination of wasm instances doesn't necessarily involve termination of the host process.

I didn't mean to call exit() and close the host. What I meant is that maybe we don't have to worry about user-defined functions, and we can use signals and interruptions. Embedders, if needed, can handle resource cleanup within the process after the runtime finalizes.

it's certainly possible to code host functions that way.

however, it isn't how we code host functions right now. (including the ones within wamr tree like wasi)
so, most of host functions need to be adapted anyway.
also, my honest impression is that the approach is too difficult for average programmers around me.

in other words, i prefer "if a host function doesn't implement async termination properly, it can't be async terminated (eg. possibly block forever)" over "if a host function doesn't implement async termination properly, it might cause problems like resource leak on async termination."

@loganek
Copy link
Collaborator

loganek commented Feb 1, 2023

In that case I guess this could be a compilation/runtime flag. We worked in parallel on making the wait instruction interruptable: https://github.com/bytecodealliance/wasm-micro-runtime/pull/1929/files

@yamt
Copy link
Collaborator

yamt commented Feb 3, 2023

In that case I guess this could be a compilation/runtime flag. We worked in parallel on making the wait instruction interruptable: https://github.com/bytecodealliance/wasm-micro-runtime/pull/1929/files

i guess per host function flag makes more sense than a compilation/runtime flag.

@loganek
Copy link
Collaborator

loganek commented Feb 3, 2023

i guess per host function flag makes more sense than a compilation/runtime flag.

What do you mean by host function flag?

@yamt
Copy link
Collaborator

yamt commented Feb 6, 2023

i guess per host function flag makes more sense than a compilation/runtime flag.

What do you mean by host function flag?

eg. add a member to WASMFunctionImport to specify if it's safe to terminate the function that way.

@loganek
Copy link
Collaborator

loganek commented Feb 7, 2023

From #1869 (comment):

for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.

I wonder if that's something we could actually implement that at least for the poll_oneof() syscall - that function is being used for things like sleep in wasi-libc so even though not all blocking operations are covered, we can already satisfy a lot of usecases for platforms where signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.

@yamt
Copy link
Collaborator

yamt commented Feb 7, 2023

From #1869 (comment):

for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.

I wonder if that's something we could actually implement that at least for the poll_oneof() syscall - that function is being used for things like sleep in wasi-libc so even though not all blocking operations are covered, we can already satisfy a lot of usecases for platforms where signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.

do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?

i suppose it's the only choice for platforms where signal-like functionalities are not available. (that's why i did it that way for another runtime, where one of its main target doesn't have signals.)

@loganek
Copy link
Collaborator

loganek commented Feb 7, 2023

do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?

Yes, I was a bit confused when you said it's not approperiate approach for WAMR, but looks like it was just a misunderstanding, and we're on the same page. We'll implement that in addition to signals then.

@wenyongh
Copy link
Contributor

wenyongh commented Feb 8, 2023

From #1869 (comment):

for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.

I wonder if that's something we could actually implement that at least for the poll_oneof() syscall - that function is being used for things like sleep in wasi-libc so even though not all blocking operations are covered, we can already satisfy a lot of usecases for platforms where signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.

do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?

i suppose it's the only choice for platforms where signal-like functionalities are not available. (that's why i did it that way for another runtime, where one of its main target doesn't have signals.)

Sounds reasonable, for libc-wasi and internal code, we might change the implementation.

But for user developed native library, it makes me a little confused: should runtime interrupt the thread running into native API by signal-like functionality, will it be not so friendly to developer? Should the developer be responsible for his behaviors?
Or we just make this an option, well document it, and let developer choose whether to enable it or not?

@yamt
Copy link
Collaborator

yamt commented Feb 8, 2023

do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?

Yes, I was a bit confused when you said it's not approperiate approach for WAMR, but looks like it was just a misunderstanding, and we're on the same page. We'll implement that in addition to signals then.

i said so about making "all i/o non-blocking".
but what you are talking about is something less extreme, right?

@yamt
Copy link
Collaborator

yamt commented Feb 8, 2023

From #1869 (comment):

for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.

I wonder if that's something we could actually implement that at least for the poll_oneof() syscall - that function is being used for things like sleep in wasi-libc so even though not all blocking operations are covered, we can already satisfy a lot of usecases for platforms where signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.

do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?
i suppose it's the only choice for platforms where signal-like functionalities are not available. (that's why i did it that way for another runtime, where one of its main target doesn't have signals.)

Sounds reasonable, for libc-wasi and internal code, we might change the implementation.

But for user developed native library, it makes me a little confused: should runtime interrupt the thread running into native API by signal-like functionality, will it be not so friendly to developer? Should the developer be responsible for his behaviors? Or we just make this an option, well document it, and let developer choose whether to enable it or not?

it's reasonable to make it an option.

  • a developer can make his functions signal-termination-safe and mark it so in the corresponding NativeSymbols. the runtime can terminate them with a signal where it's available.
  • besides that, a developer can implement graceful termination mechanism like periodic checks in his functions. we can provide an api for this. (eg. should_terminate())
  • otherwise, a function can't be terminated. (the default behavior)

@loganek
Copy link
Collaborator

loganek commented Feb 8, 2023

but what you are talking about is something less extreme, right?

Yes, I'm only talking about the poll_oneof function (at least for now)

Or we just make this an option, well document it, and let developer choose whether to enable it or not?

Yeah, I think we discussed it a few comments above; make it optional sounds reasonable to me.

@yamt
Copy link
Collaborator

yamt commented Jul 6, 2023

is anyone still working on this?

@eloparco
Copy link
Contributor Author

eloparco commented Jul 6, 2023

is anyone still working on this?

Not currently, on my side at least. We started the effort on this branch bytecodealliance:dev/interrupt_block_insn.
So, at the moment, the new tests from the proposal wouldn't pass.

@yamt
Copy link
Collaborator

yamt commented Jul 7, 2023

is anyone still working on this?

Not currently, on my side at least. We started the effort on this branch bytecodealliance:dev/interrupt_block_insn. So, at the moment, the new tests from the proposal wouldn't pass.

ok. thank you for an update.
are you (or someone) still willing/planning to work on this?

@eloparco
Copy link
Contributor Author

eloparco commented Jul 7, 2023

are you (or someone) still willing/planning to work on this?

Unfortunately, not in the short term, I don't have any time scheduled to work on it.
Feel free to pick it up if you want.

wenyongh pushed a commit that referenced this issue Sep 20, 2023
Send a signal whose handler is no-op to a blocking thread to wake up
the blocking syscall with either EINTR equivalent or partial success.

Unlike the approach taken in the `dev/interrupt_block_insn` branch (that is,
signal + longjmp similarly to `OS_ENABLE_HW_BOUND_CHECK`), this PR
does not use longjmp because:
* longjmp from signal handler doesn't work on nuttx
  refer to apache/nuttx#10326
* the singal+longjmp approach may be too difficult for average programmers
  who might implement host functions to deal with

See also #1910
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this issue May 27, 2024
Send a signal whose handler is no-op to a blocking thread to wake up
the blocking syscall with either EINTR equivalent or partial success.

Unlike the approach taken in the `dev/interrupt_block_insn` branch (that is,
signal + longjmp similarly to `OS_ENABLE_HW_BOUND_CHECK`), this PR
does not use longjmp because:
* longjmp from signal handler doesn't work on nuttx
  refer to apache/nuttx#10326
* the singal+longjmp approach may be too difficult for average programmers
  who might implement host functions to deal with

See also bytecodealliance#1910
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants