Stack overflow should abort the process normally, not segfault and dump core #31273

brson · 2016-01-29T01:42:24Z

Today, when you overflow the stack, Rust traps the SIGSEGV and prints an error message, then the program exits with a segfault, exit code 139 on Linux, and dumps core.

When we used segstacks for stack overflow the runtime just abort!ed, exiting with an illegal instruction.

That stack overflow protection is implemented by handling a segfault is an implementation detail. Rust should not be exposing it. Instead, when it detects a segfault because of stack overflow it should abort like any other fatal error.

cc https://users.rust-lang.org/t/rust-guarantees-no-segfaults-with-only-safe-code-but-it-segfaults-stack-overflow/4305/20
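For readers wondering where the exit code comes from: POSIX-style shells conventionally report a process killed by a signal as 128 plus the signal number. A minimal sketch of that arithmetic follows. The constants are the Linux x86-64 signal numbers (an assumption; other platforms may number them differently), and `shell_exit_code` is a hypothetical helper, not part of Rust's runtime.

```rust
/// Signal numbers as defined on Linux x86-64 (assumed here; other
/// platforms and architectures may use different values).
pub const SIGILL: i32 = 4;
pub const SIGABRT: i32 = 6;
pub const SIGBUS: i32 = 7;
pub const SIGSEGV: i32 = 11;

/// Exit status that a POSIX-style shell reports for a process that was
/// killed by `signal` (hypothetical helper for illustration only).
pub fn shell_exit_code(signal: i32) -> i32 {
    128 + signal
}
```

Under these assumptions a segfault shows up as 139, while an abort via SIGABRT would show up as 134 and an illegal instruction as 132.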


jethrogb · 2016-01-29T07:06:02Z

Considering Rust already traps the SIGSEGV, modifying the signal handler to exit at that point should be easy.

nagisa · 2016-01-29T12:14:42Z

SIGSEGV is the default behaviour on stack overflow in C/C++ programs compiled both by gcc and clang.

I’m not sure SIGILL is any better end-result compared to SIGSEGV. They both will dump the core and they both are pretty opaque. Fixing this issue will only serve to make “prevents segfaults” (should be changed to something else regardless of the outcome here) on the front page a little bit true-r.

lambda · 2016-02-01T00:02:28Z

I think that SIGABRT is fairly reasonable for a stack overflow; it's still treated as a crash, so will dump core or send crash reports on most platforms, but it doesn't cause the confusion that a SIGSEGV would cause by making people think that they need to track down some piece of unsafe code that's chasing a dangling pointer.

I've sent a PR that implements this.

lambda · 2016-02-01T00:17:23Z

This ticket is related to #30963 (about updating the documentation to mention stack overflow segfaults, which would not be necessary if this issue is fixed) and #26458 (which contains discussion of the most recent change to the current stack overflow behavior).

nagisa · 2016-02-01T00:27:10Z

@lambda there’s no (portable) way to produce SIGABRT in a no-stack condition, I think. As far as I know, SIGSEGV, SIGILL and SIGBUS are the exhaustive set of signals you can trigger without a stack.

lambda · 2016-02-01T00:51:56Z

@nagisa Do you have any reference for that? According to POSIX:

> The following table defines a set of functions that shall be either reentrant or non-interruptible by signals and shall be async-signal-safe. Therefore applications may invoke them, without restriction, from signal-catching functions:
>
> ...
>
> abort()
>
> ...
>
> raise()

Remember, when we're in our SIGBUS/SIGSEGV handler, we're operating on a small alternative stack, so we actually do have enough stack space to run our signal handler and print out an error message. I believe that should be sufficient for raising SIGABRT as well.

nagisa · 2016-02-01T00:58:41Z

> we're operating on a small alternative stack

Yep, I wrote the comment under the assumption that we’re restricted to not using any stack, but I had totally forgotten that we have a signal-specific stack for the SIGSEGV handler.


Abort on stack overflow instead of re-raising SIGSEGV

We use guard pages that cause the process to abort to protect against undefined behavior in the event of stack overflow. We have a handler that catches segfaults, prints out an error message if the segfault was due to a stack overflow, then unregisters itself and returns to allow the signal to be re-raised and kill the process.

This caused some confusion, as it was unexpected that safe code would be able to cause a segfault, while it's easy to overflow the stack in safe code. To avoid this confusion, when we detect a segfault in the guard page, abort instead of the previous behavior of re-raising SIGSEGV.

To test this, we need to adapt the tests for segfault to actually check the exit status. Doing so revealed that the existing test for segfault behavior was actually invalid; LLVM optimizes the explicit null pointer reference down to an illegal instruction, so the program aborts with SIGILL instead of SIGSEGV and the test didn't actually trigger the signal handler at all. Use a C helper function to get a null pointer that LLVM can't optimize away, so we get our segfault instead.

This is a [breaking-change] if anyone is relying on the exact signal raised to kill a process on stack overflow.

Closes #31273
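The decision described in this commit message can be sketched as a pure function, shown below. This is only an illustration of the logic, not the actual handler from the PR: `FaultAction` and `classify_fault` are hypothetical names, and the guard page is modelled as a plain address range rather than anything read from the OS.

```rust
use std::ops::Range;

/// What a SIGSEGV/SIGBUS handler would do next (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum FaultAction {
    /// The fault hit the guard page: print the stack overflow message
    /// and abort the process.
    ReportOverflowAndAbort,
    /// Any other fault: restore the default handler and return, so the
    /// signal is re-raised and kills the process as an ordinary segfault.
    RestoreDefaultAndReturn,
}

/// Classifies a faulting address against the guard page range
/// (start inclusive, end exclusive).
pub fn classify_fault(fault_addr: usize, guard_page: &Range<usize>) -> FaultAction {
    if guard_page.contains(&fault_addr) {
        FaultAction::ReportOverflowAndAbort
    } else {
        FaultAction::RestoreDefaultAndReturn
    }
}
```

With an assumed guard page at `0x1000..0x2000`, a fault at `0x1800` would be treated as a stack overflow, while a null pointer dereference at address `0` would still fall through to the default SIGSEGV behavior.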


Use libc::abort, not intrinsics::abort, in rtabort!

intrinsics::abort compiles down to an illegal instruction, which on Unix-like platforms causes the process to be killed with SIGILL. A more appropriate way to kill the process would be SIGABRT; this indicates better that the runtime has explicitly aborted, rather than some kind of compiler bug or architecture mismatch that SIGILL might indicate.

For rtassert!, replace this with libc::abort. libc::abort raises SIGABRT, but is defined to do so in such a way that it will terminate the process even if SIGABRT is currently masked or caught by a signal handler that returns.

On non-Unix platforms, retain the existing behavior. On Windows we prefer to avoid depending on the C runtime, and we need a fallback for any other platforms that may be defined. An alternative on Windows would be to call TerminateProcess, but this seems less essential than switching to using SIGABRT on Unix-like platforms, where it is common for the process-killing signal to be printed out or logged.

This is a [breaking-change] for any code that depends on the exact signal raised to abort a process via rtabort!

cc #31273 cc #31333
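Code that depends on the exact signal can check it from a parent process. Here is a minimal, Unix-only sketch using the standard library's `ExitStatusExt::signal`. The helper name `terminating_signal` is hypothetical, and the child's output is discarded purely to keep the example quiet.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, Stdio};

/// Runs `cmd` to completion and returns the number of the signal that
/// killed it, or `None` if it exited normally (hypothetical helper).
pub fn terminating_signal(cmd: &mut Command) -> std::io::Result<Option<i32>> {
    let status = cmd
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()?;
    Ok(status.signal())
}
```

A test harness could pass a `Command` for the program under test and compare the result against the platform's SIGABRT, SIGILL or SIGSEGV number, rather than parsing a shell exit code.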

brson added A-libs labels Jan 29, 2016

lambda mentioned this issue Jan 31, 2016

Abort on stack overflow instead of re-raising SIGSEGV #31333

Merged

critiqjo mentioned this issue Feb 4, 2016

Segfault due to stack size limit when recursing? zonyitoo/coio-rs#30

Closed

bors closed this as completed in #31333 Feb 6, 2016

lambda mentioned this issue Feb 6, 2016

Use libc::abort, not intrinsics::abort, in rtabort! #31457

Merged

mitaa mentioned this issue Feb 18, 2016

Process exited with signal 11 when mutable array of usize created #31748

Closed
