golang-pie-httpserver crashes on control-C #1026

nyh · 2019-02-24T15:16:39Z

This is a spinoff from issue #352.

If we run

$ scripts/build image=golang-pie-httpserver
$ scripts/run.py

This seems to work correctly despite issue #352. At least no obvious problem.
But there is a crash if we click on control-C to kill the server:

page fault outside application, addr: 0x0000000000000000
[registers]
RIP: 0x000000000039a462 <syscall_entry+18>
RFL: 0x0000000000010286  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x000000000000000e  RBX: 0x000000c00002e000  RCX: 0x00001000001f6eda  RDX: 0x000000c00002e080
RSI: 0x0000000000000000  RDI: 0x0000000000000002  RBP: 0xffff8000032eee70  R8:  0xffff800000a24501
R9:  0xffff800000a24530  R10: 0x0000000000000008  R11: 0x0000000000000286  R12: 0xffff800001d43040
R13: 0xffff800000a24530  R14: 0x0000000000000008  R15: 0x0000000000000286  RSP: 0x0000000000000000
Aborted

[backtrace]
0x000000000033fbc8 <???+3406792>
0x00000000003410aa <mmu::vm_fault(unsigned long, exception_frame*)+458>
0x000000000039a625 <page_fault+117>
0x00000000003994a6 <???+3773606>
0x00001000001dde94 <???+1957524>
0x00001000001ce031 <???+1892401>
0x00001000001ddc56 <???+1956950>
0x00001000001dd545 <???+1955141>
0x00001000001f6fc2 <???+2060226>
0x00000000003f3116 <thread_main_c+38>
0x000000000039a422 <???+3777570>

Unfortunately gdb gives us this:

#0  0x00000000003a0572 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:24
#3  0x000000000023ebae in abort (fmt=fmt@entry=0x622b33 "Aborted\n")
    at runtime.cc:132
#4  0x000000000020281e in abort () at runtime.cc:98
#5  0x000000000033fbc9 in mmu::vm_sigsegv (ef=0xffff8000032de068, 
    addr=<optimized out>) at core/mmu.cc:1316
#6  mmu::vm_sigsegv (addr=<optimized out>, ef=0xffff8000032de068)
    at core/mmu.cc:1310
#7  0x00000000003410ab in mmu::vm_fault (addr=addr@entry=0, 
    ef=ef@entry=0xffff8000032de068) at core/mmu.cc:1338
#8  0x000000000039a626 in page_fault (ef=0xffff8000032de068)
    at arch/x64/mmu.cc:38
#9  <signal handler called>
../../gdb/dwarf2-frame.c:1061: internal-error: Unknown CFA rule.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

I suspect the second problem comes from bad Dwarf CFA instructions we added for the SYSCALL instruction and didn't debug thoroughly well enough.

As for the crash itself, my guess is that maybe a SYSCALL inside a signal handler doesn't work. Did we ever test that? The crashing instruction inside syscall_entry is:

  39a459:       64 48 8b 24 25 10 00    mov    %fs:0x10,%rsp
  39a460:       00 00 
  39a462:       48 83 3c 24 00          cmpq   $0x0,(%rsp)

The syscall_entry code seems to have that tiny/large syscall stack trickery, and maybe it somehow breaks inside a signal handler? Maybe the PIE has something to do with this problem?

We can try a signal handler in Go which does not involve a shutdown, to at least get shutdown issues out of the equation.

The text was updated successfully, but these errors were encountered:

wkozaczuk · 2019-02-26T23:10:16Z

@nyh What do you mean by "signal handler in Go which does not involve a shutdown"?

nyh · 2019-02-27T09:42:07Z

I was assuming (but it's worth verifying too...) that the Go application captures that SIGINT, and then this case involves two things: a signal handler (written in Go?) and a shutdown from that signal handler. I don't know which of the two things causes this crash. It could be a crash caused by incorrect handling of system calls (e.g., exit()) inside a signal handler. It could be a crash caused by exit() in parallel with other threads running other system calls. Or it can be other things... In short, this needs debugging....

wkozaczuk · 2019-07-01T04:17:11Z

Here is is the crux of the problem: it looks Golang sets up a signal handler that gets run on OSv system thread (not an app thread) which obviously has no syscall stack setup (see RSP with 0 value as a result of it). That is why we get page fault accessing a value on non-existent stack.

It looks like this patch fixes it:

diff --git a/arch/x64/entry.S b/arch/x64/entry.S
index 0c7164e0..5a64c7dd 100644
--- a/arch/x64/entry.S
+++ b/arch/x64/entry.S
@@ -174,6 +174,10 @@ syscall_entry:
     # on syscall stack pointed by address in TCB (%fs:16) - double memory dereference.
     # Therefore we are forced to save caller stack address in a field in TCB.
     movq %rsp, %fs:24 # syscall_caller_stack_pointer
+
+    cmpq $0, %fs:16
+    jz large_stack_has_been_setup
+
     #
     # Switch stack to "tiny" syscall stack that should be large
     # enough to setup "large" syscall stack (only when first SYSCALL on this thread)

Now when I do ctrl-C I get this:

OSv v0.53.0-44-g06ce8e00
eth0: 192.168.122.15
Booted up in 138.36 ms
WARNING: /httpserver.so is a PIE using TLS. This is currently unsupported (see issue #352). Link with '-shared' instead of '-pie'.
Go version: go1.12.6
syscall(): unimplemented system call 39
syscall(): unimplemented system call 234

The 39 syscall (getpid) is easy to fix. But what about 234 (sys_tgkill) - how shall we handle it?

nyh · 2019-07-01T08:31:20Z

Good catch. Your patch seems reasonable (if there is no "tiny syscall stack", don't switch stacks at all, hoping this thread has a big enough stack already). There's perhaps another option we can consider, which is to make the thread running the signal handler (see libc/signal.cc, sched::thread::make() call in kill()) be tagged an application thread, and get the normal syscall stack handling.

About tgkill(), I don't know what this is trying to do (send a signal to a specific thread? a thread group? what is this signal supposed to do?) so I don't know what is the minimal support we can add to make sense...

nyh · 2019-07-03T10:00:35Z

@wkozaczuk If I understand correctly, the crash originally reported here was fixed by commit 33bea60, right? If so, what are the symptoms now?

wkozaczuk · 2019-07-03T15:40:32Z

Yeah, I was a bit hesitant to close this issue as pressing Carl-C still does not make this application exit same way it happens on Linux. Or other hand that patch fixes the exact crash. So maybe we should close this one and open new clean one about tgkill. What do you think ? The symptom now is that every time user presses Ctrl-C he gets a message about unimplemented tgkill syscall and the app stays up.

…

Sent from my iPhone

On Jul 3, 2019, at 06:00, nyh ***@***.***> wrote: @wkozaczuk If I understand correctly, the crash originally reported here was fixed by commit 33bea60, right? If so, what are the symptoms now? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

wkozaczuk · 2019-07-12T19:31:14Z

Closing this issue as the commit 33bea60 technically fixed this original issue. The related issue #1047 has been opened instead, to track new different problem exposed by fixing this original issue.

wkozaczuk mentioned this issue Jul 12, 2019

Provide limited implementation of the tgkill syscall needed by Golang PIE apps #1047

Closed

wkozaczuk closed this as completed Jul 12, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

golang-pie-httpserver crashes on control-C #1026

golang-pie-httpserver crashes on control-C #1026

nyh commented Feb 24, 2019

wkozaczuk commented Feb 26, 2019

nyh commented Feb 27, 2019

wkozaczuk commented Jul 1, 2019

nyh commented Jul 1, 2019

nyh commented Jul 3, 2019

wkozaczuk commented Jul 3, 2019 via email

wkozaczuk commented Jul 12, 2019

golang-pie-httpserver crashes on control-C #1026

golang-pie-httpserver crashes on control-C #1026

Comments

nyh commented Feb 24, 2019

wkozaczuk commented Feb 26, 2019

nyh commented Feb 27, 2019

wkozaczuk commented Jul 1, 2019

nyh commented Jul 1, 2019

nyh commented Jul 3, 2019

wkozaczuk commented Jul 3, 2019 via email

wkozaczuk commented Jul 12, 2019