runtime: unexpected return pc for runtime.sigpanic called from 0x1 #43496
Maybe related: #41099
Going through the stack trace multiple times, the additional thing that stuck out was:
And maybe:
My best guess is that a bad BP in asm code can cause traceback to crash, based on seeing something similar in minio/sha256-simd#55.
/cc aclements @randall77 @mknyszek @prattmic |
Traceback doesn't use BP currently, so that isn't it. (Using BP for traceback is #16638.) That minio bug involves playing with SP, not BP, which could cause problems. (Our traceback code can't handle variable-sized frames.) goroutine 8278 does look like frames are missing, as it starts with a
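To make the SP concern concrete, here is a hypothetical Plan 9 assembly sketch in the style of the minio code; the function name, frame layout, and constants are invented for illustration and are not the actual minio source:

```asm
#include "textflag.h"

// func hashBlocks(p []byte, n int)
// Declares a zero-size frame, then moves SP by an alignment-dependent amount anyway.
TEXT ·hashBlocks(SB), NOSPLIT, $0-32
	MOVQ SP, R12     // stash the incoming SP
	SUBQ $512, SP    // carve out scratch space below the declared frame
	ANDQ $-64, SP    // align SP down to 64 bytes; the SP delta now varies per call
	// ... do the real work using the scratch area ...
	MOVQ R12, SP     // restore SP before returning
	RET
```

If a profiling signal lands while SP is adjusted like this, gentraceback sees a frame whose actual size doesn't match the function's metadata and could compute a garbage return PC, which would surface as an "unexpected return pc" crash like the one reported here.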
Since sighandler/preparePanic do some manipulation of the stack to make it look like there was a direct call to sigpanic(), is it possible that this is some extremely rare race in the sighandler or preparePanic? From the list of stack traces, we can't definitively pinpoint where the original panic happened. But during panic handling, open-coded defers do need to look at the stack trace of the panicking goroutine, and hence will cause a problem if the backtrace has an issue. For reference, here is the source code for errgroup.(*Group).Go(). It seems possible that goroutine 8278 (the one that started at line 54 below) is the one that panicked and has a corrupted stack, and hence caused the secondary panic in addOneOpenDeferFrame/gentraceback. It's not clear to me whether the original panic or problem is defer-related, or whether the defer processing is just highlighting the bad backtrace.
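For reference, errgroup.(*Group).Go in golang.org/x/sync of that era looked roughly like this (reproduced from memory, so treat it as an approximate sketch rather than the exact revision under discussion):

```go
package errgroup

import "sync"

// A Group is a collection of goroutines working on subtasks that are part of
// the same overall task.
type Group struct {
	cancel func()

	wg sync.WaitGroup

	errOnce sync.Once
	err     error
}

// Go calls the given function in a new goroutine.
// The first call to return a non-nil error cancels the group; its error will
// be returned by Wait.
func (g *Group) Go(f func() error) {
	g.wg.Add(1)

	go func() {
		// The deferred Done runs during panic handling, which scans this
		// goroutine's stack for open-coded defer frames
		// (addOneOpenDeferFrame/gentraceback).
		defer g.wg.Done()

		if err := f(); err != nil {
			g.errOnce.Do(func() {
				g.err = err
				if g.cancel != nil {
					g.cancel()
				}
			})
		}
	}()
}
```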
Side note: is there, or should there be, a linter that verifies that SP is not being used like that?
@egonelbre Yes, that would make sense. We could also be more conservative with tracebacks during profiling interrupts, as that's the only case where we would try to traceback assembly like that.
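No such vet check exists that I know of; as a rough standalone sketch of the idea (the instruction list and heuristic are invented, and it would also flag legitimate SP saves/restores), a linter could simply report explicit writes to SP in hand-written .s files:

```go
// spcheck: naive scan of Go assembly files for instructions that write to SP
// outside of the frame the assembler manages via the TEXT directive.
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// Matches amd64 instructions whose destination operand is SP, e.g. "SUBQ $512, SP".
var spWrite = regexp.MustCompile(`^\s*(ADDQ|SUBQ|ANDQ|ORQ|MOVQ|LEAQ)\s+.*,\s*SP\s*(//.*)?$`)

func main() {
	for _, path := range os.Args[1:] {
		f, err := os.Open(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		sc := bufio.NewScanner(f)
		for line := 1; sc.Scan(); line++ {
			if spWrite.MatchString(sc.Text()) {
				fmt.Printf("%s:%d: writes to SP: %s\n", path, line, sc.Text())
			}
		}
		f.Close()
	}
}
```

Running it over a repository's .s files would at least surface candidates for manual review, even if a real check would need to understand the TEXT frame-size declaration to avoid false positives.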
Is this possibly the same cause as #43942?
I have what appears at the moment to be a reproducer for this, sort of, but it's not an isolated reproducer, and it's also extremely unstable. macOS amd64 host, running
Symptoms: First, all of yesterday, any attempt to run
I don't know exactly what's happening, but I do know that this appears to happen at the same point in the test run every time (the previous message is the same, but it's somewhere during a test which might take quite a few seconds, so I don't know exactly where). The specific test it happens in completes without issue, though, if I run it alone; it's only with the other tests before it that it blows up. Also, there are about 1k goroutines (this is perhaps not surprising for the tests in question). In one of the dumps, four goroutines were all in the same function,
These calls are occurring at roughly the same point in several goroutines, but there's nothing about them that makes me think they should be taking more than a microsecond or so, and yet sometimes I see several goroutines stuck in this, or on a line which looks like
I don't know what's going on, but it's definitely something.
Do you still have the reproducer? If so, does the issue still appear when building with a Go release >= 1.17.4? If not, this might have been fixed by #49729. |
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

Unable to reproduce.

What operating system and processor architecture are you using (go env)?

It's using the golang:1.15.6 docker image as the base for CI testing.

What did you do?

We have automated tests running for changes and one of them failed with a crash. The race-detector is enabled for the tests. The relevant part of the crash seems to be: