Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: TestCPUProfileWithFork failures with stuck writerDescriptor goroutines #59995

Closed
gopherbot opened this issue May 5, 2023 · 19 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@gopherbot
Copy link
Contributor

gopherbot commented May 5, 2023

#!watchflakes
post <- goos == "darwin" && pkg == "runtime/pprof" && test == "TestCPUProfileWithFork" && `SIGQUIT: quit`

Issue created automatically to collect these failures.

Example (log):

SIGQUIT: quit
PC=0x7ff8061563ea m=1 sigcode=0

rax    0x104
rbx    0x70000a473000
rcx    0x70000a472c48
rdx    0x1a00
rdi    0xc000040380
rsi    0xa1c01000a1d00
rbp    0x70000a472cf0
...
	/tmp/buildlet/go/src/runtime/chan.go:107 +0xe5 fp=0xc0000b9a88 sp=0xc0000b9a48 pc=0x1055c85
os/exec.(*Cmd).Start(0xc00013e000)
	/tmp/buildlet/go/src/os/exec/exec.go:706 +0xa0f fp=0xc0000b9ce0 sp=0xc0000b9a88 pc=0x117696f
os/exec.(*Cmd).Run(0x0?)
	/tmp/buildlet/go/src/os/exec/exec.go:587 +0x27 fp=0xc0000b9d20 sp=0xc0000b9ce0 pc=0x1175ee7
os/exec.(*Cmd).CombinedOutput(0xc00013e000)
	/tmp/buildlet/go/src/os/exec/exec.go:1005 +0x1f7 fp=0xc0000b9d98 sp=0xc0000b9d20 pc=0x1178df7
runtime/pprof.TestCPUProfileWithFork(0xc000082820)
	/tmp/buildlet/go/src/runtime/pprof/pprof_test.go:687 +0x338 fp=0xc0000b9e98 sp=0xc0000b9d98 pc=0x12339b8
testing.tRunner(0xc000082820, 0x12b3970)

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 5, 2023
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestCPUProfileWithFork"
2023-05-04 17:28 darwin-amd64-race go@60c724c5 runtime/pprof.TestCPUProfileWithFork (log)
SIGQUIT: quit
PC=0x7ff8061563ea m=1 sigcode=0

rax    0x104
rbx    0x70000a473000
rcx    0x70000a472c48
rdx    0x1a00
rdi    0xc000040380
rsi    0xa1c01000a1d00
rbp    0x70000a472cf0
...
	/tmp/buildlet/go/src/runtime/chan.go:107 +0xe5 fp=0xc0000b9a88 sp=0xc0000b9a48 pc=0x1055c85
os/exec.(*Cmd).Start(0xc00013e000)
	/tmp/buildlet/go/src/os/exec/exec.go:706 +0xa0f fp=0xc0000b9ce0 sp=0xc0000b9a88 pc=0x117696f
os/exec.(*Cmd).Run(0x0?)
	/tmp/buildlet/go/src/os/exec/exec.go:587 +0x27 fp=0xc0000b9d20 sp=0xc0000b9ce0 pc=0x1175ee7
os/exec.(*Cmd).CombinedOutput(0xc00013e000)
	/tmp/buildlet/go/src/os/exec/exec.go:1005 +0x1f7 fp=0xc0000b9d98 sp=0xc0000b9d20 pc=0x1178df7
runtime/pprof.TestCPUProfileWithFork(0xc000082820)
	/tmp/buildlet/go/src/runtime/pprof/pprof_test.go:687 +0x338 fp=0xc0000b9e98 sp=0xc0000b9d98 pc=0x12339b8
testing.tRunner(0xc000082820, 0x12b3970)

watchflakes

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label May 5, 2023
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestCPUProfileWithFork"
2023-05-10 14:30 darwin-amd64-race go@8e8303e1 runtime/pprof.TestCPUProfileWithFork (log)
SIGQUIT: quit
PC=0x7ff81b1ed34e m=1 sigcode=0

rax    0x4
rbx    0x700007427628
rcx    0x700007427568
rdx    0x0
rdi    0x3
rsi    0x0
rbp    0x700007427570
...
	/tmp/buildlet/go/src/os/exec/exec.go:941 +0x349 fp=0xc00007bc18 sp=0xc00007bad0 pc=0x117c7c9
os/exec.(*Cmd).Wait(0xc0000006e0)
	/tmp/buildlet/go/src/os/exec/exec.go:908 +0x2d5 fp=0xc00007bce0 sp=0xc00007bc18 pc=0x117c275
os/exec.(*Cmd).Run(0x0?)
	/tmp/buildlet/go/src/os/exec/exec.go:590 +0x55 fp=0xc00007bd20 sp=0xc00007bce0 pc=0x117a175
os/exec.(*Cmd).CombinedOutput(0xc0000006e0)
	/tmp/buildlet/go/src/os/exec/exec.go:1005 +0x1f7 fp=0xc00007bd98 sp=0xc00007bd20 pc=0x117d057
runtime/pprof.TestCPUProfileWithFork(0xc000144000)
	/tmp/buildlet/go/src/runtime/pprof/pprof_test.go:687 +0x338 fp=0xc00007be98 sp=0xc00007bd98 pc=0x123a538
testing.tRunner(0xc000144000, 0x12bf8a0)

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestCPUProfileWithFork"
2023-05-10 21:44 darwin-amd64-race go@3f9521b2 runtime/pprof.TestCPUProfileWithFork (log)
SIGQUIT: quit
PC=0x7ff807b492be m=1 sigcode=0

rax    0x4
rbx    0x0
rcx    0x70000f5f6da8
rdx    0x1
rdi    0x1403
rsi    0x0
rbp    0x70000f5f6de0
...
	/tmp/buildlet/go/src/os/exec/exec.go:941 +0x349 fp=0xc0000b7c18 sp=0xc0000b7ad0 pc=0x117c549
os/exec.(*Cmd).Wait(0xc000194000)
	/tmp/buildlet/go/src/os/exec/exec.go:908 +0x2d5 fp=0xc0000b7ce0 sp=0xc0000b7c18 pc=0x117bff5
os/exec.(*Cmd).Run(0xc0000e4160?)
	/tmp/buildlet/go/src/os/exec/exec.go:590 +0x55 fp=0xc0000b7d20 sp=0xc0000b7ce0 pc=0x1179ef5
os/exec.(*Cmd).CombinedOutput(0xc000194000)
	/tmp/buildlet/go/src/os/exec/exec.go:1005 +0x1f7 fp=0xc0000b7d98 sp=0xc0000b7d20 pc=0x117cdd7
runtime/pprof.TestCPUProfileWithFork(0xc000083860)
	/tmp/buildlet/go/src/runtime/pprof/pprof_test.go:687 +0x338 fp=0xc0000b7e98 sp=0xc0000b7d98 pc=0x123a2b8
testing.tRunner(0xc000083860, 0x12bf1a0)

watchflakes

@mknyszek
Copy link
Contributor

Darwin's fork is weird, so this might be related to that?

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestCPUProfileWithFork"
2023-05-23 17:16 darwin-amd64-race go@4e679e26 runtime/pprof.TestCPUProfileWithFork (log)
SIGQUIT: quit
PC=0x7ff81dd952be m=1 sigcode=0

rax    0x4
rbx    0x0
rcx    0x70000b58eda8
rdx    0x1
rdi    0x1403
rsi    0x0
rbp    0x70000b58ede0
...
	/tmp/buildlet/go/src/os/exec/exec.go:941 +0x349 fp=0xc0000b5c18 sp=0xc0000b5ad0 pc=0x117c4a9
os/exec.(*Cmd).Wait(0xc00005a2c0)
	/tmp/buildlet/go/src/os/exec/exec.go:908 +0x2da fp=0xc0000b5ce0 sp=0xc0000b5c18 pc=0x117bf5a
os/exec.(*Cmd).Run(0x0?)
	/tmp/buildlet/go/src/os/exec/exec.go:590 +0x55 fp=0xc0000b5d20 sp=0xc0000b5ce0 pc=0x1179e55
os/exec.(*Cmd).CombinedOutput(0xc00005a2c0)
	/tmp/buildlet/go/src/os/exec/exec.go:1005 +0x1f7 fp=0xc0000b5d98 sp=0xc0000b5d20 pc=0x117cd57
runtime/pprof.TestCPUProfileWithFork(0xc0000824e0)
	/tmp/buildlet/go/src/runtime/pprof/pprof_test.go:687 +0x33f fp=0xc0000b5e98 sp=0xc0000b5d98 pc=0x123ed9f
testing.tRunner(0xc0000824e0, 0x12bf0c8)

watchflakes

@bcmills bcmills changed the title runtime/pprof: TestCPUProfileWithFork failures runtime/pprof: TestCPUProfileWithFork failures with stuck writerDescriptor goroutines Jun 14, 2023
@mknyszek
Copy link
Contributor

These hangs seem possibly similar to #60365.

@mknyszek
Copy link
Contributor

Looking more closely at these stack traces, there are some really weird things. One of them makes it look like sysmon was blocked on a lock (sched.lock, I think) the whole time, while the other one seems to show sysmon blocked in netpoll, which it never should. Two of the failures have a goroutine running in a thread and we don't get a stack trace for it since we don't have GOTRACEBACK=crash enabled on the builders.

No luck in reproducing in a gomote so far.

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestCPUProfileWithFork" && `os/exec\.\(\*Cmd\)\.writerDescriptor\.func1`
2023-06-21 19:47 darwin-amd64-race go@b23cae80 runtime/pprof.TestCPUProfileWithFork (log)
SIGQUIT: quit
PC=0x7ff818f692be m=1 sigcode=0

rax    0x4
rbx    0x0
rcx    0x700009bb0da8
rdx    0x1
rdi    0x903
rsi    0x0
rbp    0x700009bb0de0
...
	/tmp/buildlet/go/src/os/exec/exec.go:941 +0x349 fp=0xc00007bc18 sp=0xc00007bad0 pc=0x117d309
os/exec.(*Cmd).Wait(0xc0001aa160)
	/tmp/buildlet/go/src/os/exec/exec.go:908 +0x2da fp=0xc00007bce0 sp=0xc00007bc18 pc=0x117cdba
os/exec.(*Cmd).Run(0x0?)
	/tmp/buildlet/go/src/os/exec/exec.go:590 +0x55 fp=0xc00007bd20 sp=0xc00007bce0 pc=0x117acb5
os/exec.(*Cmd).CombinedOutput(0xc0001aa160)
	/tmp/buildlet/go/src/os/exec/exec.go:1005 +0x1f7 fp=0xc00007bd98 sp=0xc00007bd20 pc=0x117dbb7
runtime/pprof.TestCPUProfileWithFork(0xc00018a680)
	/tmp/buildlet/go/src/runtime/pprof/pprof_test.go:687 +0x33f fp=0xc00007be98 sp=0xc00007bd98 pc=0x123ff3f
testing.tRunner(0xc00018a680, 0x12c0dd8)

watchflakes

@bcmills
Copy link
Contributor

bcmills commented Jun 30, 2023

These are all on darwin-amd64-race. I bet the writerDescriptor goroutines are stuck for the same reason as in #60108.

@bcmills
Copy link
Contributor

bcmills commented Jun 30, 2023

We could confirm or refute that with SIGQUIT stack traces. Should be easy enough with testenv.Command.

@gopherbot
Copy link
Contributor Author

Change https://go.dev/cl/507356 mentions this issue: runtime/pprof: use testenv.Command in tests instead of exec.Command

@gopherbot

This comment was marked as resolved.

@bcmills
Copy link
Contributor

bcmills commented Jun 30, 2023

Looking more closely at the test, the child process doesn't run StartCPUProfile at all.

I think the writerDescriptor goroutines are a red herring — maybe an artifact of some kind of Go scheduler deadlock.

@bcmills
Copy link
Contributor

bcmills commented Jun 30, 2023

To me this is looking like a dup of #60108, combined with some kind of scheduler deadlock that causes runnable I/O goroutines not to be scheduled.

gopherbot pushed a commit that referenced this issue Jul 20, 2023
If the test is about to time out, testenv.Command sends SIGQUIT to the
child process. The runtime's SIGQUIT goroutine dump should help us to
determine whether the hangs observed in TestCPUProfileWithFork are a
symptom of #60108 or a separate bug.

For #59995.
Updates #60108.

Change-Id: I26342ca262b2b0772795c8be142cfcad8d90db30
Reviewed-on: https://go-review.googlesource.com/c/go/+/507356
Run-TryBot: Bryan Mills <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Auto-Submit: Bryan Mills <[email protected]>
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "darwin" && pkg == "runtime/pprof" && test == "TestCPUProfileWithFork"
2023-08-01 17:16 darwin-amd64-race go@b6898dde runtime/pprof.TestCPUProfileWithFork (log)
SIGQUIT: quit
PC=0x7ff8146fc2be m=1 sigcode=0

rax    0x4
rbx    0x0
rcx    0x700005d39da8
rdx    0x1
rdi    0x903
rsi    0x0
rbp    0x700005d39de0
...
	/tmp/buildlet/go/src/runtime/panic.go:442 +0x5f fp=0xc0000bda38 sp=0xc0000bd9a0 pc=0x108743f
os/exec.(*Cmd).Start(0xc000192000)
	/tmp/buildlet/go/src/os/exec/exec.go:718 +0xf99 fp=0xc0000bdca8 sp=0xc0000bda38 pc=0x1181bd9
os/exec.(*Cmd).Run(0x10c1b29?)
	/tmp/buildlet/go/src/os/exec/exec.go:587 +0x27 fp=0xc0000bdce8 sp=0xc0000bdca8 pc=0x1180bc7
os/exec.(*Cmd).CombinedOutput(0xc000192000)
	/tmp/buildlet/go/src/os/exec/exec.go:977 +0x1f7 fp=0xc0000bdd60 sp=0xc0000bdce8 pc=0x1183b37
runtime/pprof.TestCPUProfileWithFork(0xc000102680)
	/tmp/buildlet/go/src/runtime/pprof/pprof_test.go:691 +0x585 fp=0xc0000bdea0 sp=0xc0000bdd60 pc=0x1246c25
testing.tRunner(0xc000102680, 0x12c7e48)

watchflakes

@gopherbot gopherbot modified the milestones: Go1.21, Go1.22 Aug 8, 2023
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "darwin" && pkg == "runtime/pprof" && test == "TestCPUProfileWithFork"
2023-07-19 21:06 darwin-amd64-race go@03c7e96b runtime/pprof.TestCPUProfileWithFork (log)
SIGQUIT: quit
PC=0x7ff8049112be m=1 sigcode=0

rax    0x4
rbx    0x0
rcx    0x7000033ccda8
rdx    0x1
rdi    0x1403
rsi    0x0
rbp    0x7000033ccde0
...
	/tmp/buildlet/go/src/os/exec/exec.go:941 +0x349 fp=0xc0000b7c18 sp=0xc0000b7ad0 pc=0x117cea9
os/exec.(*Cmd).Wait(0xc0001a0580)
	/tmp/buildlet/go/src/os/exec/exec.go:908 +0x2da fp=0xc0000b7ce0 sp=0xc0000b7c18 pc=0x117c95a
os/exec.(*Cmd).Run(0x0?)
	/tmp/buildlet/go/src/os/exec/exec.go:590 +0x55 fp=0xc0000b7d20 sp=0xc0000b7ce0 pc=0x117a855
os/exec.(*Cmd).CombinedOutput(0xc0001a0580)
	/tmp/buildlet/go/src/os/exec/exec.go:1005 +0x1f7 fp=0xc0000b7d98 sp=0xc0000b7d20 pc=0x117d757
runtime/pprof.TestCPUProfileWithFork(0xc000082b60)
	/tmp/buildlet/go/src/runtime/pprof/pprof_test.go:687 +0x33f fp=0xc0000b7e98 sp=0xc0000b7d98 pc=0x123fadf
testing.tRunner(0xc000082b60, 0x12c0958)

watchflakes

@gopherbot

This comment was marked as off-topic.

@bcmills
Copy link
Contributor

bcmills commented Aug 23, 2023

That last one is a different failure mode (#59679). Updated the watchflakes pattern to exclude it.

@mknyszek
Copy link
Contributor

As of the resolution of #61768, I think this is also fixed. gopherbot will reopen it if not true, but we haven't seen any issues in a couple weeks.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Archived in project
Development

No branches or pull requests

4 participants