Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: TestPingPongHog failures with ratios out of range #52207

Closed
bcmills opened this issue Apr 7, 2022 · 10 comments
Closed

runtime: TestPingPongHog failures with ratios out of range #52207

bcmills opened this issue Apr 7, 2022 · 10 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker Soon This needs action soon. (recent regressions, service outages, unusual time-sensitive situations)
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Apr 7, 2022

--- FAIL: TestPingPongHog (0.10s)
    proc_test.go:486: want hogCount/lightCount in [0.2, 5]; got 24000000/133792000 = 0.1793829227457546
FAIL
FAIL	runtime	155.091s

See previously #38266, #20494.

greplogs --dashboard -md -l -e 'FAIL: TestPingPongHog .*(?:\n .*)*want hogCount/lightCount in ' --since=2021-01-01

2022-04-06T20:46:47-81ae993/linux-386-longtest
2022-03-10T21:10:30-914195c/windows-amd64-longtest
2021-11-12T22:20:50-9150c16/linux-386-longtest
2021-10-29T17:12:47-3aecb3a/linux-386-longtest

(CC @golang/runtime)

@bcmills
Copy link
Contributor Author

bcmills commented Apr 7, 2022

Marking as release-blocker for Go 1.19 because windows/386 and windows/amd64 are first class ports.

(This at least needs to be triaged by the runtime team before the release, and they can either fix the failure or add a skip based on the outcome of that triage.)

@bcmills bcmills added OS-Windows NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Apr 7, 2022
@bcmills bcmills added this to the Go1.19 milestone Apr 7, 2022
@mknyszek mknyszek self-assigned this Apr 7, 2022
@heschi
Copy link
Contributor

heschi commented Apr 20, 2022

Ping: any progress on this?

@bcmills
Copy link
Contributor Author

bcmills commented May 2, 2022

greplogs --dashboard -md -l -e 'FAIL: TestPingPongHog .*(?:\n .*)*want hogCount/lightCount in ' --since=2022-04-07
2022-04-30T04:04:40-fd6c556/windows-amd64-longtest
2022-04-30T02:35:50-0ebe224/windows-amd64-longtest
2022-04-25T23:36:50-a5d61be/windows-amd64-longtest

@bcmills bcmills changed the title runtime: TestPingPongHog failures with ratios out of range on Windows runtime: TestPingPongHog failures with ratios out of range May 3, 2022
@bcmills bcmills removed the OS-Windows label May 3, 2022
@bcmills
Copy link
Contributor Author

bcmills commented May 3, 2022

greplogs -l -e 'FAIL: TestPingPongHog .*(?:\n .*)*want hogCount/lightCount in ' --since=2022-05-01
2022-05-03T14:36:37-d0cda4d/linux-386-longtest

@bcmills
Copy link
Contributor Author

bcmills commented May 9, 2022

greplogs -l -e 'FAIL: TestPingPongHog .*(?:\n .*)*want hogCount/lightCount in ' --since=2022-05-04
2022-05-06T15:32:04-7f71e1f/windows-amd64-longtest
2022-05-06T04:22:04-3ea3bc0/windows-amd64-longtest
2022-05-06T02:40:50-7dd9884/windows-amd64-longtest
2022-05-04T04:21:29-7cf3268/windows-amd64-longtest

and https://storage.googleapis.com/go-build-log/84130882/linux-amd64-longtest_b0bacea2.log (a SlowBot)

@golang/runtime: can this test be skipped until someone has the bandwidth to fix it? It's gotten really noisy.

@bcmills bcmills added the Soon This needs action soon. (recent regressions, service outages, unusual time-sensitive situations) label May 9, 2022
@cherrymui
Copy link
Member

The code comment says

// We can use a
// fairly large factor here to make this robust: if the
// scheduler isn't working right, the gap should be ~1000X.

All failures above are within a factor of 10 (the current factor is 5). Perhaps we can just increase the factor, to (say) 20 or 50?

@mknyszek
Copy link
Contributor

I agree with Cherry. It sounds like this just needs a lot more slack. However, I think this test might just be fundamentally flaky because it's not robust to OS scheduling delays (I'm thinking about #52433 here). I wonder if we could prove that this is the case with perf on the linux/386 builders.

This is definitely a test that, if we had, for instance, scheduler instrumentation for testing, we could construct a situation where we make sure that a goroutine gets scheduled within N calls to the scheduler on average (or something), instead of bounding it by real timings.

@cherrymui
Copy link
Member

Yeah, scheduler instrumentation would be much better for testing this kind of things.

Perhaps for short term we could increase the threshold to reduce the noise on builders.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/407134 mentions this issue: runtime: relax the threshold for TestPingPongHog

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/407415 mentions this issue: runtime: fix overflow in PingPongHog test

gopherbot pushed a commit that referenced this issue May 19, 2022
On 32-bit systems the result of hogCount*factor can overflow.
Use division instead to do comparison.

Update #52207

Change-Id: I429fb9dc009af645acb535cee5c70887527ba207
Reviewed-on: https://go-review.googlesource.com/c/go/+/407415
Reviewed-by: Michael Knyszek <[email protected]>
Run-TryBot: Keith Randall <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
@golang golang locked and limited conversation to collaborators Jun 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker Soon This needs action soon. (recent regressions, service outages, unusual time-sensitive situations)
Projects
None yet
Development

No branches or pull requests

5 participants