runtime: change in pacer behavior on ppc64le in go1.20 #66600

Closed
laboger opened this issue Mar 29, 2024 · 3 comments

Assignees: mknyszek
Labels: arch-ppc64x, compiler/runtime (Issues related to the Go compiler and/or runtime.), NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.)

Comments

laboger (Contributor) commented Mar 29, 2024

Go version

go version go1.20.14 linux/ppc64le

Output of go env in your module/workspace:

GO111MODULE=""
GOARCH="ppc64le"
GOBIN=""
GOCACHE="/home/boger/.cache/go-build"
GOENV="/home/boger/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="ppc64le"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/boger/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/boger/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/boger/golang/go1.20/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/boger/golang/go1.20/go/pkg/tool/linux_ppc64le"
GOVCS=""
GOVERSION="go1.20.14"
GCCGO="gccgo"
GOPPC64="power9"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3032074183=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Built the benchmark in test/bench/go1 and profiled BinaryTree17 with go1.20. According to a perf profile, the percentage of time spent in gcDrain increased dramatically as the GOMAXPROCS value increased.

GOMAXPROCS=2 perf record -e cycles ./go1.test -test.bench=BinaryTree -test.run=NONE -test.count=4

perf report
  19.40%  go1.test  go1.test          [.] runtime.findObject
  17.95%  go1.test  go1.test          [.] runtime.mallocgc
  11.73%  go1.test  go1.test          [.] runtime.greyobject
   8.21%  go1.test  go1.test          [.] runtime.scanobject
   7.85%  go1.test  go1.test          [.] runtime.gcmarknewobject
   5.57%  go1.test  go1.test          [.] runtime.(*mspan).writeHeapBitsSmall
   3.63%  go1.test  go1.test          [.] runtime.deductAssistCredit
   3.44%  go1.test  go1.test          [.] test/bench/go1.(*binaryNode).itemCheck
   1.93%  go1.test  go1.test          [.] test/bench/go1.bottomUpTree
   1.84%  go1.test  go1.test          [.] runtime.memclrNoHeapPointers
   1.76%  go1.test  go1.test          [.] runtime.gcDrainN
   1.64%  go1.test  go1.test          [.] runtime.(*mspan).heapBits
   1.53%  go1.test  go1.test          [.] runtime.gcDrain

I ran the same perf record command but with GOMAXPROCS=32, and the report shows this:
  28.80%  go1.test  go1.test          [.] runtime.gcDrain
  12.25%  go1.test  go1.test          [.] runtime.greyobject
   7.92%  go1.test  go1.test          [.] runtime.findObject
   6.42%  go1.test  go1.test          [.] runtime.scanobject

I tried this on go1.19 and did not see the dramatic increase in the gcDrain percentage.

What did you see happen?

I turned on some GODEBUG trace values and saw that in go1.19 the pacer goal increases significantly, so the garbage collector does not run as often. In go1.20 the goal stays low, resulting in many more garbage collection cycles.
I also ran the same experiment on x86 and do not see the increase in the gcDrain percentage.

Output from go1.19:

pacer: assist ratio=+4.320720e+005 (scan 0 MB in 263->4 MB) workers=0++2.500000e-001
pacer: 64% CPU (25 exp.) for 104104+3552+287792 B work (287792 B exp.) in 3997696 B -> 275783680 B (∆goal 271589376, cons/mark +0.000000e+000)

pacer: assist ratio=+1.896515e-003 (scan 8 MB in 326->525 MB) workers=0++2.500000e-001
gc 2 @1.859s 0%: 0.032+0.89+0.003 ms clock, 0.065+0/0.18/1.3+0.006 ms cpu, 326->326->287 MB, 525 MB goal, 0 MB stacks, 0 MB globals, 2 P (forced)
pacer: assist ratio=+2.838705e-003 (scan 0 MB in 287->575 MB) workers=0++2.500000e-001
gc 3 @1.861s 0%: 0.053+0.94+0.002 ms clock, 0.10+0/0.11/1.2+0.004 ms cpu, 287->287->287 MB, 575 MB goal, 0 MB stacks, 0 MB globals, 2 P (forced)

Output from go1.20:

pacer: assist ratio=+4.325928e+000 (scan 0 MB in 4->4 MB) workers=0++2.500000e-001
pacer: 37% CPU (25 exp.) for 102320+4184+283504 B work (283504 B exp.) in 4349952 B -> 4390912 B (∆goal -24576, cons/mark +0.000000e+000)
gc 1 @0.000s 15%: 0.037+0.25+0.052 ms clock, 0.074+0.074/0.14/0.038+0.10 ms cpu, 4->4->3 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 2 P
pacer: assist ratio=+1.173052e-001 (scan 0 MB in 4->7 MB) workers=0++2.500000e-001
gc 2 @0.003s 7%: 0.044+0.12+0.002 ms clock, 0.088+0/0.093/0.13+0.005 ms cpu, 4->4->0 MB, 7 MB goal, 0 MB stacks, 0 MB globals, 2 P (forced)
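
(For comparison outside the GC trace: the heap goal that gctrace prints as "MB goal" can also be watched programmatically. Below is a minimal sketch, not part of the original report, that allocates in a loop and prints runtime.MemStats.NextGC, which corresponds to the pacer's target heap size for the next cycle, together with the completed GC cycle count.)

// heapgoal.go — a minimal sketch (not from the original report) that watches
// the pacer's heap goal programmatically. runtime.MemStats.NextGC is the
// target heap size for the next GC cycle, i.e. roughly the "MB goal" value
// printed by GODEBUG=gctrace=1.
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    var sink [][]byte
    for i := 0; i < 100; i++ {
        // Allocate steadily to keep the GC active, loosely mimicking an
        // allocation-heavy benchmark such as BinaryTree17.
        sink = append(sink, make([]byte, 1<<20))
        if len(sink) > 50 {
            sink = sink[1:] // retain only a bounded live set
        }

        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        fmt.Printf("heap goal=%d MB live=%d MB GC cycles=%d\n",
            ms.NextGC>>20, ms.HeapAlloc>>20, ms.NumGC)
        time.Sleep(10 * time.Millisecond)
    }
    runtime.KeepAlive(sink)
}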

Someone had asked why there was such a dramatic increase in the gcDrain percentage, which is why I started experimenting.

What did you expect to see?

I did not expect to see such an increase in gcDrain when increasing GOMAXPROCS. It appears that something changed in go1.20 to cause this.

gopherbot added the compiler/runtime label on Mar 29, 2024
thanm (Contributor) commented Mar 29, 2024

@golang/compiler

thanm added the NeedsInvestigation label on Mar 29, 2024
mknyszek self-assigned this on Mar 31, 2024
laboger (Contributor, Author) commented Apr 1, 2024

I'm going to close this for now. I'm not seeing the same behavior today that I was seeing on Friday, and I don't know which setting was different and affected what I was seeing.

Someone had reported that they were seeing degradation in their application when using more processors, and their profiles showed a lot of time spent in gcDrain. So I tried to find some simpler tests that demonstrated the same behavior, and that seemed to happen in test/bench/garbage/tree and test/bench/go1/BinaryTree. With GOMAXPROCS=2, gcDrain was very low in the profile, but with GOMAXPROCS=32 it was high in the profile. The GODEBUG output showed that the garbage collector was running many more times with more processors, and the gcpacer output was very different too (but maybe setting GODEBUG affected that?).

But today I see the same gcDrain behavior in both the latest and earlier Go releases, and on both ppc64le and x86. I don't know whether that is expected; it seems like it is not scaling well.

laboger closed this as completed on Apr 1, 2024
mknyszek (Contributor) commented Apr 2, 2024

I'm not very familiar with those benchmarks, but note that it has been the case for a long time that if you have a high GOMAXPROCS for a benchmark with relatively low parallelism and a lot of (perceived) idle time (for example, the benchmark is single-threaded and GOMAXPROCS is 32), the GC will eat up that idle time (in the example, 31 threads' worth) with mark workers. You should see gcDrainMarkWorkerIdle show up in the profile in that case (it's possible we didn't break that out in the versions of Go you're trying).
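
(To make that concrete, here is a minimal sketch, not from this issue, of a single-goroutine, allocation-heavy workload that can be run under perf or pprof with different GOMAXPROCS values; with many idle Ps, the extra gcDrain samples should come largely from idle mark workers rather than from assists or dedicated workers.)

// idlegc.go — a minimal sketch (not from this issue) of a single-goroutine,
// allocation-heavy workload for comparing profiles under different
// GOMAXPROCS values, e.g.:
//   GOMAXPROCS=2  perf record -e cycles ./idlegc
//   GOMAXPROCS=32 perf record -e cycles ./idlegc
package main

import (
    "fmt"
    "runtime"
)

type node struct{ left, right *node }

// build allocates a complete binary tree of the given depth, loosely
// modeled on the BinaryTree17 benchmark's allocation pattern.
func build(depth int) *node {
    if depth == 0 {
        return &node{}
    }
    return &node{left: build(depth - 1), right: build(depth - 1)}
}

func main() {
    for i := 0; i < 20; i++ {
        _ = build(18) // allocate a tree and immediately drop it
    }
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    fmt.Printf("GOMAXPROCS=%d GC cycles=%d GC CPU fraction=%.3f\n",
        runtime.GOMAXPROCS(0), ms.NumGC, ms.GCCPUFraction)
}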

I'm not sure I understand your "go1.19" and "go1.20" GC trace output. One heap is clearly much, much larger, so naturally there will be many fewer GC cycles and a lot less time spent in gcDrain (assuming the allocation rate is roughly the same).
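
(As a rough illustration of that last point, a sketch under stated assumptions and not from this issue: the same allocation loop run under a larger heap goal, forced here via debug.SetGCPercent, the programmatic equivalent of the GOGC environment variable, completes far fewer GC cycles, so proportionally less time is spent marking.)

// gogccycles.go — a minimal sketch (not from this issue) showing that a
// larger heap goal means fewer GC cycles for the same allocation pattern.
// debug.SetGCPercent adjusts the goal the same way GOGC does.
package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

var sink []byte // global sink so the allocations are not optimized away

func churn() {
    for i := 0; i < 1_000_000; i++ {
        sink = make([]byte, 1024) // 1 KiB transient heap allocation
    }
}

// cyclesWithGOGC runs the allocation loop under the given GOGC value and
// reports how many GC cycles completed during it.
func cyclesWithGOGC(pct int) uint32 {
    old := debug.SetGCPercent(pct)
    defer debug.SetGCPercent(old)

    var before, after runtime.MemStats
    runtime.ReadMemStats(&before)
    churn()
    runtime.ReadMemStats(&after)
    return after.NumGC - before.NumGC
}

func main() {
    fmt.Println("GC cycles with GOGC=100: ", cyclesWithGOGC(100))
    fmt.Println("GC cycles with GOGC=1000:", cyclesWithGOGC(1000))
}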
