Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: uwriter output is nondeterministic #69547

Closed
sluongng opened this issue Sep 20, 2024 · 12 comments
Closed

cmd/compile: uwriter output is nondeterministic #69547

sluongng opened this issue Sep 20, 2024 · 12 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@sluongng
Copy link

Go version

1.23.0 (Linux AMD64 and MacOS ARM64)

Output of go env in your module/workspace:

(rules_go does sanitize the env vars quite a bit, but here are the relevant variables)

CGO_ENABLED=1
GOARCH=amd64
GODEBUG=winsymlink=0
GOEXPERIMENT=nocoverageredesign
GOOS=linux
GOPATH=
GOROOT_FINAL=GOROOT
GOTOOLCHAIN=local

What did you do?

Context

https://github.com/bazelbuild/rules_go is an alternative way to build Go application using Bazel. It does so by manually instrumenting the go tools such as go tool compile or go tool link in hermetic Bazel sandboxes. The intermediate artifacts is then cached by Bazel for subsequent reuse.

What I did

For the Compile step, we are currently calling go tool compile with both -o <compile-output>.x and -linkobj <link-output>.a. The .x file is then used for subsequent downstream package compilations while the .a file will be used for downstream executable linkings.

Here is the relevant code: https://github.com/bazelbuild/rules_go/blob/8b8294b6359a9c0a89af968fb7092f6e7194f420/go/tools/builders/compilepkg.go#L494-L495

What did you see happen?

When I upgraded gorm.io/driver/mysql, a direct dependency of db.go, to a newer version, I noticed that the new db.x was changed.

The binary diff between 2 .x files are just a few bytes small

> diffoscope ~/Downloads/db*
--- /Users/sluongng/Downloads/db (1).x
+++ /Users/sluongng/Downloads/db.x
│┄ Command `'nm -s {}'` failed with exit code 1. Standard output:
│┄     /Library/Developer/CommandLineTools/usr/bin/nm: error: for the -s option: bad number of arguments (must be two arguments)
│┄     /Library/Developer/CommandLineTools/usr/bin/nm: error: a.out: No such file or directory
@@ -34357,15 +34357,15 @@
 00086340: 0104 0203 0405 1000 8108 0082 0803 7b03  ..............{.
 00086350: 4303 0103 0203 4103 8001 0324 034a 031b  C.....A....$.J..
 00086360: 03d4 0303 d503 03dc 0303 d803 0081 080f  ................
 00086370: 010d 0203 0405 0607 0809 0a0b 0c0d 0e0b  ................
 00086380: 0083 0800 8408 0343 034a 03d4 0303 d503  .......C.J......
 00086390: 03d8 0303 da03 0341 0365 0083 080a 0108  .......A.e......
 000863a0: 0203 0405 0607 0809 1200 8508 0086 0803  ................
-000863b0: bd01 03d8 0403 d403 037b 0341 0380 0103  .........{.A....
+000863b0: bd01 03d4 0303 d804 037b 0341 0380 0103  .........{.A....
 000863c0: 4a03 ea04 03d5 0303 d504 03d8 0303 2403  J.............$.
 000863d0: ef04 03d6 0403 dc03 0085 0811 010f 0203  ................
 000863e0: 0405 0607 0809 0a0b 0c0d 0e0f 1026 0087  .............&..
 000863f0: 0800 8808 0328 036f 0355 0358 035a 0302  .....(.o.U.X.Z..
 00086400: 0341 0324 03d9 0403 dc04 03dd 0403 0103  .A.$............
 00086410: 2303 5703 5903 2903 2503 5f03 6a03 4403  #.W.Y.).%._.j.D.
 00086420: 6d03 4a03 1b03 e604 03e7 0403 d402 0380  m.J.............
@@ -113551,9 +113551,9 @@
 001bb8e0: 0806 0103 0004 0000 0117 0004 0d00 0017  ................
 001bb8f0: 0102 0100 0101 00d0 080f 0002 1700 0f01  ................
 001bb900: 0100 1701 040b 0b01 00d0 0815 1701 0001  ................
 001bb910: 0103 0000 0c3a 1700 0401 0105 0100 d008  .....:..........
 001bb920: 2014 0100 d008 2300 0400 0100 d008 270a   .....#.......'.
 001bb930: 0100 d108 0400 0117 0004 0101 0500 0100  ................
 001bb940: d208 0300 0a01 00d3 0803 0001 1401 00d3  ................
-001bb950: 080a 0004 0001 00d4 0802 8162 2d2d 5d3d  ...........b--]=
-001bb960: fb45 0a24 240a                           .E.$$.
+001bb950: 080a 0004 0001 00d4 0802 907f c3c9 9764  ...............d
+001bb960: 77e3 0a24 240a                           w..$$.
exit 1

I tried using different tools I could find in x/tools to better understand the content of the file but it seems like the content is pretty consistent between 2 files

~/work/golang/pkg/mod/golang.org/x/[email protected]/go/gcexportdata> go run main.go -- ~/Downloads/db.x > /tmp/1.txt
~/work/golang/pkg/mod/golang.org/x/[email protected]/go/gcexportdata> go run main.go -- ~/Downloads/db\ \(1\).x > /tmp/2.txt
~/work/golang/pkg/mod/golang.org/x/[email protected]/go/gcexportdata> diff -u /tmp/{1,2}.txt | wc -l
0

I also used a modified version of x/tools/internal/gcimporter/main.go to analyze the file

func main() {
    // Define the .x file input flag
    filePath := flag.String("file", "", "Path to the .x file containing the export data")
    flag.Parse()

    if *filePath == "" {
        log.Fatal("Please provide the path to a .x file using the -file flag")
    }

    // Create a new token.FileSet
    fset := token.NewFileSet()

    // Read the export data from the specified .x file
    export, err := os.ReadFile(*filePath)
    if err != nil {
        log.Fatalf("can't read %q export data: %v", *filePath, err)
    }

    // Use gcexportdata to read the data and create a *types.Package
    r, err := gcexportdata.NewReader(bytes.NewReader(export))
    if err != nil {
        log.Fatalf("reading export data %s: %v", *filePath, err)
    }

    tpkg1, err := gcexportdata.Read(r, fset, make(map[string]*types.Package), "")
    if err != nil {
        log.Fatalf("decoding export data: %v", err)
    }

    // Print the package information from the .x file
    fmt.Println("# Read from export data:")
    printPackage(tpkg1)

    // Now reexport as indexed (deep) export data, and reimport.
    var tpkg2 *types.Package
    {
        var out bytes.Buffer
        if err := gcimporter.IExportData(&out, fset, tpkg1); err != nil {
            log.Fatal(err)
        }
        var err error
        _, tpkg2, err = gcimporter.IImportData(fset, make(map[string]*types.Package), out.Bytes(), tpkg1.Path())
        if err != nil {
            log.Fatal(err)
        }
    }

    // Print the re-imported package information
    fmt.Println("# After round-tripping through indexed export data:")
    printPackage(tpkg2)
}

// printPackage prints out details of the package
func printPackage(pkg *types.Package) {
// ...taken from the original file 
}

But still cannot see any diff between the 2 files.

Here are the 2 files attached.
db.tar.gz

What did you expect to see?

The expectation here is that if we were to compile package github.com/buildbuddy-io/buildbuddy/server/util/db/db.go, if there were no changes in exported data in db.go, then db.x will have reproducible content, and thus, downstream packages which depend on db.x will not need to recompile.

@seankhliao seankhliao changed the title import/path: gc exported data is not reproducible cmd/compile: exported data is not reproducible Sep 20, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 20, 2024
@adonovan
Copy link
Member

adonovan commented Sep 20, 2024

There are two changes in the diff:

< 000863b0  bd 01 03[d4 03 03 d8 04] 03 7b 03 41 03 80 01 03  |.........{.A....|
---
> 000863b0  bd 01 03[d8 04 03 d4 03]  03 7b 03 41 03 80 01 03  |.........{.A....|
< 001bb950  08 0a 00 04 00 01 00 d4  08 02[90 7f c3 c9 97 64  |...............d|
< 001bb960  77 e3] 0a 24 24 0a                                 |w..$$.|
---
> 001bb950  08 0a 00 04 00 01 00 d4  08 02[81 62 2d 2d 5d 3d  |...........b--]=|
> 001bb960  fb 45] 0a 24 24 0a                                 |.E.$$.|

The second change, in the last 8 bytes of the payload (before the final $$), is just the fingerprint of the whole file, which is expected to change if there is any other change in the data. So, the only significant change is the first one. If we include in in the previous unchanged byte (03) we notice that it is a reordering of two three-byte strings (03 d4 03) and (03 d8 04). This suggests a nondeterministic map iteration in the encoder.

So what are these bytes? Instrumenting the ureader, the pos value (base offset) is 301ec, which means the file-relative offset 863b2 of the reordered string is the section-relative offset 863b2 - 301ec = 561c6. Dumping all the elemEnds, we find that this one falls within the range of elemEnds[12207]:

i             elemEnds[i]
12205   56102
12206   5612b
12207   56170 <-- 561c6 is after this
12208   561f1
12209   5622b

However, a panic statement added to the decoding of that element (for type information only, by x/tools/internal/gcimporter/main.go) is never executed, so I suspect it is an element needed only by the compiler (e.g. escape, inlining, generics) and not part of the type information.

I may have made a mistake in my analysis above, but I think this is a problem for the compiler team.

@timothy-king

@adonovan adonovan changed the title cmd/compile: exported data is not reproducible cmd/compile: ureader output is nondeterministic Sep 20, 2024
@adonovan adonovan changed the title cmd/compile: ureader output is nondeterministic cmd/compile: uwriter output is nondeterministic Sep 20, 2024
@timothy-king timothy-king self-assigned this Sep 20, 2024
@timothy-king timothy-king added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 20, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/614717 mentions this issue: go/ssa: remove loader utility functions

@cuonglm
Copy link
Member

cuonglm commented Sep 23, 2024

@sluongng Do you have a chance to check whether this problem happen with go1.22.x?

@sluongng
Copy link
Author

I briefly rolled back my repo to 1.22.7 to do the same test but the diff is much bigger

> diffoscope db{1,2}.x | wc -l
  223704
  
> diffoscope db{1,2}.x | head -20
--- db1.x
+++ db2.x
│┄ Command `'nm -s {}'` failed with exit code 1. Standard output:
│┄     /Library/Developer/CommandLineTools/usr/bin/nm: error: for the -s option: bad number of arguments (must be two arguments)
│┄     /Library/Developer/CommandLineTools/usr/bin/nm: error: a.out: No such file or directory
@@ -1,21 +1,21 @@
 00000000: 213c 6172 6368 3e0a 5f5f 2e50 4b47 4445  !<arch>.__.PKGDE
 00000010: 4620 2020 2020 2020 3020 2020 2020 2020  F       0
 00000020: 2020 2020 3020 2020 2020 3020 2020 2020      0     0
-00000030: 3634 3420 2020 2020 3138 3037 3534 3520  644     1807545
+00000030: 3634 3420 2020 2020 3138 3038 3531 3420  644     1808514
 00000040: 2020 600a 676f 206f 626a 6563 7420 6c69    `.go object li
 00000050: 6e75 7820 616d 6436 3420 676f 312e 3232  nux amd64 go1.22
 00000060: 2e37 2047 4f41 4d44 3634 3d76 3120 583a  .7 GOAMD64=v1 X:
 00000070: 7265 6761 6269 7772 6170 7065 7273 2c72  regabiwrappers,r
 00000080: 6567 6162 6961 7267 732c 616c 6c6f 6368  egabiargs,alloch
 00000090: 6561 6465 7273 2c65 7865 6374 7261 6365  eaders,exectrace
 000000a0: 7232 0a0a 0a24 2442 0a75 0100 0000 0000  r2...$$B.u......
-000000b0: 0000 232b 0000 252b 0000 fe2c 0000 cf2f  ..#+..%+...,.../
-000000c0: 0000 4036 0000 96ab 0000 07b2 0000 78b8  [email protected].

Here are the 2 .x files on 1.22.7.
db.tar.gz

Not sure what to make of it. I have yet to run any of the gcexports tools on them to see if I can make sense of the diff somehow.

@timothy-king
Copy link
Contributor

If I understand correctly, you first compiled "db.go" with version ABC of a dependency gorm.io/driver/mysql to get db1.x, and then you upgraded gorm.io/driver/mysql to version DEF and recompiled "db.go" to get db2.x. You then saw a difference in these files.

Looking through the files of the build files on bazel-contrib/rules_go#4111 (comment) . There are gorm '.x' files that are different. First I found is bazel-out/.../io_gorm_driver_mysql/mysql.x.

I suspect this is WAI. I think the first thing to do is to examine the build cache after building gorm.io/driver/mysql at versions ABC and DEF. Unless these are identical, it is fair game for db1.x and db2.x to differ. This could be contents of inlined functions or instantiations of generic function bodies not being identical, or an internal type in export data that is transitively reachable changed a field name. If these gorm.io/driver/mysql builds produces all identical files that are the inputs to this build command for both versions, please give me detailed instructions for how to reproduce the builds. I think this will be more productive to try to catch this as it happens than parse the files.

@timothy-king timothy-king added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 23, 2024
@sluongng
Copy link
Author

If I understand correctly, you first compiled "db.go" with version ABC of a dependency gorm.io/driver/mysql to get db1.x, and then you upgraded gorm.io/driver/mysql to version DEF and recompiled "db.go" to get db2.x. You then saw a difference in these files.

Looking through the files of the build files on bazel-contrib/rules_go#4111 (comment) . There are gorm '.x' files that are different. First I found is bazel-out/.../io_gorm_driver_mysql/mysql.x.

https://app.buildbuddy.io/invocation/0bd82434-6213-466b-aa76-e3c68250c92a?fileFilter=io_gorm_driver_mysql%2Fmysql.x#files
https://app.buildbuddy.io/invocation/a21c41a4-d2d3-49b5-9fa2-928cb7043d1e?fileFilter=io_gorm_driver_mysql%2Fmysql.x#files

Correct.

I suspect this is WAI. I think the first thing to do is to examine the build cache after building gorm.io/driver/mysql at versions ABC and DEF. Unless these are identical, it is fair game for db1.x and db2.x to differ. This could be contents of inlined functions or instantiations of generic function bodies not being identical, or an internal type in export data that is transitively reachable changed a field name. If these gorm.io/driver/mysql builds produces all identical files that are the inputs to this build command for both versions, please give me detailed instructions for how to reproduce the builds. I think this will be more productive to try to catch this as it happens than parse the files.

That would be very disappointing to know.

We intentionally use -linkobj <pkg>.a and -o <pkg>.x when using go tool compile so that we could have a more stable .x output with only the exported information needed for downstream compilation. We were hoping that if only the transitive deps was updated, as long as the exported namespaces (const, var, func, struct etc...) did not change, then .x would remain stable and thus, more cache friendly in Bazel/Blaze setup.

@adonovan
Copy link
Member

If I understand correctly, you first compiled "db.go" with version ABC of a dependency gorm.io/driver/mysql to get db1.x, and then you upgraded gorm.io/driver/mysql to version DEF and recompiled "db.go" to get db2.x. You then saw a difference in these files. ... I suspect this is WAI.

In haste I completely missed the part where you (@sluongng) mentioned that you upgraded a dependency between the two builds. @timothy-king is right: there is more information in an export data file than just type information. Even the same type information can be (deterministically) written in multiple ways that will cause build cache misses.

The the Go tools (compiler, linker, etc) are intended to be fully deterministic pure functions (ignoring the special case of post-link executable stamping), and the go build system, like Bazel, is an evaluator of pure deterministic functions in which every bit of input matters. So if you present the same inputs, you should get the same outputs; anything else is a bug. It certainly should not be path dependent: in other words, the sequence of prior builds should have no effect on the output of the current one. However, in this case, you presented different inputs, presumably differing only in some minor detail, and the tool computed differing outputs, though functionally identical.

The key test would be to delete the output files (e.g. go/blaze clean), to re-execute the build (without any changes to the inputs), and to check whether it re-creates an identical output file. If so, then there is no determinism problem.

@sluongng
Copy link
Author

there is more information in an export data file than just type information. Even the same type information can be (deterministically) written in multiple ways that will cause build cache misses.

Hmm that's surprising and disappointing to hear. Our main use case in rules_go was to divide compile outputs into .x and .a, with hope that (a) downstream compile actions (not link actions) will download smaller inputs and (b) .x will change less often as exported data of a package does not always get updated.

It sounds like we will only benefit from (a) and not (b).

My question is: do you think our use case makes sense to warrant a feature request here? Or is it something the Go team is not willing to support?


Side note: I would love to read more about the gc export data information if there are any documents available. I tried parsing through x/tools repo code but it's rather a slow process.

@adonovan
Copy link
Member

adonovan commented Sep 24, 2024

Hmm that's surprising and disappointing to hear. Our main use case in rules_go was to divide compile outputs into .x and .a, with hope that (a) downstream compile actions (not link actions) will download smaller inputs and (b) .x will change less often as exported data of a package does not always get updated.

It sounds like we will only benefit from (a) and not (b).

Dividing compile outputs into export information and object code is a very worthwhile optimization; we do it in Blaze (the version of Blaze used internally at Google), and analogous things in other languages (for example, Java's JVM class files contain both type information and function bodies, and only the former is needed for downstream compilation, so it makes sense to split the files). I confess I thought Bazel's rules_go already did this.

Nonetheless, the issue @timothy-king is describing is that the exported information used by downstream compilation is more than meets the eye. It contains not just the types of public declarations; it also holds information used by escape analysis, inlining, and generics, and all of these can be very sensitive to small changes in indirect dependencies. So it will change even when the types proper do not. But it is still a pure function of the transitive closure of dependencies---not of the previous states.

My question is: do you think our use case makes sense to warrant a feature request here? Or is it something the Go team is not willing to support?

What exactly are you asking for? I think it might be instructive if there were a way to inspect every byte of the export data file so that you could understand how it depends on the inputs. But I think the toolchain already does everything you actually need.

Side note: I would love to read more about the gc export data information if there are any documents available. I tried parsing through x/tools repo code but it's rather a slow process.

There isn't a whole lot of documentation on it besides these two items:

but I did an internal presentation a while back that doesn't contain any confidential information, and though it's quite rough, you might find it useful. I have attached a PDF below.

@timothy-king
Copy link
Contributor

I am going to close this as I am not sure there is anything we can do. If you have a more specific feature request, please file a new issue. You may also want to ask questions in the #compilers channel of the Gophers slack.

We were hoping that if only the transitive deps was updated, as long as the exported namespaces (const, var, func, struct etc...) did not change, then .x would remain stable and thus, more cache friendly in Bazel/Blaze setup.

The .x files for dependencies should still be fine to cache on for a Bazel/Blaze setup. These will change less often, but they may still change. These files are off by a few bytes. You may end up with .x cache hits fairly often if the information in the namespaces does not change. It may be interesting to measure this at some point.

downstream compile actions (not link actions) will download smaller inputs

You may end up downloading 2 copies of a lot of the information most compilations with such a strategy. Both copies would need coherent descriptions of many [most] of the types. For really big go programs, what I have seen is that the types can be much larger than the other data, and these need to be in both parts, and even more importantly are often required transitively in multiple packages. (Think big generated protos that are passed around.) This is the kind of thing you have to measure to know if it helps, and it may not help uniformly.

I tried parsing through x/tools repo code but it's rather a slow process.

Yes it is. The documentation is lacking and we have recently lost some institutional knowledge. The best documentation is the source for the writer at the moment: https://github.com/golang/go/blob/master/src/cmd/compile/internal/noder/noder.go . That is even slower going, but it is the most accurate.

@sluongng
Copy link
Author

Thank you both for the great insights. This is valuable information for us to improve Bazel's rules_go further. 🙇

Let me do some reading and jump on Gophers Slack with more questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

6 participants