Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Builds not reproducible with cgo (?) #1547

Closed
justinsb opened this issue Jun 13, 2018 · 11 comments · Fixed by #1580
Closed

Builds not reproducible with cgo (?) #1547

justinsb opened this issue Jun 13, 2018 · 11 comments · Fixed by #1580
Labels

Comments

@justinsb
Copy link

Possibly related to #1531 , and might be the issue being fixed in #1536

Builds (that involve cgo?) produce different binaries from run to run. For example https://github.com/kopeio/etcd-manager

bazel build //cmd/etcd-manager
md5sum bazel-bin/cmd/etcd-manager/linux_amd64_stripped/etcd-manager
# b9ccdf4bcc329e27b04260dc26dd3580  bazel-bin/cmd/etcd-manager/linux_amd64_stripped/etcd-manager
bazel clean
bazel build //cmd/etcd-manager
md5sum bazel-bin/cmd/etcd-manager/linux_amd64_stripped/etcd-manager
9c77176ee931b35dcdbdbd0287d24418  bazel-bin/cmd/etcd-manager/linux_amd64_stripped/etcd-manager

I believe this is because the sandbox path is included in the binary:

> strings bazel-bin/cmd/etcd-manager/linux_amd64_stripped/etcd-manager | grep main...bazel-out
/home/justinsb/.cache/bazel/_bazel_justinsb/4a499e13846793b0120bbd2f7639609a/sandbox/processwrapper-sandbox/380/execroot/__main__/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/runtime/cgo
/home/justinsb/.cache/bazel/_bazel_justinsb/4a499e13846793b0120bbd2f7639609a/sandbox/processwrapper-sandbox/380/execroot/__main__/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/os/user
/home/justinsb/.cache/bazel/_bazel_justinsb/4a499e13846793b0120bbd2f7639609a/sandbox/processwrapper-sandbox/380/execroot/__main__/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/os/user
/home/justinsb/.cache/bazel/_bazel_justinsb/4a499e13846793b0120bbd2f7639609a/sandbox/processwrapper-sandbox/380/execroot/__main__/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/runtime/cgo

(And 380 changes in between invocations)

In bazel-bin:

> rgrep __main__ .
./cmd/etcd-manager/linux_amd64_stripped/etcd-manager.runfiles_manifest:__main__/cmd/etcd-manager/linux_amd64_stripped/etcd-manager /home/justinsb/.cache/bazel/_bazel_justinsb/4a499e13846793b0120bbd2f7639609a/execroot/__main__/bazel-out/k8-fastbuild/bin/cmd/etcd-manager/linux_amd64_stripped/etcd-manager
./cmd/etcd-manager/linux_amd64_stripped/etcd-manager.runfiles/MANIFEST:__main__/cmd/etcd-manager/linux_amd64_stripped/etcd-manager /home/justinsb/.cache/bazel/_bazel_justinsb/4a499e13846793b0120bbd2f7639609a/execroot/__main__/bazel-out/k8-fastbuild/bin/cmd/etcd-manager/linux_amd64_stripped/etcd-manager
Binary file ./cmd/etcd-manager/linux_amd64_stripped/etcd-manager matches
Binary file ./external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/pkg/linux_amd64/runtime/cgo.a matches
Binary file ./external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/pkg/linux_amd64/plugin.a matches
Binary file ./external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/pkg/linux_amd64/os/user.a matches

So it's coming from those .a files, which I think are the ones that aren't cached in #1531.

@jayconrod
Copy link
Contributor

I think this is probably a duplicate of #1518, but needs analysis to be sure.

@jayconrod jayconrod added the bug label Jun 13, 2018
@justinsb
Copy link
Author

So this is weird... I have a fix, but it's "magic". I think what's happening is that setting the CGO_CFLAGS flag changes the behaviour, but I'm not really sure where. But here's a fix: justinsb@1875a75

(I reached this via justinsb@2e10d87 which has more debug information, but the flag is a red herring I believe, because it works with any flag)

Going to try this on a few more projects, but thought this might give someone a clue...

(I think this is a dup of #1518, agreed, though maybe this one has a few more details now!)

@justinsb
Copy link
Author

Ah - so here's why (I think): https://github.com/golang/go/blob/5756895877492e3427c92e9ec8784eb1f4b01474/src/cmd/go/internal/work/exec.go#L2253-L2258

My read of that is that setting CGO_CFLAGS completely replaces the defaults, so if we do override those flags, we probably should pass "-g -O2" as well, other the cgo code won't be optimized. But dropping the -g flag is presumably meaning there's no debug information for cgo files.

@justinsb
Copy link
Author

So based on all this I did a PoC of a "real" patch: justinsb@96be613

With that the debug symbol path (from strings) changes from /dev/shm/bazel-sandbox.30d308fc61dc9ec512512f913dbfe3e2/linux-sandbox/13/execroot/__main__/bazel-out/k8-fastbuild/bin/external/ io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/os/user to ./bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/os/user

(I'm using the shm experiment, but the same applies without it)

That seems to be good enough for a reproducible build, though still a little ugly admittedly. But feedback welcome on the approach (I can send a PR), or feel free to take the learnings and fix it "right" - I know that can be faster on some of these fixes that have several ways to tackle them :-)

@jayconrod
Copy link
Contributor

-fdebug-prefix-map= seems to be the magic flag here. We should be adding that in both the stdlib and cgo builders.

I don't know about adding -g -O2. We get CGO_CFLAGS from the C/C++ toolchain, so I feel like we're overriding user preference if we add it here. I wonder if those flags show up with -c dbg or -c opt are passed to Bazel on the command line?

Thanks for looking into this. This is important, but there's a lot of stuff in flight right now, and I haven't had time to take it on.

@siddharthab
Copy link
Contributor

siddharthab commented Jul 1, 2018

Another thing here is that cgo.go uses a temp dir as its source dir and those temp dir paths are also embedded.

@siddharthab
Copy link
Contributor

siddharthab commented Jul 4, 2018

One final note on this. stdlib is still not fully reproducible when using clang with debug symbols (-g).

There is a bug in clang that does not respect -fdebug-prefix-map for assembly source files (src/runtime/cgo/gcc_*.S). As a result, pkg/*_amd64/runtime/cgo.a which is built partly from assembly code, will continue to have one absolute path in it.

I will file a bug with LLVM for this.

Illustration of the bug:

$ cat > dummy.c
int main() { return 0; }
$ clang -S -c dummy.c -o dummy.S
$ clang -fdebug-prefix-map=$PWD= -g dummy.S -o dummy 
$ strings dummy | grep $PWD

The tests may pass on Travis macOS even with -c dbg because the targets are being built in the same order in a clean workspace, so sandbox numbers are the same both times, and thus, absolute paths are the same both times. If we perturb the order of the build in some way, then the macOS tests will stop passing with -c dbg.

@siddharthab
Copy link
Contributor

@justinsb
Copy link
Author

justinsb commented Jul 4, 2018

Thanks @siddharthab - I ran with 3725f0a (which is going to be 0.13.0 it looks like - thanks @jayconrod ) and this is indeed fixed for me - thank you!

(I wasn't able to see the path you mentioned in my use-case, but I'm not using much cgo. Thank you for looking into it though!)

@siddharthab
Copy link
Contributor

You are welcome @justinsb. You will be fine if you are not using clang with -g. That's the only case when it's not fully fixed.

@siddharthab
Copy link
Contributor

https://reviews.llvm.org/D48989 for LLVM patch. Any followups will happen there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants