Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/link: support BUILD_PATH_PREFIX_MAP for reproducible binaries when built under varying path #22491

Closed
stapelberg opened this issue Oct 30, 2017 · 7 comments

Comments

@stapelberg
Copy link
Contributor

stapelberg commented Oct 30, 2017

(Splitting this out of #16860 as per iant’s request in https://golang.org/cl/73291)

Currently, the full file system path to the source code files is included in Go binaries, e.g.:

mkdir -p $(go env GOPATH)/src/helloworld
cat >$(go env GOPATH)/src/helloworld/main.go <<'EOT'
package main

import "fmt"

func main() {
	fmt.Println("hello world")
}
EOT
go install helloworld

mkdir -p /tmp/gopath2/src
cp -r $(go env GOPATH)/src/helloworld /tmp/gopath2/src
GOPATH=/tmp/gopath2 go install helloworld

The two resulting binaries differ:
colordiff -u <(strings $(go env GOPATH)/bin/helloworld) <(strings /tmp/gopath2/bin/helloworld) prints:

--- /proc/self/fd/11	2017-10-30 10:10:42.429032627 +0100
+++ /proc/self/fd/12	2017-10-30 10:10:42.437032493 +0100
@@ -17468,7 +17468,7 @@
 type..eq.fmt.fmt
 main.main
 main.init
-/home/michael/go/src/helloworld/hello.go
+/tmp/gopath2/src/helloworld/hello.go
 /home/michael/sdk/go1.9.2/src/fmt/scan.go
 /home/michael/sdk/go1.9.2/src/fmt/print.go
 /home/michael/sdk/go1.9.2/src/fmt/format.go
@@ -18024,7 +18024,7 @@
 /home/michael/sdk/go1.9.2/src/fmt/format.go
 /home/michael/sdk/go1.9.2/src/fmt/print.go
 /home/michael/sdk/go1.9.2/src/fmt/scan.go
-/home/michael/go/src/helloworld/hello.go
+/tmp/gopath2/src/helloworld/hello.go
 9OA7-
 ;AA7%
 -k7U

For the https://reproducible-builds.org/ project, it is desirable to add a way to make the compilation output independent of the build path (see also https://reproducible-builds.org/docs/build-path/), or, in other words, “reproducible when built under varying build path”.

Change https://golang.org/cl/73291 implements the BUILD_PATH_PREFIX_MAP specification.

The above test has been done with go1.9.2, but git HEAD behaves the same:

% go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/michael/go"
GORACE=""
GOROOT="/home/michael/sdk/go1.9.2"
GOTOOLDIR="/home/michael/sdk/go1.9.2/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build412432156=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/73291 mentions this issue: cmd/link: apply BUILD_PATH_PREFIX_MAP

@ianlancetaylor ianlancetaylor changed the title cmd/link: support BUILD_PATH_PREFIX_MAP for reproducible binaries when built under varying path proposal: cmd/link: support BUILD_PATH_PREFIX_MAP for reproducible binaries when built under varying path Oct 31, 2017
@gopherbot gopherbot added this to the Proposal milestone Oct 31, 2017
@ianlancetaylor
Copy link
Contributor

CC @neild

@rsc
Copy link
Contributor

rsc commented Nov 6, 2017

This issue is titled with the feature you want instead of the issue you have.

If the goal is to allow users to have debug info in their binaries but also trim file name prefixes in order to enable reproducible builds, then let's do that directly instead of introducing a dependence on a new, untested spec that in turn depends on a magic environment variable with a magic encoding that I can't even understand. It makes absolutely no sense to me that a toolchain that aims for simplicity would give any meaning at all to (quoting an example from the spec):

BUILD_PATH_PREFIX_MAP=ERROR=/a/b%+yyy:lol=/a:ERROR=/b/1234:foo=/b:libbar-3-bison++_41%.10.5-3~rc1pre3+dfsg1.1-3nmu1+b4=/a/b%+yyy

If gcc already implements this spec in its linker, then it seems like you could use -linkmode=external to have gcc do the rewrite, if you must use this environment variable.

The script in the first comment above (which I amended to cat into a file instead of a directory) does not require toolchain changes to produce identical binaries from gopath1 and gopath2.

$ GOPATH=/tmp/gopath go1.9 build -gcflags=-trimpath=/tmp/gopath -o hello1 helloworld
$ GOPATH=/tmp/gopath2 go1.9 build -gcflags=-trimpath=/tmp/gopath2 -o hello2 helloworld
$ cmp hello1 hello2
$ 

There is an issue that perhaps the toolchain should support multiple trimpath arguments (#22382). Perhaps for more complex examples there are conflicts between the toolchain's use of trimpath and an extra use in -gcflags. If that's the root problem, then let's solve that.

I don't see any compelling reason to add a second, completely different and frankly very complicated mechanism targeting the same goal as -trimpath.

@stapelberg
Copy link
Contributor Author

Thanks for taking a look at this issue.

I now realize that the instructions I gave in the first comment weren’t sufficient; sorry about that. We’re already using -trimpath in -gcflags and -asmflags in Debian, but -trimpath does not actually trim all paths:

Here is an updated test script:

#!/bin/zsh

set -e

for GOPATH in /tmp/gopath1 /tmp/gopath2; do
    mkdir -p ${GOPATH}/src/helloworld
    cat >${GOPATH}/src/helloworld/main.go <<'EOT'
package main

import (
	"log"

	"github.com/docker/docker/pkg/term"
	"golang.org/x/crypto/curve25519"
)

func main() {
	var a term.Winsize
	log.Printf("a = %v", a)

	var dst, in [32]byte
	curve25519.ScalarBaseMult(&dst, &in)
}
EOT
    go get golang.org/x/crypto/curve25519 github.com/docker/docker/pkg/term
    go install -gcflags=-trimpath=$GOPATH -asmflags=-trimpath=$GOPATH helloworld
done

colordiff -u <(strings -n 10 /tmp/gopath1/bin/helloworld) <(strings -n 10 /tmp/gopath2/bin/helloworld)

Is this an oversight in cmd/link?

@rsc
Copy link
Contributor

rsc commented Nov 6, 2017

Doing anything in the linker seems to me too late. We should really be aiming to not put the paths into the intermediate objects either, since that will at the very least result in better caching, and if you do need to verify something, you'd think you'd want reproducible intermediate objects too. Also, the content hash of the inputs affects the build ID recorded in the binary, so if the intermediate objects differ, the output will differ too. This is by design. Also the change in the build ID would not be fixable with BUILD_PATH_PREFIX_MAP since the path prefixes causing the build ID change have been hashed and are no longer visible to the linker.

Also, expecting users to cobble together the necessary magic incantations seems to me unreasonable, especially since as the toolchain evolves the magic incantations may change. If builds without full file paths are something we want to support for the long term, that should have a simple API (like 'go install -newflag' / 'go build -newflag', for some value of newflag) that we can maintain as the underlying tools change and that would do the right thing in any toolchain, instead of hard-coding the answer for the gc toolchain of a specific day.

That seems to me exactly #16860, which we didn't get to for Go 1.10 but that we could probably plan to get to for Go 1.11.

I'm honestly a little surprised these reproducible builds bogged down by file name issues don't just use chroots to get control over the file names.

@infinity0
Copy link

Hi, author of the spec here. It was mainly designed for languages that allow much greater freedom on where source files are placed (e.g. C/C++), and I can understand that this level of complexity might not be necessary for Go.

I'd like to also point out again the original 5-line patch I wrote in the first comment of #16860. That will cause absolute paths to be relative, however the downside is that builders can't specify a specific new "base" dir, which is possible with BPPM. AIUI, there was some concern that forcing everything to be relative, might affect debugging tools.

Perhaps some sort of halfway-point could be found between my original patch and @stapelberg's proposed BPPM patch, if the latter is considered too complex for Go. Though regarding the former, I haven't rebased it or tested it recently.

As for chroots, we think it would be a nice goal to support build-path-independent reproducibility in the toolchain itself, without needing extra container management software that would be yet another dependency for developers (and rebuilders and bootstrappers, etc). Since build-time paths don't exist on run-time systems anyway, the information is not useful, and getting rid of it ought not to be a complex thing for a reasonable toolchain. If it is complex, that means something is depending on these invalid paths, and that behaviour is perhaps broken or at least questionable. So it's a useful QA exercise.

@rsc
Copy link
Contributor

rsc commented Nov 6, 2017

As I noted above (#22491 (comment)), the new toolchain invalidates the 5-line patch in #16860. Stripping the file names during link, or even linking with -w to strip dwarf entirely, will not produce reproducible builds if there were different files recorded in the .a files in the two builds. There needs to be a separate explicit signal to the go command, and the go command needs to be responsible for doing the right thing, both with the gccgo and gc toolchains. Let's leave that for #16860.

@rsc rsc closed this as completed Nov 6, 2017
hnakamur pushed a commit to hnakamur/golang-deb that referenced this issue Nov 3, 2018
Upstream has rejected the patch in this form and promised to implement an
alternative they are happy with instead. That hasn't happened yet though.
Bug: golang/go#22491, golang/go#16860
Forwarded: https://golang.org/cl/73291 (rejected upstream though)
Last-Update: 2018-02-08

Gbp-Pq: Name 0001-reproducible-BUILD_PATH_PREFIX_MAP.patch
@golang golang locked and limited conversation to collaborators Nov 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants