Ensure 'pip wheel' can create .so artifacts deterministically #6505

Closed
thundergolfer opened this issue May 15, 2019 · 21 comments
Labels
resolution: out of scope type: enhancement Improvements to functionality type: feature request Request for a new feature

Comments

@thundergolfer

thundergolfer commented May 15, 2019

What's the problem this feature will solve?

A major selling point of the Bazel build system is its support for both local and remote caching.

In order for that caching to work though, Bazel targets must be built deterministically so that the same target always has the same content-addressable hash.

Currently pip wheel is non-deterministic, so our Python Bazel targets will cache miss if they depend on something built with pip wheel.

Describe the solution you'd like

Note: The following is the output of a Bazel execution log. A bit unrelated to the pip wheel command but shows the relevant information.

inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/LICENSE"
  digest {
    hash: "a2adb9c959b797494a0ef80bdf60e22db2749ee3e0c0908556e3eb548f967c56"
    size_bytes: 1101
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/METADATA"
  digest {
    hash: "df7bc0c7cbd2ce350c5c61ceda3a74bbcb6f82446a7c01f7f8e1034a98df231f"
    size_bytes: 1704
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/RECORD"
  digest {
    hash: "6fe803b74ab4fcab1f23e96060cf062d12779598af7e72692c492c2dd7cad0ed"
    size_bytes: 1701
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/WHEEL"
  digest {
    hash: "cdf2c8f141bc498ae490a88870d655dd174abe3db8c1f57562224b168930c624"
    size_bytes: 104
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/top_level.txt"
  digest {
    hash: "ae98f42153138ac02387fd6f1b709c7fdbf98e9090c00cfa703d48554e597614"
    size_bytes: 11
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/_yaml.cpython-36m-x86_64-linux-gnu.so"
  digest {
    hash: "a7f3774015f839ccee5e2281bbfdf22a42e0e1dafaac33ef4c91db83a07210d9"
    size_bytes: 1133288
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/yaml/__init__.py"
  digest {
    hash: "2af8b6dbcb1df5c63597f215421cad02f2317e291061b181b0f7bbf4f71ac0dd"
    size_bytes: 12012
    hash_function_name: "SHA-256"
  }
}

The above is a subset of the build outputs of the PyYAML package. Of these outputs, the RECORD file and the _yaml.cpython-36m-x86_64-linux-gnu.so shared object file have hashes that change from build to build. I have inspected the RECORD file and found that it contains the hash of the .so file, so it is non-deterministic because of the .so file, and I think only because of that.

So the problem is the .so file.
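To illustrate why RECORD follows the .so file: each RECORD row carries a digest of the file it describes. A minimal sketch, assuming the usual wheel RECORD row layout (path, `sha256=` followed by an unpadded URL-safe base64 digest, then size). `record_row` is a hypothetical helper, not a pip or wheel API, and the second temporary directory name is made up.

```python
import base64
import hashlib


def record_row(path: str, data: bytes) -> str:
    """Build one RECORD-style row for a file with the given contents."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the trailing '=' padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# Stand-ins for two builds that differ only in the embedded temp directory.
# The second directory name is invented for illustration.
build_a = b"\x7fELF...comp_dir=/tmp/pip-wheel-_bd8v3f2/pyyaml"
build_b = b"\x7fELF...comp_dir=/tmp/pip-wheel-x91kq0zz/pyyaml"

name = "_yaml.cpython-36m-x86_64-linux-gnu.so"
row_a = record_row(name, build_a)
row_b = record_row(name, build_b)
print(row_a != row_b)  # the RECORD rows differ whenever the .so bytes differ
```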

I ran the strings program on the .so file and found this printable string: /tmp/pip-wheel-_bd8v3f2/pyyaml. That is coming from here:

with TempDirectory(kind="wheel") as temp_dir:

So while I found other differences between builds of _yaml.cpython-36m-x86_64-linux-gnu.so, this temporary directory path leaking into the binary is by itself sufficient to break determinism.
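The same check can be scripted. A rough sketch that mimics strings(1) by collecting runs of printable ASCII and keeping those that mention a pip temporary directory; `find_pip_temp_paths` is a hypothetical helper and the sample bytes are a stand-in for the real .so contents.

```python
import re


def find_pip_temp_paths(blob: bytes) -> list[str]:
    """Return printable runs in `blob` that mention a pip temporary directory."""
    # Like strings(1): runs of at least 4 printable ASCII characters.
    runs = re.findall(rb"[\x20-\x7e]{4,}", blob)
    return [run.decode("ascii") for run in runs if b"/pip-" in run]


# Stand-in for the bytes of a compiled extension module.
sample = b"\x00\x01_yaml\x00/tmp/pip-wheel-_bd8v3f2/pyyaml\x00\x02\x03"
print(find_pip_temp_paths(sample))
```

For a real file, the bytes could be read with `open(path, "rb").read()`.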

Additional context

rules_python issue discussing this problem: bazelbuild/rules_python#154
rules_python repo: https://github.com/bazelbuild/rules_python

@pradyunsg pradyunsg added the S: needs triage Issues/PRs that need to be triaged label May 27, 2019
@chrahunt
Member

This string is embedded in the debug information:

$ objdump -Wi _yaml.cpython-37m-x86_64-linux-gnu.so  | grep /tmp
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x76b): /tmp/user/1000/pip-install-94g5rob6/pyyaml
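For checking many files, the objdump output can be parsed. A small sketch, assuming the text layout shown above; `comp_dirs` is a hypothetical helper, and paths containing spaces would be truncated by this simple pattern.

```python
import re

# Matches both inline values and "(indirect string, offset: 0x...)" values.
_COMP_DIR = re.compile(
    r"DW_AT_comp_dir\s*:\s*"
    r"(?:\(indirect string, offset: 0x[0-9a-f]+\):\s*)?"
    r"(\S+)"
)


def comp_dirs(objdump_output: str) -> list[str]:
    """Extract DW_AT_comp_dir paths from `objdump -Wi` text output."""
    return [match.group(1) for match in _COMP_DIR.finditer(objdump_output)]


line = (
    "    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x76b): "
    "/tmp/user/1000/pip-install-94g5rob6/pyyaml"
)
print(comp_dirs(line))
```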

It looks like this is a common issue:

It may also be related to pypa/wheel#248.

Using a deterministic build directory name would leave us open to denial-of-service and/or concurrency-related problems.

Using e.g. the GCC flag -fdebug-prefix-map=$SRC_ROOT=. is another option, but is this something that should be in pip, wheel, or the individual build backends?
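As a sketch of the prefix-map option: the flag has to name the source root, so whoever sets it must know the build directory before compilation starts. The helper below only builds the environment; `with_debug_prefix_map` is a hypothetical name, and whether a given build backend passes CFLAGS through to the compiler is an assumption.

```python
def with_debug_prefix_map(env: dict[str, str], src_root: str) -> dict[str, str]:
    """Return a copy of `env` whose CFLAGS remap `src_root` to '.'."""
    flag = f"-fdebug-prefix-map={src_root}=."
    new_env = dict(env)
    existing = new_env.get("CFLAGS", "")
    # Append rather than overwrite so existing flags are preserved.
    new_env["CFLAGS"] = f"{existing} {flag}".strip()
    return new_env


env = with_debug_prefix_map({"CFLAGS": "-O2"}, "/tmp/pip-wheel-_bd8v3f2/pyyaml")
print(env["CFLAGS"])
```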

@thundergolfer
Author

thundergolfer commented Jun 18, 2019

Thanks for your input Christopher.

would leave us open to denial-of-service and/or concurrency-related problems.

The latter makes sense to me, but how exactly is DOS at play if you use a consistent build dir?

should be in pip, wheel, or the individual build backends?

For my use case, the input of the Bazel / rules_python team would be helpful in deciding. They haven't responded to my issue in rules_python yet. Might be time for a nudge.

@chrahunt
Member

The latter makes sense to me, but how exactly is DOS at play if you use a consistent build dir?

An unprivileged user on the same host can create the build directory and set its permissions to 700, which prevents you from building.
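The collision can be simulated in a single process. This is only an illustration: a scratch directory stands in for a shared /tmp, the fixed directory name is hypothetical, and a real attack would involve a second user whose directory the builder cannot remove.

```python
import os
import tempfile

# Stand-in for a shared, world-writable temporary directory such as /tmp.
shared_tmp = tempfile.mkdtemp()

# "Another user" gets there first and claims the predictable name.
fixed_build_dir = os.path.join(shared_tmp, "pip-build-pyyaml")  # hypothetical name
os.mkdir(fixed_build_dir, 0o700)

# The build then cannot create its own directory at that path.
try:
    os.mkdir(fixed_build_dir, 0o700)
    blocked = False
except FileExistsError:
    blocked = True

# A randomized name avoids the collision.
random_build_dir = tempfile.mkdtemp(prefix="pip-build-", dir=shared_tmp)
print(blocked, random_build_dir != fixed_build_dir)
```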

@chrahunt chrahunt added state: needs discussion This needs some more discussion type: enhancement Improvements to functionality labels Jul 20, 2019
@triage-new-issues triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Jul 20, 2019
@manthey

manthey commented Aug 30, 2019

In a simple test, I was able to get consistent builds by exporting CFLAGS=-g0 before building the wheel. This prevents adding any of the debug information to the generated libraries which is where the TempDirectory was being pulled in. I also have SOURCE_DATE_EPOCH set. I don't know how universal this is (and, of course, you lose debugging symbols).
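A sketch of that setup driven from Python, assuming the build backend honours CFLAGS. `no_debug_env` and `pip_wheel` are hypothetical helpers, the epoch value is an arbitrary example, and `pip_wheel` is defined but not called here because it would download and compile a package.

```python
import os
import subprocess
import sys


def no_debug_env(base: dict[str, str], epoch: int) -> dict[str, str]:
    """Return a copy of `base` with debug info disabled and a fixed timestamp."""
    env = dict(base)
    env["CFLAGS"] = f"{env.get('CFLAGS', '')} -g0".strip()
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    return env


def pip_wheel(requirement: str, wheel_dir: str, env: dict[str, str]) -> None:
    """Build `requirement` from source into `wheel_dir` using `env`."""
    subprocess.run(
        [sys.executable, "-m", "pip", "wheel", "--no-binary", ":all:",
         "--wheel-dir", wheel_dir, requirement],
        env=env,
        check=True,
    )


env = no_debug_env(dict(os.environ), epoch=1567123200)  # arbitrary example value
print(env["CFLAGS"], env["SOURCE_DATE_EPOCH"])
```

A call such as `pip_wheel("PyYAML==5.1", "wheelhouse", env)` would then run the build with those settings.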

manthey added a commit to girder/large_image_wheels that referenced this issue Sep 2, 2019
Also, make more reproducible builds.  By default, pip injects a symbol
with its build directory into compiled files.  See
pypa/pip#6505.  This can be avoided by
preventing debug symbols by adding `CFLAGS=-g0`.  Additionally, the
wheels contain a few files with the current time stamp rather than the
time given by `SOURCE_DATE_EPOCH`, so the perl tool
`strip-nondeterminism` is used to redate the files within the wheels to
the `SOURCE_DATE_EPOCH`.
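The redating step can be approximated in Python, since a wheel is a zip archive. This is only a sketch of the idea, not how strip-nondeterminism is implemented; `normalize_zip_timestamps` is a hypothetical helper, and it clamps to 1980-01-01 because the zip format cannot store earlier dates.

```python
import io
import time
import zipfile

ZIP_EPOCH = 315532800  # 1980-01-01 00:00:00 UTC, the earliest zip timestamp


def normalize_zip_timestamps(archive: bytes, epoch: int) -> bytes:
    """Rewrite a zip archive so every member carries the same timestamp."""
    date_time = time.gmtime(max(epoch, ZIP_EPOCH))[:6]
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as zin, \
            zipfile.ZipFile(out, "w") as zout:
        for info in zin.infolist():  # keep the original member order
            fixed = zipfile.ZipInfo(info.filename, date_time=date_time)
            fixed.compress_type = info.compress_type
            fixed.external_attr = info.external_attr
            zout.writestr(fixed, zin.read(info.filename))
    return out.getvalue()


def _make_archive(date_time: tuple) -> bytes:
    """Build a one-file archive stamped with `date_time` (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr(zipfile.ZipInfo("pkg/__init__.py", date_time=date_time), b"x = 1\n")
    return buf.getvalue()


first = _make_archive((2019, 9, 2, 10, 0, 0))
second = _make_archive((2019, 11, 11, 12, 30, 0))
same = normalize_zip_timestamps(first, 1567123200) == normalize_zip_timestamps(second, 1567123200)
print(first != second, same)
```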
@ngie-eign

ngie-eign commented Jul 15, 2020

This is an issue not just with pip wheel, but pip install in general when not run in editable mode :(.

CC: @bdrewery

@ngie-eign

ngie-eign commented Jul 15, 2020

Interesting... this is a common pattern that can't be overridden:

$ grep -rI TempDirectory  
pip/_internal/build_env.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/build_env.py:        self._temp_dir = TempDirectory(kind="build-env")
pip/_internal/cache.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/cache.py:        self._temp_dir = TempDirectory(kind="ephem-wheel-cache")
pip/_internal/download.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/download.py:    with TempDirectory(kind="unpack") as temp_dir:
pip/_internal/wheel.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/wheel.py:        with TempDirectory(kind="wheel") as temp_dir:
pip/_internal/commands/download.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/commands/download.py:            with RequirementTracker() as req_tracker, TempDirectory(
pip/_internal/commands/install.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/commands/install.py:        target_temp_dir = TempDirectory(kind="target")
pip/_internal/commands/install.py:            with RequirementTracker() as req_tracker, TempDirectory(
pip/_internal/commands/wheel.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/commands/wheel.py:            with RequirementTracker() as req_tracker, TempDirectory(
pip/_internal/req/req_install.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/req/req_install.py:        self._temp_build_dir = TempDirectory(kind="req-build")
pip/_internal/req/req_install.py:        with TempDirectory(kind="record") as temp_dir:
pip/_internal/req/req_tracker.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/req/req_tracker.py:            self._temp_dir = TempDirectory(delete=False, kind='req-tracker')
pip/_internal/req/req_uninstall.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/req/req_uninstall.py:        self.save_dir = TempDirectory(kind="uninstall")
pip/_internal/utils/temp_dir.py:class TempDirectory(object):
pip/_internal/utils/temp_dir.py:        super(TempDirectory, self).__init__()
pip/_internal/vcs/bazaar.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/vcs/bazaar.py:        with TempDirectory(kind="export") as temp_dir:
pip/_internal/vcs/git.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/vcs/git.py:        with TempDirectory(kind="export") as temp_dir:
pip/_internal/vcs/mercurial.py:from pip._internal.utils.temp_dir import TempDirectory
pip/_internal/vcs/mercurial.py:        with TempDirectory(kind="export") as temp_dir:

From pip.util.TempDirectory:

    def create(self):
        """Create a temporary directory and store its path in self.path
        """
        if self.path is not None:
            logger.debug(
                "Skipped creation of temporary directory: {}".format(self.path)
            )
            return
        # We realpath here because some systems have their default tmpdir
        # symlinked to another directory.  This tends to confuse build
        # scripts, so we canonicalize the path by traversing potential
        # symlinks here.
        self.path = os.path.realpath(
            tempfile.mkdtemp(prefix="pip-{}-".format(self.kind))
        )
        logger.debug("Created temporary directory: {}".format(self.path))

It's unfortunate, but it looks like the mkdtemp wrapper lacks the ability to override the template, which would allow us to set a deterministic build path:

$ man 3 mktemp
...
     char *
     mkdtemp(char *template);
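A quick way to see the effect from Python: tempfile.mkdtemp accepts a prefix, a suffix, and a parent directory, but the characters in between are chosen for the caller. The scratch parent directory below is only there to keep the demo self-contained.

```python
import os
import tempfile

parent = tempfile.mkdtemp()  # scratch parent so the demo cleans up easily
first = tempfile.mkdtemp(prefix="pip-wheel-", dir=parent)
second = tempfile.mkdtemp(prefix="pip-wheel-", dir=parent)

# Same prefix, different random middle part on every call.
print(os.path.basename(first))
print(os.path.basename(second))
print(first != second)
```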

Sidenote: hmmm... the notion of tempfiles in cpython as of 3.9 seems insecure; it bypasses better security practices by reimplementing mkstemp(3), etc, with a deterministic algorithm/template consisting of only 8 "random" characters seeded by the PID :(.

@ngie-eign

ngie-eign commented Jul 17, 2020

I've fallen back to invoking setup.py directly for building binaries and am using pip only when installing the binaries.

It's really unfortunate that there isn't a better story around this with pip, given that pip is sort of the de facto core installation utility for Python. Hopefully the Chan-Zuckerberg work will improve this support/usability, especially since Facebook used Buck (an alternative to Bazel) by default, which does similar tricks in terms of caching artifacts on remote shares to speed up the build process.

@sbidoul
Member

sbidoul commented Jul 17, 2020

Is this another argument in favor of in-tree builds (#7555)?

@uranusjr
Member

It is slightly different, since building in the source tree does not necessarily mean the built artifacts end up in the source tree. It is only by tradition that the most popular back-end (setuptools) does this. In-tree builds would happen to solve the immediate problem, but IMO the ultimate solution would be to add an option to PEP 517 that tells the back-end where it must generate the artifact, and a flag in pip that lets the user provide that location.

@pradyunsg pradyunsg added the type: feature request Request for a new feature label Aug 30, 2020
josnyder-rh added a commit to josnyder-rh/pip that referenced this issue May 10, 2022
Currently, pip randomly assigns directory names when it builds Python sdists
into bdists. This can result in randomized file paths being embedded into the
build output (usually in debug symbols, but potentially in other places). The
ideal solution would be to trim the front (random part) of the file path off,
leaving the remaining (deterministic) part to embed in the binary. Doing so
would require reaching deep into the configuration of whatever compiler/linker
pip happens to be using (e.g.  gcc, clang, rustc, etc.). This option, on the
other hand, doesn't require modifying the internals of Python packages.

In this patch we make it so that pip's randomly assigned directory paths are
instead generated from a deterministic counter. Doing so requires exclusive
access to TMPDIR, because otherwise other programs (likely other executions of
`pip`) will attempt to create directories of the same name. For that reason,
the feature only activates when SOURCE_DATE_EPOCH is set.

For more discussion (and prior art) in this area, see:
 * https://github.com/NixOS/nixpkgs/pull/102222/files
 * pypa#6505
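The approach described in this commit message could look roughly like the sketch below. It is not the code from the patch: `BuildDirFactory` and the zero-padded name format are hypothetical, and the deterministic branch assumes exclusive access to the parent directory, as the message says.

```python
import itertools
import os
import tempfile


class BuildDirFactory:
    """Create pip-style temporary directories, deterministic on request."""

    def __init__(self, parent: str, environ=None):
        self.parent = parent
        self.environ = os.environ if environ is None else environ
        self._counter = itertools.count()

    def create(self, kind: str) -> str:
        if "SOURCE_DATE_EPOCH" not in self.environ:
            # Default behaviour: random name, safe in a shared TMPDIR.
            return tempfile.mkdtemp(prefix=f"pip-{kind}-", dir=self.parent)
        # Deterministic name: fails loudly if someone else took it.
        path = os.path.join(self.parent, f"pip-{kind}-{next(self._counter):08d}")
        os.mkdir(path, 0o700)
        return path


factory = BuildDirFactory(tempfile.mkdtemp(), environ={"SOURCE_DATE_EPOCH": "0"})
names = [os.path.basename(factory.create("wheel")) for _ in range(2)]
print(names)
```

Two runs that create directories in the same order would then see the same paths, which is what a path-sensitive cache needs.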
@vlad-ivanov-name

vlad-ivanov-name commented May 13, 2022

Another piece of software that breaks because of non-deterministic paths is sccache, which requires paths to match in order to get a cache hit. Also, the DOS thing probably isn't important when builds are containerized.

@pradyunsg
Member

So... pip no longer performs local builds in a non-deterministic location, so if users are seeing non-deterministic build outputs, it's likely not pip but instead the build-backend that the package being built is using.

@vlad-ivanov-name

vlad-ivanov-name commented May 14, 2022

Hmm, but in practical terms that means the issue still occurs when e.g. installing packages / building wheels from git URLs

EDIT: see below

@sbidoul
Member

sbidoul commented May 14, 2022

pip needs to download source distributions (from VCS or archives) to temporary directories because it may not even know their name in advance, and when the name is known, it may need to download and prepare metadata for different versions of the same project during the resolution process.

One thing we could imagine is moving/renaming the temporary unpack directory to a predictable location (say $TMPDIR/build/{canonical-name}) before the build step. EDIT: this is a simplistic approach that is not implementable as is, as it opens the door to concurrency issues.

@vlad-ivanov-name I don't think the code you highlight above is relevant because it is merely the target directory where the built wheel must be stored, which should not be relevant to the content of the wheel.

@vlad-ivanov-name

vlad-ivanov-name commented May 14, 2022

That makes sense, thank you.

For the purpose of CI caching, where paths being deterministic 90% of the time is already good enough, I'm considering monkey-patching tempfile.mkdtemp to use a deterministic seed instead of the PID, and then invoking pip via runpy. This might fail if downloads are multi-threaded, but that doesn't seem to be the case at the moment.
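One way such a patch might look, as a sketch only: instead of reseeding tempfile's private name generator, it swaps tempfile.mkdtemp for a replacement that draws names from a seeded random.Random. `seeded_mkdtemp` is a hypothetical helper, and it only affects directories created in the current process.

```python
import contextlib
import os
import random
import string
import tempfile


@contextlib.contextmanager
def seeded_mkdtemp(seed: int):
    """Temporarily make tempfile.mkdtemp pick names from a seeded generator."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits + "_"
    original = tempfile.mkdtemp

    def mkdtemp(suffix=None, prefix=None, dir=None):
        parent = dir if dir is not None else tempfile.gettempdir()
        for _attempt in range(100):
            middle = "".join(rng.choice(alphabet) for _ in range(8))
            name = f"{prefix or 'tmp'}{middle}{suffix or ''}"
            path = os.path.join(parent, name)
            try:
                os.mkdir(path, 0o700)
            except FileExistsError:
                continue  # name taken: try the next one in the sequence
            return os.path.abspath(path)
        raise FileExistsError("no usable deterministic temporary directory name")

    tempfile.mkdtemp = mkdtemp
    try:
        yield
    finally:
        tempfile.mkdtemp = original


base = tempfile.mkdtemp()  # scratch parent, created before patching
with seeded_mkdtemp(seed=1):
    run_one = tempfile.mkdtemp(prefix="pip-wheel-", dir=base)
os.rmdir(run_one)
with seeded_mkdtemp(seed=1):
    run_two = tempfile.mkdtemp(prefix="pip-wheel-", dir=base)
print(run_one == run_two)
```

Inside the `with` block, pip could then be started in-process with `runpy.run_module("pip", run_name="__main__")` after setting `sys.argv`, under the assumption that pip creates its build directories through tempfile.mkdtemp.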

One thing we could imagine is moving/renaming the temporary unpack directory to a predictable location (say $TMPDIR/build/{canonical-name}) before the build step.

I think predictable and deterministic are a bit different. Again, for caching, having deterministic paths is enough even if the way of deriving those paths is convoluted.

@pfmoore
Member

pfmoore commented May 14, 2022

Is this not a problem for the build backend? I'm struggling to see why the problem here isn't that the backend embeds the full pathname into the output, rather than just a relative name.

@sbidoul
Member

sbidoul commented May 14, 2022

Yes this should first be addressed in build backends and now that --config-settings is implemented all means to provide necessary options are available.

@pradyunsg
Member

pradyunsg commented May 14, 2022

Well... At the end of the day, while building things in a reproducible manner is a valuable activity, pip is not a tool that can enforce that right now. Note that the wheels are built by a "build backend" such as https://github.com/pypa/setuptools/ or https://github.com/pypa/flit/ or https://github.com/pypa/hatch. All that pip is doing is calling them and copy-pasting their artifacts over.

Basically, I don't think the guarantee that's being requested here is something that pip itself can provide, on its own anyway. There's tooling available to build wheels in a reproducible manner, like https://github.com/kushaldas/asaman -- which uses pip under the hood and sets up everything for the relevant build-backends to build things in a reproducible manner (assuming they follow the https://reproducible-builds.org/ model).

@vlad-ivanov-name

vlad-ivanov-name commented May 14, 2022

Basically, I don't think the guarantee that's being requested here is something that pip itself can provide, on its own anyway.

That is fair; certainly, pip alone won't be able to manage all possible caveats (at the end of the day one could always put __DATE__ in source code).

However, I certainly could see pip assisting with it to some degree, for example, by providing a way to change the behaviour of the TempDirectory class to control download locations during pip install. No complicated API is necessary but something as simple as a global counter instead of "random" name would work.

I do understand that to some degree, it is the problem of build systems, cache wrappers etc; but in some cases, those have no choice but to depend on absolute paths as the mechanism of detecting whether the path would affect the output would be too complicated and unreliable to implement (example: mozilla/sccache#35).

asaman, as a wrapper for pip, could work but unfortunately doesn't support SCM URLs

SomberNight added a commit to spesmilo/electrum that referenced this issue Aug 5, 2022
vagrants-iMac:electrum vagrant$ ./contrib/osx/compare_dmg dist/electrum-4.3.0-ghost43.dmg /Users/vagrant/Desktop/electrum-4.3.0-thomas1.dmg
[...]
Extracting signatures from release app...
Created mac_extracted_sigs.tar.gz
Applying extracted signatures to unsigned app...
Done. .app with sigs applied is at: /tmp/electrum_compare_dmg/signed_app
++ diff -qr /tmp/electrum_compare_dmg/signed_app /tmp/electrum_compare_dmg/dmg2
+ diff='Files /tmp/electrum_compare_dmg/signed_app/Electrum.app/Contents/MacOS/cbor/_cbor.cpython-39-darwin.so and /tmp/electrum_compare_dmg/dmg2/Electrum.app/Contents/MacOS/cbor/_cbor.cpython-39-darwin.so differ'
+ diff='diff errored'
+ set +x
diff errored
DMGs do *not* match.
failure

user@user-VirtualBox:~/wspace/tmp$ vbindiff comp/signed_app/_cbor.cpython-39-darwin.so comp/dmg2/_cbor.cpython-39-darwin.so

comp/signed_app/_cbor.cpython-39-darwin.so
0000 6AC0: 00 5F 50 79 49 6E 69 74  5F 5F 63 62 6F 72 2E 6D  ._PyInit __cbor.m
0000 6AD0: 6F 64 65 66 00 5F 43 62  6F 72 4D 65 74 68 6F 64  odef._Cb orMethod
0000 6AE0: 73 00 2F 70 72 69 76 61  74 65 2F 76 61 72 2F 66  s./priva te/var/f
0000 6AF0: 6F 6C 64 65 72 73 2F 35  36 2F 64 38 36 70 35 39  olders/5 6/d86p59
0000 6B00: 37 31 31 67 7A 63 62 38  73 31 71 37 31 36 78 31  711gzcb8 s1q716x1
0000 6B10: 6C 63 30 30 30 30 67 6E  2F 54 2F 70 69 70 2D 69  lc0000gn /T/pip-i
0000 6B20: 6E 73 74 61 6C 6C 2D 36  6D 69 36 68 6C 75 65 2F  nstall-6 mi6hlue/
comp/dmg2/_cbor.cpython-39-darwin.so
0000 6AC0: 00 5F 50 79 49 6E 69 74  5F 5F 63 62 6F 72 2E 6D  ._PyInit __cbor.m
0000 6AD0: 6F 64 65 66 00 5F 43 62  6F 72 4D 65 74 68 6F 64  odef._Cb orMethod
0000 6AE0: 73 00 2F 70 72 69 76 61  74 65 2F 76 61 72 2F 66  s./priva te/var/f
0000 6AF0: 6F 6C 64 65 72 73 2F 37  68 2F 70 33 30 7A 5F 74  olders/7 h/p30z_t
0000 6B00: 79 31 35 30 31 32 70 66  5F 33 64 79 78 62 73 39  y15012pf _3dyxbs9
0000 6B10: 33 34 30 30 30 30 67 6E  2F 54 2F 70 69 70 2D 69  340000gn /T/pip-i
0000 6B20: 6E 73 74 61 6C 6C 2D 30  68 64 39 63 35 6D 65 2F  nstall-0 hd9c5me/

related: pypa/pip#6505
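The manual vbindiff comparison can be approximated with a few lines of Python when hunting for the first divergence between two builds. `first_difference` is a hypothetical helper, and the sample bytes are the path fragments visible at the end of the dump above.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


left = b"/T/pip-install-6mi6hlue/"
right = b"/T/pip-install-0hd9c5me/"
print(first_difference(left, right))
```

For real artifacts the two inputs would be the full contents of each .so file, read in binary mode.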
SomberNight added a commit to SomberNight/electrum that referenced this issue Aug 6, 2022
We compile from tar.gz, instead of using pre-built binary wheels from PyPI.
(or if the dep is pure-python, use tar.gz instead of "source-only" wheel)

-----
Some unorganised things below for future reference.

```
$ dsymutil -dump-debug-map dist1/hid.cpython-39-darwin.so
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hid.o unable to open object file: No such file or directory
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hidapi/mac/hid.o unable to open object file: No such file or directory
---
triple:          'x86_64-apple-darwin'
binary-path:     'dist1/hid.cpython-39-darwin.so'
...
```

```
$ nm -pa dist1/hid.cpython-39-darwin.so
```

- https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable
- pypa/pip#6505
- pypa/pip#7808 (comment)
- NixOS/nixpkgs#91272
- cython/cython#1576
- https://github.com/cython/cython/blob/9d2ba1611b28999663ab71657f4938b0ba92fe07/Cython/Compiler/ModuleNode.py#L913
@SomberNight

SomberNight commented Aug 6, 2022

In a simple test, I was able to get consistent builds by exporting CFLAGS=-g0 before building the wheel. This prevents adding any of the debug information to the generated libraries which is where the TempDirectory was being pulled in. I also have SOURCE_DATE_EPOCH set. I don't know how universal this is (and, of course, you lose debugging symbols).

This worked for me on Linux for all packages encountered, but only for some on macOS.
In particular, it seems packages that used Cython as part of their setup.py still had some debug symbols when built on macOS.

I am atm only interested in the pip install case (as opposed to pip wheel).
Running strip -x on the build artifacts (after pip install finished) worked.

E.g.

$ find "$VENV_DIR/lib/python$PY_VER_MAJOR/site-packages/" -type f -name '*.so' -print0 \
    | xargs -0 -t strip -x
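A rough Python equivalent of the find/xargs pipeline, for build scripts that are already written in Python. `find_shared_objects` and `strip_local_symbols` are hypothetical helpers; the second one shells out to `strip -x` and is not called in this sketch, which runs against a throwaway directory instead of a real site-packages.

```python
import os
import subprocess
import tempfile


def find_shared_objects(root: str) -> list[str]:
    """Return all regular files ending in .so below `root`, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".so"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def strip_local_symbols(paths: list[str]) -> None:
    """Run `strip -x` on each path, stopping at the first failure."""
    for path in paths:
        subprocess.run(["strip", "-x", path], check=True)


# Demo on a throwaway tree standing in for site-packages.
site_packages = tempfile.mkdtemp()
os.makedirs(os.path.join(site_packages, "cbor"))
for relative in ("cbor/_cbor.cpython-39-darwin.so", "cbor/__init__.py"):
    with open(os.path.join(site_packages, relative), "wb"):
        pass

targets = find_shared_objects(site_packages)
print([os.path.relpath(path, site_packages) for path in targets])
```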

@pradyunsg
Member

pradyunsg commented Mar 14, 2023

Given the state of the ecosystem and the devolution of build behaviours, I'm gonna close this out and say that you should help improve asaman if you want this.

@pradyunsg pradyunsg added resolution: out of scope and removed state: needs discussion This needs some more discussion labels Mar 14, 2023
@uranusjr
Member

Linking this here for people looking for possible solutions: https://discuss.python.org/t/introducing-asaman-a-tool-to-bulid-reproducible-wheels/10932

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 14, 2023
10 participants