Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locally built whl files are not reproducible #154

Closed
ltekieli opened this issue Jan 8, 2019 · 11 comments
Closed

Locally built whl files are not reproducible #154

ltekieli opened this issue Jan 8, 2019 · 11 comments

Comments

@ltekieli
Copy link

ltekieli commented Jan 8, 2019

Pip uses a temporary directory as the build directory for system specific libraries (eg. psutil), this information gets transferred to the resulting *.so files which causes each call of pip to produce a different *.so file.

The following repository illustrates the issue https://github.com/ltekieli/rules_python_bug.
It uses --disk_cache to specify a common cache between runs. Each run is done in a new docker container instance.

Buggy output is:

$ ./test.sh 
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 4f84ccf8-7744-4871-abcd-2b1ef9209fbb
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546820050 -0500"}
INFO: Analysed target //:test_one (31 packages loaded, 589 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
  bazel-bin/test_one
INFO: Elapsed time: 53.267s, Critical Path: 1.28s
INFO: 1 process: 1 processwrapper-sandbox.
INFO: Build completed successfully, 5 total actions
//:test_one                                                              PASSED in 0.9s

Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 5 total actions
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: f615b1f7-d36c-4f30-96fd-c4fb36d6f4e3
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546820050 -0500"}
INFO: Analysed target //:test_one (31 packages loaded, 590 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
  bazel-bin/test_one
INFO: Elapsed time: 53.548s, Critical Path: 1.05s
INFO: 1 process: 1 processwrapper-sandbox.
INFO: Build completed successfully, 5 total actions
//:test_one                                                              PASSED in 0.7s

In the second run the test result should be taken from the cache, but due to the nondeterministic whl it is not.

Applying the following patch solves this issue and the test is properly taken from the cache.

diff --git a/rules_python/piptool.py b/rules_python/piptool.py
index f5d504a..eb688dc 100644
--- a/rules_python/piptool.py
+++ b/rules_python/piptool.py
@@ -154,7 +154,7 @@ def main():
   args = parser.parse_args()
 
   # https://github.com/pypa/pip/blob/9.0.1/pip/__init__.py#L209
-  if pip_main(["wheel", "-w", args.directory, "-r", args.input]):
+  if pip_main(["wheel", "-b", "/tmp/pip-build", "-w", args.directory, "-r", args.input]):
     sys.exit(1)
 
   # Enumerate the .whl files we downloaded.

And correct output:

./test.sh 
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 4fe44acd-92dc-4063-8c05-f5639da00909
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546883993 +0100"}
INFO: Analysed target //:test_one (31 packages loaded, 591 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
  bazel-bin/test_one
INFO: Elapsed time: 51.196s, Critical Path: 1.06s
INFO: 1 process: 1 processwrapper-sandbox.
INFO: Build completed successfully, 5 total actions
//:test_one                                                              PASSED in 0.7s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these INFO: Build completed successfully, 5 total actions
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 961fce0b-6e99-4b32-a6a8-1d5ca9640e2c
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546883993 +0100"}
INFO: Analysed target //:test_one (31 packages loaded, 589 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
  bazel-bin/test_one
INFO: Elapsed time: 47.996s, Critical Path: 0.26s
INFO: 1 process: 1 remote cache hit.
INFO: Build completed successfully, 5 total actions
//:test_one                                                     (cached) PASSED in 0.2s

Executed 0 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these INFO: Build completed successfully, 5 total actions

The problem with this solution is that it might happen that parallel executions of pip write to the same build directory when tmp is not sandboxed by bazel. I'm not sure how to solve this properly yet.

@PoncinMatthieu
Copy link

I have just now encountered the same issue while trying to get our python test to work with the remote cache. I wrote all the info in this thread: https://groups.google.com/forum/#!topic/bazel-sig-python/0leDk_aD8FM

Fixing this would be great, but in the case of the PyYAML module it is apparently not enough to fix the caching.
Additionally, trying to build without the compiled library (with option --global-option="--without-libyaml" ) doesn't seem to be supported by the python rules.

@ltekieli
Copy link
Author

ltekieli commented Jan 9, 2019

@PoncinMatthieu i tried to reproduce it with pyyaml but it seems to work, maybe some sort of a timestamp in your case?

Anyhow, regarding the solution to the build dir problem, I also tried to use "-b", os.getcwd(), and it seems to work correctly. os.getcwd() points to the place where the resulting wheels are stored. Might that be a solution for our problem here?

@PoncinMatthieu
Copy link

@ltekieli Thanks a lot for trying it out with pyyaml!
I now tried it on linux and the .so file is indeed the same now with the -b option :)
On my local machine (macos), the lib differs everytime I reinstall it with the same options, I will try to debug this more on macos, but this is very good news that it works on linux.

@PoncinMatthieu
Copy link

I have noticed that for some packages like "apache-airflow", we also need to upgrade pip tools to the latest versions to get deterministic builds. I believe this is what we are missing with the current version of pip: https://bitbucket.org/pypa/wheel/pull-requests/47/make-metadata-generation-deterministic/diff

@PoncinMatthieu
Copy link

Lastly, I found out that some specific modules are not reproducible due to a setting "optimize" which includes compiled files to the build. This can be tested on the same repo by adding the requirement suds-jurko==0.6.
The optimize flag being set here: https://bitbucket.org/jurko/suds/src/94664ddd46a61d06862fa8fb6ba7b9e054214f57/setup.cfg?at=default&fileviewer=file-view-default#setup.cfg-31
So far I didn't find a way to disable this other than to fork the repo and disable it in the source.
I would be very interested to know if there is any way to disable this from the rules_python or pip configurations!

@Globegitter
Copy link

One thing I also noticed is that setting an env var SOURCE_DATE_EPOCH is the last thing I needed to get wheel reproducible for me. See: pypa/wheel@1b879e5

thundergolfer referenced this issue in thundergolfer-forks/rules_python May 14, 2019
@thundergolfer
Copy link
Collaborator

thundergolfer commented May 14, 2019

I'm debugging this same issue and unfortunately even having -d set to a consistent location and SOURCE_DATE_EPOCH set the resulting .so files are still not the same.

I can do two bazel clean --expunge && bazel build //my_python_target and PyYAML will have two different .so files and thus it won't cache.

Running strings on the different .so files reveals this kind of thing appears in the resulting binaries:

/tmp/pip-wheel-h_k2kp_h/pyyaml

I'd be interested to know where that tmp dir is being set.


Note: We're using --python_path=$(which python3.6)

@thundergolfer
Copy link
Collaborator

I'd be interested to know where that tmp dir is being set.

It's coming from here:

https://github.com/pypa/pip/blob/6af9de97bbd2427f82942e476c590a2db22ea1ff/src/pip/_internal/wheel.py#L649

@thundergolfer
Copy link
Collaborator

@brandjon What do you think about making this a P1? Deterministic builds for caching is pretty central to Bazel's value proposition. Unless there's a known mitigation for this problem?

@dillon-giacoppo
Copy link
Contributor

dillon-giacoppo commented Oct 28, 2019

I think its possible to get the tests caching by tweaking some environment variables. https://github.com/ltekieli/rules_python_bug

Added the environment variables:

./test.sh
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 71e03b02-1750-4943-9f09-6bd066444d7f
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546820050 -0500"}
INFO: Analysed target //:test_one (37 packages loaded, 581 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
  bazel-bin/test_one
INFO: Elapsed time: 19.140s, Critical Path: 0.76s
INFO: 1 process: 1 processwrapper-sandbox.
INFO: Build completed successfully, 5 total actions
//:test_one                                                              PASSED in 0.6s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option INFO: Build completed successfully, 5 total actions
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Invocation ID: b5b596e4-ff8b-496b-82d8-5e3ed963c5db
INFO: Rule 'io_bazel_rules_python' modified arguments {"shallow_since": "1546820050 -0500"}
INFO: Analysed target //:test_one (37 packages loaded, 582 targets configured).
INFO: Found 1 test target...
Target //:test_one up-to-date:
  bazel-bin/test_one
INFO: Elapsed time: 20.645s, Critical Path: 0.15s
INFO: 1 process: 1 remote cache hit.
INFO: Build completed successfully, 5 total actions
//:test_one                                                     (cached) PASSED in 0.1s

Executed 0 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option INFO: Build completed successfully, 5 total actions

@alexeagle
Copy link
Collaborator

rules_python 0.1.0 has been released which upstreams the rules_python_external repo. Please switch from pip_import to pip_install which doesn't have this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants