bdist_wheel is not idempotent #248

Closed
dkamm opened this issue Aug 22, 2018 · 17 comments

@dkamm

dkamm commented Aug 22, 2018

If I run python setup.py build bdist_wheel twice without changing any files, the resulting wheels differ at the binary level. This is a bit unexpected, given that other archive tools like tar are idempotent. Is there a reason for this? I tested on Python 2 and 3.

@Code0x58

Code0x58 commented Sep 7, 2018

This is probably because of differing timestamps in the resulting zip archive/wheel. I'd be happy to submit a PR to make the resulting archive have a constant timestamp (1980-01-01 00:00:00, the earliest value the MS-DOS date/time fields in a zip file can hold).

There are probably two approaches to pick from, depending on the codebase (I'll check later):

  • use zipfile.ZipInfo and ZipFile.writestr(…) if zipfile is being used, or use it as a poor man's solution if not
  • use code like python-stripzip to operate on a raw zip file

As a hopefully short-term solution™ you could use python-stripzip which I made last night and will continue to improve with proper CLI arguments, exception catching, and testing.
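
A minimal sketch of the first approach, using only the standard library. The helper name build_archive and the sample file contents are hypothetical and are not taken from wheel; the point is only that a fixed date_time plus a stable member order gives byte-identical archives on the same machine:

```python
import hashlib
import io
import zipfile

# Earliest timestamp the zip format can store.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def build_archive(files):
    """Return zip bytes for a {name: bytes} mapping, using fixed metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # stable member order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()


files = {"pkg/__init__.py": b"", "pkg/mod.py": b"x = 1\n"}
first = hashlib.sha256(build_archive(files)).hexdigest()
second = hashlib.sha256(build_archive(files)).hexdigest()
print(first == second)
```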

@Code0x58

Code0x58 commented Sep 7, 2018

From wheel.wheelfile (mentions #143):

def get_zipinfo_datetime(timestamp=None):
    # Some applications need reproducible .whl files, but they can't do this without forcing
    # the timestamp of the individual ZipInfo objects. See issue #143.
    timestamp = int(os.environ.get('SOURCE_DATE_EPOCH', timestamp or time.time()))
    return time.gmtime(timestamp)[0:6]

For existing files, the original modification time is used (so stability is low, since git clobbers modification times), and the /RECORD entry in the archive is given the current timestamp. Alternatively, if SOURCE_DATE_EPOCH is set in the environment, it is used as the modification time for all files, so I think you could use that for now without any extra dependencies.

I'm wondering how relevant modification times are in the use of wheels; should the timestamps in the generated archives always be a fixed date? It sounds like the Debian folks are using SOURCE_DATE_EPOCH in their builds (not sure if they'd prefer that over always using the same stamp). For people using git, the modification times aren't practically meaningful.
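
To illustrate the precedence described above, here is a standalone copy of the quoted function (not imported from wheel, so it may drift from the real code). The SOURCE_DATE_EPOCH value is an arbitrary example. With the variable set, the per-file timestamp argument is ignored:

```python
import os
import time


def get_zipinfo_datetime(timestamp=None):
    # Same logic as the quoted snippet: the environment variable wins,
    # then the caller-supplied mtime, then the current time.
    timestamp = int(os.environ.get("SOURCE_DATE_EPOCH", timestamp or time.time()))
    return time.gmtime(timestamp)[0:6]


os.environ["SOURCE_DATE_EPOCH"] = "1540393768"  # arbitrary example value

# Two files with very different mtimes get the same archive timestamp.
a = get_zipinfo_datetime(timestamp=1)
b = get_zipinfo_datetime(timestamp=999999999)
print(a == b)
```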

I'm inclined to make a PR for:

def get_zipinfo_datetime(timestamp=None):
    # assuming timestamps are irrelevant, so set all dates to the lowest available in zips (1980-01-01 00:00:00)
    timestamp = int(os.environ.get('SOURCE_DATE_EPOCH', 315532800))
    return time.gmtime(timestamp)[0:6]
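
As a quick sanity check on the constant in that proposal, the conversion can be done in both directions with the standard library (the variable name EPOCH_1980 is only for illustration):

```python
import calendar
import time

EPOCH_1980 = 315532800

# Seconds since the Unix epoch -> UTC date tuple.
as_tuple = time.gmtime(EPOCH_1980)[0:6]
print(as_tuple)  # (1980, 1, 1, 0, 0, 0)

# UTC date tuple -> seconds since the Unix epoch.
as_seconds = calendar.timegm((1980, 1, 1, 0, 0, 0))
print(as_seconds)  # 315532800
```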

@Code0x58

Code0x58 commented Sep 9, 2018

Another thing that can change between builds is the WHEEL file in the packed *.dist-info directory, which looks something like:

Wheel-Version: 1.0
Generator: bdist_wheel (0.31.1)
Root-Is-Purelib: true
Tag: py2-none-any
Tag: py3-none-any

So that is something to consider when making builds deterministic.

@dholth
Member

dholth commented Sep 14, 2018

"reproducible builds" means if you have the same versions of tools then you will get the same output. The generator version is a feature.

@agronholm
Contributor

What disturbs me more is the fact that the file permissions in the resulting wheel depend on the current umask. In my opinion the permissions should be the same regardless of which OS the wheel was built on (and consequently, the umask).

@agronholm
Contributor

@dholth I'm thinking of using 0o644 permissions for all files regardless of their permissions in the build directory. Any objections?
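
A rough sketch of what fixed permissions could look like with the standard zipfile module. The helper add_with_fixed_mode is hypothetical and not part of wheel; it relies on the convention that the upper 16 bits of ZipInfo.external_attr hold the Unix file mode:

```python
import io
import stat
import zipfile


def add_with_fixed_mode(zf, name, data, mode=0o644):
    """Write one member with a mode that ignores umask and source permissions."""
    info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
    # Upper 16 bits carry the Unix mode; S_IFREG marks a regular file.
    info.external_attr = (stat.S_IFREG | mode) << 16
    zf.writestr(info, data)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_with_fixed_mode(zf, "pkg/mod.py", b"x = 1\n")

# Read the archive back and recover the permission bits.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    stored_mode = stat.S_IMODE(zf.getinfo("pkg/mod.py").external_attr >> 16)
print(oct(stored_mode))  # 0o644
```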

@agronholm
Contributor

agronholm commented Oct 3, 2018

I'm also not opposed to setting the timestamp to 1980-01-01 00:00:00 for all files.

@kushaldas

Even with a fixed SOURCE_DATE_EPOCH, native binary wheels have different sha256sums. For example, one can try building the cryptography module.

@agronholm
Contributor

@kushaldas which files inside the wheel differ then?

@agronholm
Contributor

Does the build process recreate any files?

@kushaldas

@agronholm I found the following differences.

user@deb-build:~/wheels/b$ sha256sum ../a/cryptography/hazmat/bindings/*
94bd4a00f390c3fcc39caed9815d902bc6c4b35064d63b372e7b14afe0b6bc4a  ../a/cryptography/hazmat/bindings/_constant_time.abi3.so
d301b0d8e17d47b7c75fb35610d0a6aec62281b5c7536a23827f8de11923b3d5  ../a/cryptography/hazmat/bindings/__init__.py
sha256sum: ../a/cryptography/hazmat/bindings/openssl: Is a directory
45d63b578dae43051945485b74167f2afcbc8fc45e6773992fe0ee272184026b  ../a/cryptography/hazmat/bindings/_openssl.abi3.so
589853ff8c4185d4d1ba63a1ebb7ff3587f9ba84ac9f0649da46cf18c57a09a3  ../a/cryptography/hazmat/bindings/_padding.abi3.so
user@deb-build:~/wheels/b$ sha256sum cryptography/hazmat/bindings/*
d679f2c687c4b89ec538975ab761fcf6d46d2372c07d7aa215e1693d2695ca20  cryptography/hazmat/bindings/_constant_time.abi3.so
d301b0d8e17d47b7c75fb35610d0a6aec62281b5c7536a23827f8de11923b3d5  cryptography/hazmat/bindings/__init__.py
sha256sum: cryptography/hazmat/bindings/openssl: Is a directory
77047f41512632efb398aa084c5d8d164ab99f42cc00bbb47e4ae1294a9454cd  cryptography/hazmat/bindings/_openssl.abi3.so
89da3ffe6c7adf6eabe1dc687a61e4faae7b8bc081d633b2a2d7db9618de11a8  cryptography/hazmat/bindings/_padding.abi3.so

And also:

user@deb-build:~/wheels/b$ sha256sum ../a/cryptography-2.3.1.dist-info/*
cea9826495ebbcb676a6346d807dc37ce4e9149b81970ce3c4d91e0b2aa949ff  ../a/cryptography-2.3.1.dist-info/DESCRIPTION.rst
15a075408f5ef28e638e423f7ef32b929ab123b03b485cd4d57498b387caaafe  ../a/cryptography-2.3.1.dist-info/METADATA
5052bdfdc5b5b1699febcd7f23252e7773095337ff984a3cb98557bf434ef619  ../a/cryptography-2.3.1.dist-info/metadata.json
6e9a132666a8fefa90bea863196827aeb56cb98dfca22dce2eaa57fa745d9d27  ../a/cryptography-2.3.1.dist-info/RECORD
402918404e07241a6a22bf9a06a6ce67bd0d95f6de8ca9c313a3836cd814c308  ../a/cryptography-2.3.1.dist-info/top_level.txt
cdaed68e6d5b9443d9af071f2499b90253555cec924fd54a954bfd82ceeb1c2f  ../a/cryptography-2.3.1.dist-info/WHEEL
user@deb-build:~/wheels/b$ sha256sum cryptography-2.3.1.dist-info/*
cea9826495ebbcb676a6346d807dc37ce4e9149b81970ce3c4d91e0b2aa949ff  cryptography-2.3.1.dist-info/DESCRIPTION.rst
5dc087eefd81d3ed5e635efe921cd03e89827b82465b7fa77c5ac611fb9cd3a5  cryptography-2.3.1.dist-info/METADATA
5052bdfdc5b5b1699febcd7f23252e7773095337ff984a3cb98557bf434ef619  cryptography-2.3.1.dist-info/metadata.json
dd5bcf47864dfc856488f04c074dec1d4c04c804e5eaa0a019fd4466dae607cb  cryptography-2.3.1.dist-info/RECORD
402918404e07241a6a22bf9a06a6ce67bd0d95f6de8ca9c313a3836cd814c308  cryptography-2.3.1.dist-info/top_level.txt
cdaed68e6d5b9443d9af071f2499b90253555cec924fd54a954bfd82ceeb1c2f  cryptography-2.3.1.dist-info/WHEEL

The .so files have different checksums, and so do the RECORD file (because it lists the hashes of the .so files) and the METADATA file.
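
The same comparison can be done without unpacking the wheels first. The following is a sketch, assuming both wheels are ordinary zip files; the function names and the in-memory sample archives are made up for illustration:

```python
import hashlib
import io
import zipfile


def member_digests(wheel):
    """Map each archive member name to the sha256 of its contents."""
    with zipfile.ZipFile(wheel) as zf:
        return {name: hashlib.sha256(zf.read(name)).hexdigest() for name in zf.namelist()}


def differing_members(wheel_a, wheel_b):
    """Return sorted names that are missing from one wheel or differ in content."""
    a, b = member_digests(wheel_a), member_digests(wheel_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


def make_sample(files):
    # Build a small in-memory archive standing in for a real .whl file.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


wheel_a = make_sample({"pkg/__init__.py": b"", "pkg/_ext.so": b"build one"})
wheel_b = make_sample({"pkg/__init__.py": b"", "pkg/_ext.so": b"build two"})
changed = differing_members(wheel_a, wheel_b)
print(changed)  # ['pkg/_ext.so']
```

With real wheels, the two arguments would be paths to the .whl files from each build.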

@agronholm
Contributor

@kushaldas which wheel version did you try with? Wheel has not generated metadata.json for a while.

@agronholm
Contributor

Regarding the differences in the .so files, what would you like me to do about it?

@agronholm
Contributor

@dkamm I will close this issue as invalid if you don't respond within a week. Can you verify that there is something that wheel does wrong here? Like, is it bad that it runs the bdist command every time a new wheel is created?

@dkamm
Author

dkamm commented Oct 18, 2018

@agronholm feel free to close as invalid. It ended up not mattering for me. Sorry to send you guys down this goose chase.

@kushaldas

> @kushaldas which wheel version did you try with? Wheel has not generated metadata.json for a while.

I have now tried with wheel-0.32.2 on an updated Debian Stretch. The commands I used:

export SOURCE_DATE_EPOCH=1540393768
pip3 wheel --no-index --find-links ./localwheels/ -w ./localwheels/ -r requirements-build.txt

After this, the only difference is in the binary .so files.

user@wheelsfun:~/test2$ sha256sum cryptography/hazmat/bindings/*.so
fa06528b92af57233a11e0b853d757416ac05935c8727a44144851de8f3f7a13  cryptography/hazmat/bindings/_constant_time.abi3.so
57579dd085a4d84cbfdf00024a817f8d7379047b252c907a8105fe05396bd8ee  cryptography/hazmat/bindings/_openssl.abi3.so
d9a64e837143380f193493887d88caa6fb048a983f02025586afe7a15575bb16  cryptography/hazmat/bindings/_padding.abi3.so
user@wheelsfun:~/test2$ sha256sum ../test1/cryptography/hazmat/bindings/*.so
114093a8d727c676c32932cf7b54b6da0b944706d9d9a25941f4ee355fd97868  ../test1/cryptography/hazmat/bindings/_constant_time.abi3.so
a74aeaa1f1af759216bd8bafa3c381ba90a517ddb8d705a0637aa8c54a30beef  ../test1/cryptography/hazmat/bindings/_openssl.abi3.so
13c4cb563b204c1b4af6505d66d2718ee75a25d11fe1d014c99e64fdd0f315af  ../test1/cryptography/hazmat/bindings/_padding.abi3.so

> Regarding the differences in the .so files, what would you like me to do about it?

Any tips on how to make sure that we can have reproducible binary wheels?

@agronholm
Contributor

How to compile C code in a reproducible manner is outside the scope of my experience. This topic would be better addressed by talking about it on the distutils-sig list.

5 participants