Strip/fix file attributes/encodings? #277

Closed
mdedetrich opened this issue Aug 11, 2023 · 10 comments · Fixed by #298

@mdedetrich
Collaborator

mdedetrich commented Aug 11, 2023

When authoring a release for pekko-management we found an issue where sbt-reproducible-builds was reporting a mismatch in the checksums of the jars. Upon further inspection, it appears that the problem is due to different file encodings for additional files that are added to META-INF (i.e. NOTICE/DISCLAIMER for Apache projects). You can look at https://gist.github.com/mdedetrich/172c369af2a4ac39e2b21f3ad8e8daf6 which contains the diffoscope output showing the difference, i.e.

│    apparent file type:                             binary
│    non-MSDOS external file attributes:             000000 hex
│    MS-DOS file attributes (00 hex):                none
│
│    The central-directory extra field contains:
│    - A subfield with ID 0xcafe (unknown) and 0 data bytes.
│    - A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes.  The first
│ -    20 are:   00 00 00 00 01 00 18 00 00 d8 a8 c3 e2 53 bf 01 00 d8 a8 c3.
│ +    20 are:   00 00 00 00 01 00 18 00 00 40 6d 25 eb 53 bf 01 00 40 6d 25.

You can see the full reproducibleBuildsCheck report at https://gist.github.com/mdedetrich/08e02b787d4f712d104fba1d47252864

Note that the only differences are in these files added to META-INF; the actual jar bytecode is exactly the same (this can be verified with jardiff, whose output is empty in my case).

Does it make sense for sbt-reproducible-builds to have an option which "sanitises" extra files that are added to META-INF so that it enforces a consistent file encoding? @raboof wdyt?

@mdedetrich
Collaborator Author

pekko-grpc is another project that has the same issue, specifically https://github.com/apache/incubator-pekko-grpc/releases/tag/v1.0.0-RC2

@pjfanning

Relates to apache/pekko-http#457

@raboof
Owner

raboof commented Jan 29, 2024

Does it make sense for sbt-reproducible-builds to have an option which "sanitises" extra files that are added to META-INF so that it enforces a consistent file encoding? @raboof wdyt?

Possibly, but I'd like to first do some more analysis of where those different (seems timezone-dependent?) timestamp attributes come from, and if possible fix this at the root.
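
One way to check the timezone hypothesis is to decode the bytes shown in the diffoscope output above. The Python sketch below assumes the 0x000a field follows the NTFS layout described in the PKWARE APPNOTE: 4 reserved bytes, then an attribute with tag 0x0001 whose first 8 bytes are the modification time as a Windows FILETIME (100 ns ticks since 1601-01-01 UTC). The helper name `decode_ntfs_mtime` is made up for illustration, and only the modification time is decoded because the output shows just the first 20 bytes of the field.

```python
import struct
from datetime import datetime, timedelta, timezone

# Windows FILETIME counts 100 ns ticks from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def decode_ntfs_mtime(hex_bytes: str) -> datetime:
    """Decode the modification time from the start of a 0x000a extra field payload.

    Assumed layout: reserved (4 bytes), tag (2 bytes), size (2 bytes), mtime (8 bytes).
    """
    data = bytes.fromhex(hex_bytes)
    _reserved, tag, _size = struct.unpack_from("<IHH", data, 0)
    if tag != 0x0001:
        raise ValueError(f"unexpected attribute tag: {tag:#06x}")
    (filetime,) = struct.unpack_from("<Q", data, 8)
    # Convert 100 ns ticks to microseconds for timedelta.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


# The two byte sequences from the '-' and '+' lines of the diff.
before = decode_ntfs_mtime("00 00 00 00 01 00 18 00 00 d8 a8 c3 e2 53 bf 01 00 d8 a8 c3")
after = decode_ntfs_mtime("00 00 00 00 01 00 18 00 00 40 6d 25 eb 53 bf 01 00 40 6d 25")

print(before.isoformat(), after.isoformat(), after - before)
```

Under those assumptions the two values decode to 1999-12-31 23:00 UTC and 2000-01-01 00:00 UTC, exactly one hour apart, which would be consistent with the same local time being converted in two different timezones.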

@mdedetrich
Collaborator Author

The timezone explanation definitely makes sense, since sbt-reproducible-builds started failing on timezone changes when @pjfanning was doing releases.

I don't see how sbt-osgi is related, considering it's only used for pekko core and not pekko-http, and from memory this issue was occurring on every single pekko module.

It seems to be related to the way the jars are packaged. sbt-osgi now also has a setting to control how it packages the final jar, but as I said, I don't think it's relevant.

@mdedetrich
Collaborator Author

Another option might be to use jardiff, which is now published as a library on Maven. jardiff is what I use instead of sbt-reproducible-builds, as it checks the contents of the jar itself rather than the hash of the entire jar.

I don't know how acceptable this is in terms of pure reproducibility, but I guess it's an option 🤷

@raboof
Owner

raboof commented Jan 29, 2024

'acceptable' depends on context, but it definitely isn't a substitute for binary reproducibility: you have to essentially trust jardiff to not be susceptible to missing changes that are somehow hidden. Also, it means rebuilding any container (archive, image, etc) that contains Pekko fully from source now also isn't reproducible anymore.
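
To make the difference between the two kinds of check concrete, here is a small Python sketch. It is not jardiff's algorithm (jardiff does a more detailed comparison); it only contrasts hashing the whole archive with hashing each entry's contents. The helper names and the sample entry are invented for the example.

```python
import hashlib
import io
import zipfile


def make_jar(date_time: tuple) -> bytes:
    """Build an in-memory archive with one entry carrying the given timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("META-INF/NOTICE", date_time=date_time)
        zf.writestr(info, b"notice text\n")
    return buf.getvalue()


def whole_file_digest(jar: bytes) -> str:
    """Binary reproducibility: hash every byte, metadata included."""
    return hashlib.sha256(jar).hexdigest()


def entry_digests(jar: bytes) -> dict:
    """Content comparison: hash only the decompressed bytes of each entry."""
    with zipfile.ZipFile(io.BytesIO(jar)) as zf:
        return {i.filename: hashlib.sha256(zf.read(i)).hexdigest() for i in zf.infolist()}


# Same contents, timestamps one hour apart.
jar_a = make_jar((1999, 12, 31, 23, 0, 0))
jar_b = make_jar((2000, 1, 1, 0, 0, 0))

print(whole_file_digest(jar_a) == whole_file_digest(jar_b))  # metadata differs
print(entry_digests(jar_a) == entry_digests(jar_b))          # contents match
```

The content check passes while the whole-file check fails, which is exactly why a content check is weaker: anything it does not inspect can differ without being noticed.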

@raboof raboof self-assigned this Jan 29, 2024
@mdedetrich
Collaborator Author

Agreed, hence the "acceptable" part 😄 I was just suggesting an alternative in case we can't solve the root issue.

@raboof
Owner

raboof commented Jan 30, 2024

I think the spurious 0x000a field is caused by apache/commons-compress#472.

It might still make sense to strip this field type, but I would prefer postponing that until we encounter a case where it is not feasible to fix the upstream root cause instead.
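
For reference, stripping a field type from an entry's extra data could look roughly like the Python sketch below. It assumes the standard extra-field encoding (a 2-byte little-endian header ID, a 2-byte length, then the payload) and works on a raw byte string only; rewriting a real jar would also mean updating the extra-field lengths in the local and central headers. The function name `strip_extra_field` and the sample data are hypothetical, and this is not the code used in sbt-reproducible-builds.

```python
import struct


def strip_extra_field(extra: bytes, header_id: int) -> bytes:
    """Return the extra data with every subfield of the given header ID removed."""
    kept = bytearray()
    pos = 0
    while pos + 4 <= len(extra):
        field_id, size = struct.unpack_from("<HH", extra, pos)
        end = pos + 4 + size
        if end > len(extra):
            raise ValueError("truncated extra field")
        if field_id != header_id:
            kept += extra[pos:end]
        pos = end
    # Preserve any trailing bytes that do not form a complete subfield header.
    kept += extra[pos:]
    return bytes(kept)


# Sample data shaped like the entry in the diff: 0xcafe with no payload,
# followed by 0x000a with 32 payload bytes (zero-filled here).
extra = struct.pack("<HH", 0xCAFE, 0) + struct.pack("<HH", 0x000A, 32) + bytes(32)
cleaned = strip_extra_field(extra, 0x000A)

print(cleaned.hex())
```

Only the 0xcafe marker remains in `cleaned`, and running the function again on the result leaves it unchanged.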

@pjfanning

Would it be possible to have sbt-reproducible-builds support an optional config setting that allows users to override the date/time set on the files in the jars? This could be a hardcoded date/time set in the build.sbt before a release. It could be a date/time close to when the release is expected. I don't know if using a 2024 date would get around the 1980 date issue in apache/commons-compress#472 but regardless, I think it could be a useful feature.

@raboof
Owner

raboof commented Jan 31, 2024

Yes, that might be worth considering - similar to SOURCE_DATE_EPOCH or Maven's project.build.outputTimestamp.
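
As a rough illustration of the idea (not a proposal for the actual sbt setting), the Python sketch below stamps every entry with one fixed time taken from `SOURCE_DATE_EPOCH`. The function name `build_archive`, the fallback epoch (2024-01-31 00:00 UTC) and the sample files are assumptions made for the example.

```python
import io
import os
import zipfile
from datetime import datetime, timezone


def build_archive(files: dict, epoch: int) -> bytes:
    """Write the files into a zip where every entry carries the same fixed timestamp."""
    # Zip entries store a (year, month, day, hour, minute, second) tuple; use UTC.
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc).timetuple()[:6]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=stamp)
            info.create_system = 3            # do not depend on the host OS
            info.external_attr = 0o644 << 16  # fixed permissions
            zf.writestr(info, files[name])
    return buf.getvalue()


# Fall back to an arbitrary fixed instant when the variable is not set.
epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1706659200"))
files = {"META-INF/NOTICE": b"notice\n", "META-INF/DISCLAIMER": b"disclaimer\n"}

first = build_archive(files, epoch)
second = build_archive(files, epoch)
print(first == second)
```

Because the timestamp comes from the build configuration instead of the clock or the local timezone, two builds with the same inputs produce byte-identical output in this sketch.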

raboof added a commit that referenced this issue Feb 18, 2024
Should fix #277 as this now strips those 0x000a sections entirely.
@raboof raboof mentioned this issue Feb 18, 2024