[GR-52534] Change digest algorithm and encoding. #8772
Merged
Digests are used in multiple places in the image generator, from lambda names to factory method names to symbol names. Shorter digests and shorter "unique short names" therefore reduce the image size.
This PR changes the digest from SHA-1 encoded as a hex string (40 bytes) to 128-bit Murmur3 encoded as a Base-62 string (22 bytes). We do not use a standard Base64 encoding because Base64 needs 2 special characters in addition to digits and letters, and there are no 2 characters that are valid everywhere digests are used (Java names, symbol names, ...). We also don't care about encoding speed and never need to decode, so using a non-standard encoding has no downside.
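To make the size difference concrete, here is a minimal sketch of hashing an input with 128-bit MurmurHash3 and encoding the result in Base-62. It assumes the x64_128 variant with seed 0, which the PR does not specify. The class `Digest`, its methods, the alphabet order, and the zero-padding to a fixed 22 characters are hypothetical choices for illustration, not the image generator's actual code. 22 digits suffice because 62^21 < 2^128 < 62^22.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: MurmurHash3 x64_128 (seed 0 assumed) plus fixed-width Base-62.
final class Digest {
    private static final long C1 = 0x87c37b91114253d5L;
    private static final long C2 = 0x4cf5ad432745937fL;
    // Digits and ASCII letters only, so the result needs no special characters.
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    // 62^22 > 2^128, so 22 Base-62 digits can represent any 128-bit value.
    static final int DIGEST_LENGTH = 22;

    private static long mixK1(long k) {
        return Long.rotateLeft(k * C1, 31) * C2;
    }

    private static long mixK2(long k) {
        return Long.rotateLeft(k * C2, 33) * C1;
    }

    private static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** MurmurHash3, x64 128-bit variant, seed 0. Returns the two 64-bit halves {h1, h2}. */
    static long[] murmur3x64_128(byte[] data) {
        int len = data.length;
        int nblocks = len / 16;
        int rem = len & 15;
        long h1 = 0, h2 = 0;
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        // Body: each 16-byte block is read as two little-endian 64-bit words.
        for (int i = 0; i < nblocks; i++) {
            h1 ^= mixK1(buf.getLong(i * 16));
            h1 = Long.rotateLeft(h1, 27) + h2;
            h1 = h1 * 5 + 0x52dce729;
            h2 ^= mixK2(buf.getLong(i * 16 + 8));
            h2 = Long.rotateLeft(h2, 31) + h1;
            h2 = h2 * 5 + 0x38495ab5;
        }
        // Tail: the last 0..15 bytes; bytes 8..14 go into k2, bytes 0..7 into k1.
        int tail = nblocks * 16;
        long k1 = 0, k2 = 0;
        for (int i = rem - 1; i >= 8; i--) {
            k2 ^= (data[tail + i] & 0xffL) << ((i - 8) * 8);
        }
        for (int i = Math.min(rem, 8) - 1; i >= 0; i--) {
            k1 ^= (data[tail + i] & 0xffL) << (i * 8);
        }
        if (rem > 8) {
            h2 ^= mixK2(k2);
        }
        if (rem > 0) {
            h1 ^= mixK1(k1);
        }
        // Finalization: fold in the length, then avalanche both halves.
        h1 ^= len;
        h2 ^= len;
        h1 += h2;
        h2 += h1;
        h1 = fmix64(h1);
        h2 = fmix64(h2);
        h1 += h2;
        h2 += h1;
        return new long[] {h1, h2};
    }

    /** Encodes h1:h2 as an unsigned 128-bit number in zero-padded, fixed-width Base-62. */
    static String toBase62(long h1, long h2) {
        byte[] bytes = ByteBuffer.allocate(16).putLong(h1).putLong(h2).array();
        BigInteger value = new BigInteger(1, bytes); // signum 1: treat the bytes as unsigned
        BigInteger base = BigInteger.valueOf(ALPHABET.length());
        char[] out = new char[DIGEST_LENGTH];
        for (int i = DIGEST_LENGTH - 1; i >= 0; i--) {
            BigInteger[] qr = value.divideAndRemainder(base);
            out[i] = ALPHABET.charAt(qr[1].intValue());
            value = qr[0];
        }
        return new String(out);
    }

    static String digest(String input) {
        long[] h = murmur3x64_128(input.getBytes(StandardCharsets.UTF_8));
        return toBase62(h[0], h[1]);
    }
}
```

For example, `Digest.digest("java.lang.String.valueOf(int)")` yields a 22-character alphanumeric string, where a hex-encoded SHA-1 of the same input takes 40 characters. Encoding through `BigInteger` division keeps the encoder short at the cost of speed, which is acceptable because encoding speed does not matter here.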
There is no need for the hash algorithm to be cryptographic. In the worst case, the image build can fail because two symbol names are no longer unique. For many usages, a conflict resolution policy is already in place, because the digested input strings themselves are not guaranteed to be unique. A non-cryptographic algorithm also avoids false-positive reports that flag the old SHA-1 as a weak cryptographic algorithm, without having to move to an even longer SHA-2 hash.
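The PR does not spell out the existing conflict resolution policy. As a hypothetical sketch of one common approach, the registry below lets the first claimant keep a name and gives later claimants a numeric suffix, so a rare digest collision produces a slightly longer name instead of a build failure. `UniqueNames` and `claim` are illustrative names only, not the image generator's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the image generator's actual policy.
final class UniqueNames {
    // Maps each claimed name to the highest suffix handed out for it so far.
    private final Map<String, Integer> claimed = new HashMap<>();

    String claim(String name) {
        Integer lastSuffix = claimed.get(name);
        if (lastSuffix == null) {
            claimed.put(name, 0);
            return name;
        }
        // Name already taken: try _1, _2, ... and skip any candidate that is itself taken.
        int n = lastSuffix;
        String candidate;
        do {
            n++;
            candidate = name + "_" + n;
        } while (claimed.containsKey(candidate));
        claimed.put(name, n);
        claimed.put(candidate, 0);
        return candidate;
    }
}
```

With this scheme, two inputs whose digests collide on `Abc` end up as `Abc` and `Abc_1`; only code paths without such a policy can still fail on a collision.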
In addition, there are a few other tweaks that shorten "unique short names", such as capping the length of class names, removing the unnecessary "constructor" literal for constructors, greatly shortening the deoptimization entry point marker, ...
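Capping the class-name part of a unique short name is safe as long as uniqueness comes from a digest computed over the full, untruncated input. Below is a minimal hypothetical sketch; the `ShortNames` helper, the `class_method_digest` layout, and the 32-character cap are assumptions, since the PR does not give the actual format or limit.

```java
// Hypothetical sketch; the real name layout and cap are not stated in the PR.
final class ShortNames {
    // Assumed cap on the human-readable class-name prefix.
    static final int MAX_CLASS_NAME_LENGTH = 32;

    /**
     * Builds a short name from a readable prefix and a digest. The digest must be
     * computed over the full, untruncated input, so truncating the class name only
     * affects readability, not uniqueness.
     */
    static String uniqueShortName(String className, String methodName, String digest) {
        String cls = className.length() <= MAX_CLASS_NAME_LENGTH
                ? className
                : className.substring(0, MAX_CLASS_NAME_LENGTH);
        return cls + "_" + methodName + "_" + digest;
    }
}
```

Because the digest already disambiguates, a long class name costs at most `MAX_CLASS_NAME_LENGTH` characters in every name derived from it.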