Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stable hashing: add comments and tests concerning platform-independence #77319

Merged
merged 1 commit into from
Oct 1, 2020

Conversation

tgnottingham
Copy link
Contributor

@tgnottingham tgnottingham commented Sep 29, 2020

SipHasher128 implements short_write in an endian-independent way, yet
its write_xxx Hasher trait methods undo this endian-independence by byte
swapping the integer inputs on big-endian hardware. StableHasher then
adds endian-independence back by also byte-swapping on big-endian
hardware prior to invoking SipHasher128.

This double swap may have the appearance of being a no-op, but is in
fact by design. In particular, we really do want SipHasher128 to be
platform-dependent, in order to be consistent with the libstd SipHasher.
Try to clarify this intent. Also, add and update a couple of unit tests.


Previous commit text:

SipHasher128: fix platform-independence confusion

StableHasher is supposed to ensure platform independence by converting
integers to little-endian and extending isize and usize to 64 bits as
necessary, but in fact, much of that work is already handled by
SipHasher128.

In particular, SipHasher128 implements short_write in an
endian-independent way, yet both StableHasher and SipHasher128
additionally attempt to achieve endian-independence by byte swapping on
BE hardware before invoking short writes. This double swap has no
effect, so let's remove it.

Because short_write is endian-independent, SipHasher128 is already
handling part of the platform-independence, and it would be somewhat
difficult to make it not handle that part with the current
implementation. As splitting platform-independence responsibilities
between StableHasher and SipHasher128 would be confusing, let's make
SipHasher128 handle all of it.

Finally, update some incorrect comments and increase test coverage.
Unit tests pass on both LE and BE systems.

@rust-highfive
Copy link
Collaborator

r? @davidtwco

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 29, 2020
@tgnottingham
Copy link
Contributor Author

This is just a cleanup, by the way. Should be a non-functional change.

Copy link
Member

@davidtwco davidtwco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'm not familiar with this code.

@davidtwco
Copy link
Member

r? @nagisa

@rust-highfive rust-highfive assigned nagisa and unassigned davidtwco Sep 29, 2020
@nagisa
Copy link
Member

nagisa commented Sep 29, 2020

Overall LGTM, r=me, with or without additional tests added (although if they are not added, maybe file a task for somebody else?)

@nagisa
Copy link
Member

nagisa commented Sep 29, 2020

@bors r+

@bors
Copy link
Contributor

bors commented Sep 29, 2020

📌 Commit eb0a88f has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 29, 2020
@tgnottingham
Copy link
Contributor Author

Thanks @nagisa. I realize now that there might not be any process like CI running unit tests against big-endian systems, which would mean that the added tests don't yet provide the value they promise. Would the best course of action be to add a BE system to CI?

@nagisa
Copy link
Member

nagisa commented Sep 29, 2020

It is a long standing problem that may be eventually figured out in parallel. There are some third parties (such as Debian) that might eventually run these tests, so I think the tests as they are now are good enough.

@tgnottingham
Copy link
Contributor Author

CI failed due to what seems to be a spurious network issue. Not sure if it needs to be kicked off again.

@tgnottingham
Copy link
Contributor Author

Saw that CI failed in #77350 on 32-bit systems due to some tests that needed updating (test_hash_usize in sip128/tests.rs). Pushed a change that I believe should resolve it.

@tgnottingham
Copy link
Contributor Author

I just stumbled upon this comment, which explains that the double swapping that my PR removes was actually intentional. So I think this PR shouldn't merge after all, at least as it is now.

What I intend to do is to roll back some of the changes, but make some improvements to the comments so that it is clear why things have been done the way they've been done. @nnethercote and @michaelwoerister, please chime in if you have any input.

Thanks for the patience with the back and forth here. :)

@nnethercote
Copy link
Contributor

The description says "Unit tests pass on both LE and BE systems" but subsequent comments suggest that statement might be wrong?

I was very careful in #68914 to get everything correct for LE and BE and as far as I know the current code is correct. There was a ton of back-and-forth in that PR and Michael did various tests on BE machines. The code is confusing, and I added comments to make things clearer but there is probably room for further improvements there.

(It is unfortunate that there is no BE testing on CI. The current state of the art is to request an account on the GCC compile farm and, once you've got that, test on one of their BE machines. Horrible, I know.)

In summary: I would not be satisfied with the proposed changes in this PR unless you can show the current code does the wrong thing on BE machines, and I think you don't have that yet. But let me know if I've got that wrong.

@tgnottingham
Copy link
Contributor Author

tgnottingham commented Sep 30, 2020

The description says "Unit tests pass on both LE and BE systems" but subsequent comments suggest that statement might be wrong?

The issue wasn't LE vs BE, but 32 vs 64 bit. I had tested on x86-64 and 32-bit big-endian MIPS. However, the test_hash_usize function doesn't have a variant that runs on the latter, so I hadn't been running that test against a 32-bit system, and the issue slipped by.

In summary: I would not be satisfied with the proposed changes in this PR unless you can show the current code does the wrong thing on BE machines, and I think you don't have that yet. But let me know if I've got that wrong.

Oh no, the current code does the right thing on BE. I was just having a difficult time understanding the structure of the code, and thought that this would be clearer. But now I see from the other thread why things were done this way. My intent is to roll back most of my changes and just add some additional comments.

SipHasher128 implements short_write in an endian-independent way, yet
its write_xxx Hasher trait methods undo this endian-independence by byte
swapping the integer inputs on big-endian hardware. StableHasher then
adds endian-independence back by also byte-swapping on big-endian
hardware prior to invoking SipHasher128.

This double swap may have the appearance of being a no-op, but is in
fact by design. In particular, we really do want SipHasher128 to be
platform-dependent, in order to be consistent with the libstd SipHasher.
Try to clarify this intent. Also, add and update a couple of unit tests.
@tgnottingham tgnottingham changed the title SipHasher128: fix platform-independence confusion Stable hashing: add comments and tests concerning platform-independence Sep 30, 2020
@tgnottingham
Copy link
Contributor Author

I've made revisions and updated the PR title and description accordingly. I hope this is an improvement. :)

@nnethercote
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Sep 30, 2020

📌 Commit d061fee has been approved by nnethercote

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Sep 30, 2020
…r=nnethercote

Stable hashing: add comments and tests concerning platform-independence

SipHasher128 implements short_write in an endian-independent way, yet
its write_xxx Hasher trait methods undo this endian-independence by byte
swapping the integer inputs on big-endian hardware. StableHasher then
adds endian-independence back by also byte-swapping on big-endian
hardware prior to invoking SipHasher128.

This double swap may have the appearance of being a no-op, but is in
fact by design. In particular, we really do want SipHasher128 to be
platform-dependent, in order to be consistent with the libstd SipHasher.
Try to clarify this intent. Also, add and update a couple of unit tests.

---

Previous commit text:

~SipHasher128: fix platform-independence confusion~

~StableHasher is supposed to ensure platform independence by converting
integers to little-endian and extending isize and usize to 64 bits as
necessary, but in fact, much of that work is already handled by
SipHasher128.~

~In particular, SipHasher128 implements short_write in an
endian-independent way, yet both StableHasher and SipHasher128
additionally attempt to achieve endian-independence by byte swapping on
BE hardware before invoking short writes. This double swap has no
effect, so let's remove it.~

~Because short_write is endian-independent, SipHasher128 is already
handling part of the platform-independence, and it would be somewhat
difficult to make it *not* handle that part with the current
implementation. As splitting platform-independence responsibilities
between StableHasher and SipHasher128 would be confusing, let's make
SipHasher128 handle all of it.~

~Finally, update some incorrect comments and increase test coverage.
Unit tests pass on both LE and BE systems.~
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 1, 2020
Rollup of 12 pull requests

Successful merges:

 - rust-lang#76909 (Add Iterator::advance_by and DoubleEndedIterator::advance_back_by)
 - rust-lang#77153 (Fix recursive nonterminal expansion during pretty-print/reparse check)
 - rust-lang#77202 (Defer Apple SDKROOT detection to link time.)
 - rust-lang#77303 (const evaluatable: improve `TooGeneric` handling)
 - rust-lang#77305 (move candidate_from_obligation_no_cache)
 - rust-lang#77315 (Rename AllocErr to AllocError)
 - rust-lang#77319 (Stable hashing: add comments and tests concerning platform-independence)
 - rust-lang#77324 (Don't fire `const_item_mutation` lint on writes through a pointer)
 - rust-lang#77343 (Validate `rustc_args_required_const`)
 - rust-lang#77349 (Update cargo)
 - rust-lang#77360 (References to ZSTs may be at arbitrary aligned addresses)
 - rust-lang#77371 (Remove trailing space in error message)

Failed merges:

r? `@ghost`
@bors bors merged commit 0044a9c into rust-lang:master Oct 1, 2020
@rustbot rustbot added this to the 1.48.0 milestone Oct 1, 2020
@tgnottingham tgnottingham deleted the siphasher_endianness branch January 20, 2021 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants