Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change seed to accept unsigned long for c compatibility #6

Merged
merged 1 commit into from
May 26, 2017

Conversation

pik
Copy link
Contributor

@pik pik commented Mar 16, 2015

When parsing arguments with the i format string ParseTupleAndKeywords returns an OverflowError on Uint32 values. I will correctly accept both Int32 and Uint32 seeds.

@hajimes
Copy link
Owner

hajimes commented Jun 2, 2015

I'm grateful to you for commit. But, for two reasons, I opt for the format unit i.

  1. Security concern. The format unit I does not check overflow. In realty it may be not a problem, since a variable seed is statically allocated and the memory will be never overflown. But because C/C++ is very tricky, I'm still not sure.
  2. Interface concern. By changing the format unit i to I, I place the burden of checking inputs on users. This might cause unfavorable issues which are hard to detect in some cases, especially if the integer is overflown. For example, hash('foo', 2 ** 33) and hash('foo', 2 ** 34) return the same results without any notices.

If seeds within the range [2 ** 31, 2 ** 32) have imminent and important use cases, I will change the format unit. Until needed, though, I think it's safer to leave them untouched. It is just a design choice so your commit itself is quite reasonable. Again, I appreciate for your help.

@paulie-g
Copy link

Integer overflow and stack/heap overflow are two different things. The C standard guarantees that the storage size and alignment requirements are the same for any given signed integer type and its corresponding unsigned integer type. There can be no memory overflow here, it's not tricky at all and exactly why the C standards are useful. All integer overflow related security bugs stem from the use of integers in pointer arithmetic, eg to calculate offsets which are then used to index into arrays. None of this happens to this value and none of this will ever happen to this value.

The use case for this patch is correctness. Murmurhash3's function signature tells you it wants an unsigned int, the int you give it will be treated as such anyway. Document that the seed parameter is a 32-bit unsigned int and it is the user's responsibility to make sure he passes in a correct seed.

@dan-blanchard
Copy link

I was hoping to move some of our code from https://github.com/phensley/python-smhasher over to mmh3 and this is literally the only thing stopping me.

@hajimes hajimes merged commit c0d5d3d into hajimes:master May 26, 2017
@hajimes
Copy link
Owner

hajimes commented May 27, 2017

Because there is a growing need for the feature, I added unit tests/CI to ensure stability and merged this one. Thank you very much for your commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants