Improve buffer-accepting hashes and more #84

Merged 16 commits on Sep 17, 2024

Conversation

hajimes
Owner

@hajimes hajimes commented Sep 17, 2024

This PR introduces the following improvements:

  • The hash functions that accept the buffer protocol are now implemented with METH_FASTCALL, offering improved performance over legacy functions. This revision makes the function 50ns faster per execution on an Ubuntu instance in GitHub Actions, doubling the speed for inputs smaller than 100 bytes.
  • Backward-incompatible: The seed argument is now strictly validated to ensure it falls within the range [0, 0xFFFFFFFF]. A ValueError is raised if the seed is out of range.
  • The type of the flag arguments has been changed from bool to Any.
  • Add tox environments for benchmarking and plotting.
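
The strict seed check described in the list above can be sketched in pure Python. This is only an illustration of the stated rule, not the library's actual C implementation; the helper name `validate_seed` and the error message are assumptions made for this example.

```python
SEED_MIN = 0
SEED_MAX = 0xFFFFFFFF  # largest unsigned 32-bit value


def validate_seed(seed: int) -> int:
    """Hypothetical helper: reject seeds outside [0, 0xFFFFFFFF]."""
    if not SEED_MIN <= seed <= SEED_MAX:
        raise ValueError(f"seed {seed} is out of range [0, 0xFFFFFFFF]")
    return seed


if __name__ == "__main__":
    print(validate_seed(42))  # accepted and returned unchanged
    try:
        validate_seed(-1)  # negative seeds are rejected under this rule
    except ValueError as exc:
        print("rejected:", exc)
```

Under this rule, any caller that passes a negative seed gets a ValueError, which is the backward-incompatible part of the change.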

@hajimes hajimes merged commit 30da46e into master Sep 17, 2024
64 checks passed
@hajimes
Owner Author

hajimes commented Sep 17, 2024

Also,

  • Deprecate the hash_from_buffer() function.
    Use mmh3_32_sintdigest() or mmh3_32_uintdigest() as alternatives.

@hajimes hajimes deleted the feature/fix-buffer-func branch September 18, 2024 12:36
@zadorozhko

zadorozhko commented Sep 29, 2024

> This PR introduces the following improvements:
> * Backward-incompatible: The seed argument is now strictly validated to ensure it falls within the range [0, 0xFFFFFFFF]. A ValueError is raised if the seed is out of range.

This commit breaks the macOS Telegram library for encrypted SQLite databases. Telegram uses the hardcoded seed -137723950 for mmh3.hash.

@hajimes
Owner Author

hajimes commented Oct 1, 2024

Hi, thank you for your report! I'll change the seed range to [−2^31, 2^32 − 1] as soon as possible, with a clear specification that negative values will be interpreted as their bit-equivalent unsigned positive numbers.
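
As a rough illustration of the "bit-equivalent unsigned" interpretation mentioned above, the sketch below assumes the intended range is −2^31 through 2^32 − 1 and maps negative seeds onto their 32-bit two's-complement value with a bit mask. The helper name `normalize_seed` is hypothetical and the library's real implementation may differ.

```python
SEED_LOW = -(2**31)    # assumed lower bound (signed 32-bit minimum)
SEED_HIGH = 2**32 - 1  # assumed upper bound (unsigned 32-bit maximum)


def normalize_seed(seed: int) -> int:
    """Hypothetical helper: map a seed in [-2**31, 2**32 - 1] to unsigned 32-bit."""
    if not SEED_LOW <= seed <= SEED_HIGH:
        raise ValueError(f"seed {seed} is out of range [-2**31, 2**32 - 1]")
    # Masking keeps the low 32 bits, so a negative seed becomes the
    # unsigned integer with the same two's-complement bit pattern.
    return seed & 0xFFFFFFFF


if __name__ == "__main__":
    # The seed reported in this thread.
    print(normalize_seed(-137723950))  # 4157243346
```

Until such a change lands, a caller holding a negative seed could apply the same mask before passing it in, since non-negative seeds are unchanged by it.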
