Refine benchmarks section
hajimes committed Sep 20, 2024
1 parent e7ac2db commit 02157bf
Showing 2 changed files with 22 additions and 23 deletions.
4 changes: 2 additions & 2 deletions paper/paper.bib
@@ -53,10 +53,10 @@ @inproceedings{Broder1997a
 note = {ISSN: 0818681322},
 keywords = {MinHash}
 }
-@misc{collet_xxhash_2012,
+@misc{collet_xxhash_2014,
 title = {{xxHash}},
 author = {Collet, Yan},
-year = 2012,
+year = 2014,
 url = {https://github.com/Cyan4973/xxHash}
 }
 @misc{du_xxhash_2014,
41 changes: 20 additions & 21 deletions paper/paper.md
@@ -13,7 +13,7 @@ authors:
 orcid: 0000-0001-8542-1768
 affiliation: 1
 affiliations:
-- name: National Institute of Informatics
+- name: National Institute of Informatics, Tokyo, Japan
 index: 1
 date: 3 Sep 2024
 bibliography: paper.bib
@@ -127,26 +127,31 @@ a Java-based web framework.
 MurmurHash3 algorithm. Among various other Python bindings for
 non-cryptographic hashes, `python-xxhash` by Yue Du [@du_xxhash_2014] is another
 popular hash library, featuring xxHash developed by
-Yan Collet [@collet_xxhash_2012].
+Yan Collet [@collet_xxhash_2014].
 
 # Benchmarks
 
-Benchmarking was carefully conducted to aim the balance between accuracy,
-reproducibility, and reliability, following articles on microbenchmarking
-including @Peters2002, @Stinner2016, @gorelick_high_2020,
-@RodriguezGuerra2021, and @Bernhardt2023.
+To compare the efficiency of Python-C hash function libraries, we carefully
+conducted microbenchmarking experiments, aiming to balance between accuracy,
+reproducibility, and reliability. Our methodology follows established
+practices from microbenchmarking literature, including works by @Peters2002,
+@Stinner2016, @gorelick_high_2020, @RodriguezGuerra2021, and @Bernhardt2023.
 
-\autoref{latency} shows the latency and \autoref{throughput} shows
+\autoref{latency} shows latency, while \autoref{throughput} presents
 throughput, measured as the size of hash output generated per second.
-While the `xxh3` family in `python-xxhash` excels for large inputs,
-the implementation of `mmh3` is more performant for smaller inputs.
-as the latest version 5.0.0 leverages `METH_FASTCALL`, a new calling method
-introduced in Python 3.7, to reduce the overhead of function calls.
-
-For details, refer to the documentation of our project:
+Although the `xxh3` family in `python-xxhash` demonstrates superior performance
+for large inputs, the `mmh3` implementation excels with smaller inputs.
+This advantage is largely due to the latest version 5.0.0,
+which leverages `METH_FASTCALL`, a new calling method
+introduced in Python 3.7 that reduces the overhead of function calls.
+As a result, our library is particularly well-suited for use cases involving
+repeated hashing of small keys—one of the common scenarios for
+non-cryptographic hash functions.
+
+For further details, refer to the documentation of the project:
 <https://mmh3.readthedocs.io/en/latest/benchmark.html>.
-The benchmarking results are also publicly available as JSON files in the
-repository: <https://github.com/hajimes/mmh3-benchmarks>.
+In addition, the benchmarking results are publicly available as JSON files in
+the repository: <https://github.com/hajimes/mmh3-benchmarks>.
 
 ![Latency for small inputs \label{latency}. Lower is better.](../docs/_static/latency_small.png)
 
@@ -161,9 +166,3 @@ who made the first pull request to the project and later introduced the
 library in her technical book [@gorelick_high_2020].
 
 # References
-
-The author extends sincere gratitude to xxxxx for her
-helpful comments on this paper. Appreciation is also given to
-all who involved in the development and maintenance of DDD. Special thanks go to
-yyyy, who made the first pull request to the project and later
-introduced the library in her technical book, zzzzz.
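
The benchmarks section separates latency (time per hash call) from throughput (bytes per second). The sketch below is a minimal illustration of that distinction using the standard-library `timeit` module. It is not the harness behind the published figures. It makes a few assumptions. `zlib.crc32` stands in for the `mmh3` and `xxhash` functions so that the sketch runs without third-party packages. The helper names `measure_latency` and `throughput` are hypothetical. `throughput` divides whatever byte count the caller passes by the per-call latency. Passing the 4-byte CRC-32 output mirrors the paper's "size of hash output generated per second".

```python
import timeit
import zlib


def measure_latency(hash_func, data: bytes, number: int = 10_000, repeat: int = 5) -> float:
    """Return the per-call latency, in seconds, of hash_func(data).

    timeit.repeat runs `number` calls per round for `repeat` rounds; the
    fastest round is kept as the least noisy one and divided by `number`.
    A statement string with `globals` avoids an extra lambda call, which
    would inflate the latency measured for small inputs.
    """
    totals = timeit.repeat(
        "f(d)", globals={"f": hash_func, "d": data}, number=number, repeat=repeat
    )
    return min(totals) / number


def throughput(bytes_per_call: int, latency_s: float) -> float:
    """Convert a per-call latency into bytes per second for a given byte count."""
    return bytes_per_call / latency_s


if __name__ == "__main__":
    # Assumption: zlib.crc32 is only a stand-in; a real comparison would pass
    # hash functions from mmh3 or xxhash instead.
    CRC32_OUTPUT_BYTES = 4  # CRC-32 yields a 32-bit (4-byte) value
    for size in (8, 64, 1024, 16384):
        data = b"x" * size
        lat = measure_latency(zlib.crc32, data, number=2_000, repeat=3)
        print(
            f"input {size:>6} B: latency {lat * 1e9:8.1f} ns, "
            f"output throughput {throughput(CRC32_OUTPUT_BYTES, lat) / 1e6:8.2f} MB/s"
        )
```

Taking the minimum over rounds is one simple way to filter out background noise. Dedicated tools such as pyperf instead report a mean and standard deviation across many runs.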
