Fast non-cryptographic hash functions for Scala. This library provides APIs for computing 32-bit and 64-bit hashes.
Currently implemented hash functions
- MurmurHash3 (32-bit)
- XxHash (32-bit and 64-bit)
Hash functions in this library can be accessed via either a standard API for hashing primitives, byte arrays, or Java ByteBuffers (direct and non-direct), or a streaming API for hashing stream-like objects such as InputStreams, Java NIO Channels, or Akka Streams. Hash functions should produce consistent output regardless of platform or endianness.
This library uses the sun.misc.Unsafe
API internally. I might explore using the VarHandle
API introduced in Java 9 in the future, but am currently still supporting Java 8.
Benchmarked against various other open-source implementations
- Guava (MurmurHash3)
- LZ4 Java (XxHash32 and XxHash64 - Includes JNI binding, pure Java, and Java+Unsafe implementations)
- Scala (Scala's built-in
scala.util.hashing.MurmurHash3
) - Zero-Allocation-Hashing (XxHash64)
Benchmarks are located in the bench
subproject and can be run using the sbt-jmh plugin.
To run all benchmarks with default settings
bench/jmh:run
To run a specific benchmark with custom settings
bench/jmh:run -f 2 -wi 5 -i 5 XxHash64Bench
libraryDependencies += "com.desmondyeung.hashing" %% "scala-hashing" % "0.1.0"
This library defines the interfaces Hash32
and StreamingHash32
for computing 32-bit hashes and Hash64
and StreamingHash64
for computing 64-bit hashes. Classes extending StreamingHash32
or StreamingHash64
are not thread-safe.
The public API for Hash64
and StreamingHash64
can be seen below
trait Hash64 {
def hashByte(input: Byte, seed: Long): Long
def hashInt(input: Int, seed: Long): Long
def hashLong(input: Long, seed: Long): Long
def hashByteArray(input: Array[Byte], seed: Long): Long
def hashByteArray(input: Array[Byte], offset: Int, length: Int, seed: Long): Long
def hashByteBuffer(input: ByteBuffer, seed: Long): Long
def hashByteBuffer(input: ByteBuffer, offset: Int, length: Int, seed: Long): Long
}
trait StreamingHash64 {
def reset(): Unit
def value: Long
def updateByteArray(input: Array[Byte], offset: Int, length: Int): Unit
def updateByteBuffer(input: ByteBuffer, offset: Int, length: Int): Unit
}
Using the standard API
import com.desmondyeung.hashing.XxHash64
import java.nio.ByteBuffer
// hash a long
val hash = XxHash64.hashLong(123, seed = 0)
// hash a Array[Byte]
val hash = XxHash64.hashByteArray(Array[Byte](123), seed = 0)
// hash a ByteBuffer
val hash = XxHash64.hashByteBuffer(ByteBuffer.wrap(Array[Byte](123)), seed = 0)
Using the streaming API
import com.desmondyeung.hashing.StreamingXxHash64
import java.nio.ByteBuffer
import java.io.FileInputStream
val checksum = StreamingXxHash64(seed = 0)
val channel = new FileInputStream("/path/to/file.txt").getChannel
val chunk = ByteBuffer.allocate(1024)
var bytesRead = channel.read(chunk)
while (bytesRead > 0) {
checksum.updateByteBuffer(chunk, 0, bytesRead)
chunk.rewind
bytesRead = channel.read(chunk)
}
val hash = checksum.value
Licensed under the Apache License, Version 2.0 (the "License").