-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use MurmurHash3 to allow for fast in-memory hashing with no conversion #26
Conversation
Base.firstindex(::ShortString) = 1 | ||
Base.isvalid(s::ShortString, i::Integer) = isvalid(String(s), i) | ||
Base.iterate(s::ShortString) = iterate(String(s)) | ||
Base.iterate(s::ShortString, i::Integer) = iterate(String(s), i) | ||
Base.lastindex(s::ShortString) = sizeof(s) | ||
Base.ncodeunits(s::ShortString) = sizeof(s) | ||
|
||
Base.display(s::ShortString) = display(String(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't show
enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just moved that, to organize them a bit better, but I think you are right.
src/hash.jl
Outdated
|
||
Base.hash(x::ShortString, h::UInt) = hash(String(x), h) | ||
Base.hash(x::ShortString, h::UInt) = | ||
last(mmhash128_a(sizeof(x), ntoh(x.size_content), h%UInt32)) + h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last(mmhash128_a(sizeof(x), ntoh(x.size_content), h%UInt32)) + h | |
last(mmhash128_a(sizeof(x), bswap(x.size_content), h%UInt32)) + h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, what's the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ntoh is short for network to host, so if host (the computer executing it) is little endian or big little endian it does it differently right? So bswap
will always swap them? I am bit confused. here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bswap
will always swap.
nthoh
will not swap if julia is running on a big endian machine.
Only if it is on a little endian machine.
Julia only runs on little endian machines however.
Main advantage is that bswap
says what it does without remembering network to host.
Do you have benchmarks? |
11a26a5
to
9f43169
Compare
I'm not sure that I trust |
9f43169
to
6059c89
Compare
Nice, it is constant-folding. |
Base.hash(x::ShortString, h::UInt) = hash(String(x), h) | ||
function Base.hash(x::ShortString, h::UInt) | ||
h += Base.memhash_seed | ||
last(mmhash128_a(sizeof(x), bswap(x.size_content), h%UInt32)) + h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be:
last(mmhash128_a(sizeof(x), bswap(x.size_content), h%UInt32)) + h | |
last(mmhash128_a(ncodeunits(x), bswap(x.size_content), h%UInt32)) + h |
to be more semantic
|
||
# Warning: if a SubString is at the very end of a string, which is at the end of allocated | ||
# memory, this can cause an access violation, by trying to access past the end | ||
# (for example, reading a 1 byte substring at the end of a length 119 string, could go past | ||
# the end) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need a proper fix for this # #
No description provided.