Split test=Speed into SpeedBulk and SpeedSmall and report weighted average for Small key speed test #293
8cc0101 to 4cd75f8
Similar story happened to
@rurban please, tell me, should I do anything to re-request review of this PR explicitly, or are you just busy and I should just wait?
I'm very busy right now, but I also don't like constant arrays in this
file. If so, in an extra file please.
Leonid Evdokimov ***@***.***> wrote on Thu, 5 Sep 2024, 14:35:
… @rurban <https://github.com/rurban> please, tell me, should I do anything to re-request review of this PR explicitly or are you just busy and I should just wait?
👌 Got you. Thanks for your reply. The constant arrays are gone now: I've dropped them all, as curating a "representative" dataset is a whole different story. I replaced those constant arrays with […]. As far as I understand, you're okay with some hard-coded example weights, so I'll add them as a separate file with references to the data sources and will make one of them the default value for […]
It reduces timer_mips() overhead by ≈385 ticks, from ≈613 to ≈228, as measured by the "timer resolution" warning. CLOCK_MONOTONIC_COARSE works fine here: timer_mips() only needs 1 s precision, and the coarsest granularity of CLOCK_MONOTONIC_COARSE is 10 ms (with HZ=100).
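For illustration, here is a minimal sketch of reading the coarse clock and checking its resolution. It is not the code from this PR: the helper names `coarse_clock_id`, `clock_resolution_ns` and `seconds_since` are hypothetical, and the fallback to CLOCK_MONOTONIC on systems without CLOCK_MONOTONIC_COARSE (which is Linux-specific) is an assumption of the sketch.

```cpp
#include <time.h>   // clock_gettime, clock_getres, timespec (POSIX)

// Hypothetical helper: use the cheap coarse clock where the platform
// defines it (Linux), otherwise fall back to the regular monotonic clock.
static clockid_t coarse_clock_id() {
#ifdef CLOCK_MONOTONIC_COARSE
    return CLOCK_MONOTONIC_COARSE;
#else
    return CLOCK_MONOTONIC;
#endif
}

// Hypothetical helper: resolution of a clock in nanoseconds,
// or -1 if the clock cannot be queried.
static long long clock_resolution_ns(clockid_t id) {
    timespec ts{};
    if (clock_getres(id, &ts) != 0) return -1;
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper: seconds elapsed since `start` on the given clock.
// A loop that only has to notice "about one second has passed" can
// tolerate the millisecond-scale granularity of the coarse clock.
static double seconds_since(const timespec& start, clockid_t id) {
    timespec now{};
    clock_gettime(id, &now);
    return double(now.tv_sec - start.tv_sec)
         + double(now.tv_nsec - start.tv_nsec) * 1e-9;
}
```

On a Linux kernel built with HZ=100, `clock_resolution_ns(coarse_clock_id())` would be expected to report about 10,000,000 ns, which is still two orders of magnitude finer than the 1 s interval mentioned above.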
Fixes rurban#284 by disabling PMP_Multilinear not only on aarch64, but also on arm64. Those are the same architecture under different names.
…MAX} It adds the SMHASHER_SMALLKEY_MAX environment variable to override the default length of the longest "Small key" and changes that default from 31 to 32, to make the Average a bit fairer to hashes that read memory word-by-word (dword, qword) rather than byte-by-byte. SMHASHER_SMALLKEY_MIN is added as a counterpart, for benchmarking hashes when the range of small key lengths is known.
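A minimal sketch of how such overrides could be read, assuming plain decimal values. Only the two variable names and the default maximum of 32 come from the commit message; the helper `env_int_or`, the struct `SmallKeyRange`, the default minimum of 1, and the sanity cap are assumptions made for illustration and need not match the actual patch.

```cpp
#include <cstdlib>   // std::getenv, std::strtol
#include <utility>   // std::swap

// Hypothetical helper: parse a non-negative decimal integer from an
// environment variable; return `fallback` if it is unset or malformed.
static int env_int_or(const char* name, int fallback) {
    const char* s = std::getenv(name);
    if (s == nullptr || *s == '\0') return fallback;
    char* end = nullptr;
    long v = std::strtol(s, &end, 10);
    // Reject trailing garbage and out-of-range values
    // (65536 is an arbitrary sanity cap chosen for this sketch).
    if (*end != '\0' || v < 0 || v > 65536) return fallback;
    return static_cast<int>(v);
}

struct SmallKeyRange {
    int min_len;
    int max_len;
};

// Defaults: max = 32 as stated in the commit message;
// min = 1 is an assumption of this sketch.
static SmallKeyRange small_key_range() {
    SmallKeyRange r{env_int_or("SMHASHER_SMALLKEY_MIN", 1),
                    env_int_or("SMHASHER_SMALLKEY_MAX", 32)};
    if (r.min_len > r.max_len) std::swap(r.min_len, r.max_len);
    return r;
}
```

With this shape, running the benchmark with `SMHASHER_SMALLKEY_MIN=8 SMHASHER_SMALLKEY_MAX=16` in the environment would restrict the range to keys of 8 to 16 bytes.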
Sounds better, thanks
Weights from two datasets are hard-coded: DNS domain lengths and UMASH traces. A custom set may be passed via ENV{SMHASHER_SMALLKEY_WEIGHTS}. It partly addresses the question in rurban#113: what is the "real" average cycles/hash value for a given hash function? We can't know, but we can estimate it better if we assume that the function's timing does not depend on the input (not true for hashes based on multiplication) and that we know the key-length distribution in advance (it may be roughly known for certain classes of inputs, but the distribution varies measurably across classes).
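The estimate described above is a weighted mean: the cycles/hash measured for each key length is multiplied by how often that length occurs, and the sum is divided by the total weight. A minimal sketch follows; the function name `weighted_average` and the vector-based interface are hypothetical, and the format of SMHASHER_SMALLKEY_WEIGHTS is deliberately not modelled here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: weighted mean of per-length timings.
// cycles[i] is the measured cycles/hash for the i-th key length,
// weights[i] is the relative frequency of that key length.
// Returns -1.0 if the inputs are inconsistent or the total weight is
// not positive (a valid cycles/hash value is never negative).
static double weighted_average(const std::vector<double>& cycles,
                               const std::vector<double>& weights) {
    if (cycles.size() != weights.size()) return -1.0;
    double weighted_sum = 0.0;
    double total_weight = 0.0;
    for (std::size_t i = 0; i < cycles.size(); ++i) {
        weighted_sum += cycles[i] * weights[i];
        total_weight += weights[i];
    }
    return total_weight > 0.0 ? weighted_sum / total_weight : -1.0;
}
```

For example, timings of 10 and 20 cycles/hash with weights 3 and 1 give (10·3 + 20·1) / 4 = 12.5, whereas the plain average would be 15. Weights do not need to be normalised, since the division by the total weight takes care of that.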
I've found a "representative" data source for the DNS domain lengths sample, so I've added it and the UMASH distribution to […]. I agree that these data points do not belong to […]
Looks good now, I think. Will test it tomorrow. Esp. the failing verifications look troublesome.
@rurban I've fixed all the issues GitHub CI highlighted in the following patch set: darkk/smhasher@master...macos. Should I push those patches to this PR, or should it be a separate one?
Separate, please.
I hope that also addresses the goal that @wangyi-fudan had in #113 while being way more "stable" in computational terms.
It gives results looking like that:
or
Possible improvements are: