Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use faster hash function and track collisions? #370

Open
joshlf opened this issue Oct 27, 2022 · 2 comments
Open

Use faster hash function and track collisions? #370

joshlf opened this issue Oct 27, 2022 · 2 comments

Comments

@joshlf
Copy link

joshlf commented Oct 27, 2022

In this Reddit comment, it is suggested that C#'s hashmap implementation outperforms Rust's on a particular benchmark because C#'s uses a fast-but-dos-vulnerable hash function by default, tracks the number of collisions, and switches to a dos-resistant hash function above a certain threshold of collisions. I am taking the comment author's word for it that this is C#'s behavior (from a brief skim of the source code, it seems like maybe this is where it's implemented?). Regardless, it sounds like a reasonable idea:

  • Roughly speaking, a hash table is DoS resistant if an attacker is not able to significantly alter its performance characteristics (such as making lookup and insertion behave as linear time operations); if an attacker is only able to make hash table operations, say, a few times slower, this is probably fine for most applications - even those that are exposed to untrusted inputs
  • The overhead of tracking the number of collisions is likely small compared to the performance gains of a non-dos-resistant hash function
  • This is just speculation, but intuition tells me that it ought to be possible to choose a collision threshold that is both fairly low (and thus doesn't allow an attacker to pessimize hashmap performance that significantly) and is still higher than the number of collisions that will be encountered during most non-malicious workloads

Has an approach like this ever been considered for hashbrown? Given that both hashbrown and the standard library make no stability guarantees about their default hash function, this ought to be a backwards-compatible change. Finally, users who are especially concerned with DoS resistance could still pick a specific hash function in order to retain today's DoS resistance behavior.

@Amanieu
Copy link
Member

Amanieu commented Oct 28, 2022

I think a better solution would be to just improve the performance of SipHash. This was attempted in rust-lang/rust#69152 but didn't land due to the increased compilation times.

I'm concerned that adding more complexity to the hasher might be a net slowdown (it's definitely going to negatively affect compilation times). However feel free to give it a try if you think it can be an improvement.

@joshlf
Copy link
Author

joshlf commented Oct 30, 2022

Sounds good! Likely won't have time to take a stab at this any time soon - if anyone else reads this and is interested, don't hold back on my account.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants