-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unordered-containers sometimes much, much slower than hashmap due to collisions. #121
Comments
I'm not sure if there's anything to be done. For example, you'd get this problem if you used a traditional, imperative hash table too. This is different from the other case we're discussing, where the default I'm in favor of closing this, unless we have an idea of how to improve the situation without hurting the more common case of having decent hash functions (too much). |
One question, if it's hard (expensive) to come up with a good hash function for lambdas, isn't it similarly hard (expensive) to come up with a good comparison function for use in the ordered map? |
Yes, comparing lambdas is quite expensive. Comparison keeps an alpha conversion map as it recurses over the two expressions. But for comparison the cost is (I think) more incremental as it happens when you compare lambdas (which might not happen), whereas hashing happens upfront. But I might experiment with making a good hash function. You can close this ticket if you want, but I think the docs should contain a warning (which all hash tables need) about excessive collisions. And I think it's nice that we have hashmap which has the behavior I need (which is a strange hybrid, I admit). |
What about using a sorted array? Insertions and deletions will still be bad, but lookups should be faster and size will stay the same. |
@augustss, I would think a more natural approach might be to convert the bound variables to De Bruijn indices and use the converted lambdas as keys, either for a |
I'm not a fan of de Bruijn indices. They have their own set of problems. |
If I understand it correctly, #217 attempts to fix the underlying issue here. There's a bit of info on the approach in the change to |
That was apparently a misunderstanding of mine. I quote from an email from @tibbe:
At this stage I'm not sure what could be done about this issue within Otherwise I'd tend to simply close this issue. |
I tried switching from hashmap to u-c for doing CSE in our compiler. I'm not sure how much slower it is, because I didn't have the patience to wait for the compiler to terminate with u-c. With hashmap the entire compilation takes 302s, with u-c I interrupted it after 1800s.
The problem is that it's difficult to produce a good hash value. The CSE basically comparing lambda expressions, but modulo alpha conversion. This means that you have to do something very expensive when finding bound variables, so instead I just give up at that point. This means that the hash values are poor and we get many, many collisions.
With hashmap it degrades gracefully because it uses Data.Map for collisions, whereas u-c uses linear search.
I'm not sure what could reasonably be done, perhaps use linear search up to some small number of collisions (~10), and then switch to an ordmap. Or perhaps just have a warning in the documentation.
The text was updated successfully, but these errors were encountered: