HashMap starts the search even if the map is empty #38880

pmarcelll · 2017-01-06T17:45:45Z

It was mentioned in the latest Ferris Makes Emulators episode that contains showed up in the profiling results, although a lot of the times the map itself was empty.
I think the extra checking is worth it before starting the search and that way the hashing is also skipped.

The text was updated successfully, but these errors were encountered:

Manishearth · 2017-01-06T18:09:02Z

contains() calls contains_key() which calls search() and finally search_hashed()

That function checks for zero capacity the first thing it does. The only work that is done between the contains() call and search_hashed() is computing the hash.

Now, if the map was empty (but had nonzero capacity) we'd have a problem. search_hashed can't bail early in that case because it still needs to produce the entry, but contains doesn't care about that.

In Ferris' example, were hashmaps with nonzero capacity but zero length used? You can make that happen with HashMap::with_capacity(foo), but by default hashmaps are zero capacity (no allocation till you insert things into them). If not, this probably isn't the problem.

Bailing out on size == 0 in contains is an interesting idea. Small negative impact on users of regular hashmaps, large positive impact on users of preallocated hashmaps.

leonardo-m · 2017-01-06T18:26:20Z

Small negative impact on users of regular hashmaps, large positive impact on users of preallocated hashmaps.

How much "small" is the negative impact in the regular case needs to be benchmarked.

Manishearth · 2017-01-06T18:26:57Z

Yes. It's probably negligible.

Stebalien · 2017-01-06T20:17:16Z

@Manishearth That really depends on the use case. If you have a large vector of mostly empty HashMap/HashSets, repeatedly re-hashing the key could take a while. However, this probably isn't a common use-case so the programmer could just check the length in cases like this.

Anyways, wasn't there some talk about falling back on linear search/storage for small maps/sets? That would probably be more useful in practice.

arthurprs · 2017-01-19T21:38:47Z

I wanted to move the capacity check from search_hashed up in the call stack, notably to the get* and contain methods (and there it can become a len = 0 check, solving this), as entry and insert reserve at least 1 space upfront anyway (so it's never really capacity 0). This saves 1 branch offsetting the extra ones required by #38368 . Probably not worth the trouble though :)

technicalguy · 2018-01-30T17:50:41Z

So I've had a look into this issue.

The only work that is done between the contains() call and search_hashed() is computing the hash.

Computing the hash can be quite costly, even for integer keys on an empty HashMap it seems contains() is an order of magnitude faster with an empty check before hash computation than with one after.

Bailing out on size == 0 in contains is an interesting idea. Small negative impact on users of regular hashmaps, large positive impact on users of preallocated hashmaps.

I think it should be possible to make an early size == 0 check without requiring a second check afterwards (i.e. no negative impact on regular HashMap users with non-empty maps). Some kind of search_hashed_no_empty_check method could be called by a search_no_empty_check method, given that contains_key has already checked that it is empty.

Anyways, wasn't there some talk about falling back on linear search/storage for small maps/sets? That would probably be more useful in practice.

This works well for small collections (e.g. integer keys up to about 16 items), however the problem is that different types require different amounts of time to compute the hash, and and also different amounts of time to compute equality. This is not necessarily consistent between values (e.g. strings or lists).

I think I can implement this for types with simple equality checks, but it cannot be used in general. (The empty check can be used in general, so I propose we implement both.)

This addresses issue rust-lang#38880

…ap-38880, r=arthurprs Early exit for empty HashMap (issue rust-lang#38880) Addresses issue rust-lang#38880 by checking if the HashMap is empty before computing the value of the hash. Before (integer keys) ``` running 4 tests test empty_once ... bench: 13 ns/iter (+/- 0) test empty_100 ... bench: 1,367 ns/iter (+/- 35) test exist_once ... bench: 14 ns/iter (+/- 0) test exist_100 ... bench: 1,518 ns/iter (+/- 40) ``` After ``` running 4 tests test empty_once ... bench: 2 ns/iter (+/- 0) test empty_100 ... bench: 221 ns/iter (+/- 0) test exist_once ... bench: 15 ns/iter (+/- 0) test exist_100 ... bench: 1,515 ns/iter (+/- 92) ``` When the HashMap is not empty, the performance remains the same, and when it is empty the performance is significantly improved.

lucab · 2019-01-29T16:37:08Z

Given that #48035 landed, is there anything else pending from this ticket?

technicalguy · 2019-01-29T18:00:46Z

For this issue

HashMap starts the search even if the map is empty

there is nothing left to do.

For the linear search

Anyways, wasn't there some talk about falling back on linear search/storage for small maps/sets? That would probably be more useful in practice.

I don't know if it is possible: The time to compute equality and the time to compute a hash varies with different types (these two time determine the threshold at which it becomes faster to hash than to linear search). Not only that, but moreover, it could vary with different architectures. This means that determining the threshold is probably unfeasible in any kind of generality.

(You might be able to say that for a particular processor architecture, when the entire map is in the L1 cache, and your keys are 32-bit integers, then it is faster to do a linear search if the size of your map is up to 27 entries (for example), and after that, it is faster to hash the integer key and do a regular lookup. However, if any of these assumptions change then it might not be possible to guarantee a faster lookup for a map with 27 entries. The one feature that you really want from your HashMap is that the lookup time is bounded (with a certain probability) and it's very hard to do a linear search and guarantee that you will be at least as fast as a hash-lookup.)

The only thing that could certainly be implemented is if the hashmap contains only one element, then just do a single comparison rather than a hash lookup and then the comparison. i.e. hashmaps with size 1 don't need to compute the hash.

pmarcelll · 2019-01-29T18:20:55Z

I didn't realize this issue wasn't closed by #48035, and I also think this should be closed since it would be more appropriate to open a new issue for future optimizations.

technicalguy · 2019-01-29T19:28:56Z

Yes, I agree. It wasn't closed because I wrote "Addresses issue #38880" in the pull request not "Fixes #38880" which would have caused GitHub to auto close it. (At the time I thought linear search was more feasible and might be implemented as an extension to the empty-hashmap case, and so it seemed like it could make sense for the issue to remain open a little while longer. Clearly that was not to be.)

apasel422 added the A-libs label Jan 6, 2017

steveklabnik added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed A-libs labels Mar 24, 2017

Mark-Simulacrum added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jun 19, 2017

Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jul 26, 2017

technicalguy mentioned this issue Feb 6, 2018

Early exit for empty HashMap (issue #38880) #48035

Merged

technicalguy added a commit to technicalguy/rust that referenced this issue Feb 6, 2018

38880 don't compute hash when searching an empty HashMap

9bc5986

This addresses issue rust-lang#38880

pmarcelll closed this as completed Jan 29, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HashMap starts the search even if the map is empty #38880

HashMap starts the search even if the map is empty #38880

pmarcelll commented Jan 6, 2017

Manishearth commented Jan 6, 2017 •

edited

Loading

leonardo-m commented Jan 6, 2017

Manishearth commented Jan 6, 2017

Stebalien commented Jan 6, 2017

arthurprs commented Jan 19, 2017 •

edited

Loading

technicalguy commented Jan 30, 2018

lucab commented Jan 29, 2019

technicalguy commented Jan 29, 2019

pmarcelll commented Jan 29, 2019

technicalguy commented Jan 29, 2019

HashMap starts the search even if the map is empty #38880

HashMap starts the search even if the map is empty #38880

Comments

pmarcelll commented Jan 6, 2017

Manishearth commented Jan 6, 2017 • edited Loading

leonardo-m commented Jan 6, 2017

Manishearth commented Jan 6, 2017

Stebalien commented Jan 6, 2017

arthurprs commented Jan 19, 2017 • edited Loading

technicalguy commented Jan 30, 2018

lucab commented Jan 29, 2019

technicalguy commented Jan 29, 2019

pmarcelll commented Jan 29, 2019

technicalguy commented Jan 29, 2019

Manishearth commented Jan 6, 2017 •

edited

Loading

arthurprs commented Jan 19, 2017 •

edited

Loading