A better hashtable implementation #10897

Open: wants to merge 1 commit into main

Akaricchi (Contributor) commented Sep 18, 2024

The original SDL hashtable implementation is pretty naive. Every key-value pair is stored in a separate allocation, and collisions are resolved via a linked list. While this scheme is very simple to implement, it's bad for cache locality (and therefore performance).

This new implementation uses open addressing, which stores every pair in one contiguous array instead of in separate buckets. The table grows automatically as the load factor increases, which was a TODO in the old code. Collisions are resolved with linear probing, which keeps colliding items close together in memory, and Robin Hood hashing greatly reduces the variance in probe sequence length across all items. The backward-shift optimization for deletions (described in the Robin Hood hashing reference) is also implemented, but "smart search" is not.

This is a very versatile hashtable optimized for lookup performance. I originally wrote this for Taisei Project, where it served us well in hot paths for years. The motivation for porting it was to speed up some hash lookups in the Vulkan GPU backend.

It's definitely not the most sophisticated or optimal algorithm, but I think it strikes a good balance between performance and implementation simplicity. The main thing holding it back is that SDL's hashtables are not templated, so we're stuck with pointer-sized keys and values. That is often just barely too small, which forces us to malloc the key or do other silly things. This throws a wrench into all the cache-locality goodness, though doing less pointer chasing still pays off in the end.

slouken (Collaborator) commented Sep 18, 2024

Thanks for the contribution! Do you have any performance numbers to compare before and after this change?

Akaricchi (Contributor, Author)

I only had some perf traces from an experimental Vulkan GPU branch. I wanted to write an actual test and a benchmark for this, but since it's an internal API, I'm not sure where to put that code. I had to resort to some embarrassing hacks to get temporary test code running while I was working on this. Can you help me out here?

I might just copy-paste both implementations into an existing benchmark harness to get you some raw numbers tomorrow.

icculus (Collaborator) commented Sep 18, 2024

I'm totally fine with changing out the hashtable implementation...the code was just meant to be Good Enough but definitely has room for improvement.

I haven't looked at the patch yet but the arguments in the PR description are valid.

This implementation is coming from an MIT-licensed program; do you have permission to relicense this piece under the zlib license?

Akaricchi (Contributor, Author)

> This implementation is coming from an MIT-licensed program; do you have permission to relicense this piece under the zlib license?

I wrote the original code and I'm the de-facto lead developer of that project, so I'm pretty sure I do :)

slouken (Collaborator) commented Sep 19, 2024

> I might just copy-paste both implementations into an existing benchmark harness to get you some raw numbers tomorrow.

It's probably okay to add a testhashtable program, or to add something to one of the automated tests that does #include "../src/SDL_hashtable.c". It would definitely be worthwhile to add something to the testautomation suite that inserts random values into the hashtable to verify correctness.

This is mostly ported from Taisei Project

Akaricchi (Contributor, Author)

Didn't have time for a test or a dedicated benchmark yet, but I've been comparing Vulkan performance against the default hashtable while @thatcosmonaut was working on #10910. One experimental optimization that didn't make it into that patchset used a hashtable to track bound resources.

Under a "normal" load (a Taisei replay run without the framerate limit), the hashtable-based tracking initially performed worse than a linear search through an array; with the new hashtable, it was on par for this workload. Under a "torture" load that is very heavy on resource tracking (another Taisei replay, but with sprite batching disabled), the hashtable approach gave a significant performance boost, and swapping the original hashtable for the new one pushed it further ahead.

Here are some numbers from that test:

descriptor_pool_rewrite:            16505 frames in 61.03 sec ~= 270.43 FPS
bound_hashtable:                    16505 frames in 59.24 sec ~= 278.60 FPS
bound_hashtable + robinhood:        16505 frames in 54.10 sec ~= 305.07 FPS

Note that this is far from a pure hashtable benchmark; there is a ton of other overhead here. Still, roughly 26.5 extra FPS (about 9.5%) just from swapping the hashtable implementation doesn't sound too bad.

We've decided to revisit that optimization once this PR is merged.

slouken added this to the 3.2.0 milestone Sep 26, 2024