Check saved hash first during probing bucket in aggr hash table #11717

Closed
Rachelint opened this issue Jul 30, 2024 · 1 comment · Fixed by #11718
@Rachelint
Contributor

Is your feature request related to a problem or challenge?

DataFusion currently uses a two-part aggregate hash table, and the hash of each group is saved in the hash table part.
However, I found that the saved hashes are not used when probing a bucket: we fetch the group values directly and compare them instead. That leads to many random memory accesses, and the comparisons are not cheap for some types.

Describe the solution you'd like

Maybe we should check the saved hashes first, and only compare the group values when the hashes are equal, to rule out a collision.
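
To make the idea concrete, here is a minimal sketch of hash-first probing. It is not the DataFusion code: `GroupTable`, `intern`, `num_groups`, and the `value_compares` counter are hypothetical names, the table uses simple linear probing over a `Vec`, and group keys are plain `String`s. The point is only that the group value (the expensive, random memory access) is touched when the saved hash already matches.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical two-part table (an assumption for illustration):
/// `slots` holds (saved hash, group index); `group_values` holds the keys.
pub struct GroupTable {
    slots: Vec<Option<(u64, usize)>>,
    group_values: Vec<String>,
    /// Number of full value comparisons performed, kept only for illustration.
    pub value_compares: usize,
}

fn hash_of(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

impl GroupTable {
    pub fn with_capacity(capacity: usize) -> Self {
        // Power-of-two capacity so that `hash & mask` selects a slot.
        let cap = capacity.max(1).next_power_of_two();
        Self { slots: vec![None; cap], group_values: Vec::new(), value_compares: 0 }
    }

    pub fn num_groups(&self) -> usize {
        self.group_values.len()
    }

    /// Returns the group index for `value`, inserting a new group if needed.
    pub fn intern(&mut self, value: &str) -> usize {
        // Keep the load factor at or below 1/2 so probing always terminates.
        if (self.group_values.len() + 1) * 2 > self.slots.len() {
            self.grow();
        }
        let hash = hash_of(value);
        let mask = self.slots.len() - 1;
        let mut pos = (hash as usize) & mask;
        loop {
            let slot = self.slots[pos];
            match slot {
                None => {
                    // Empty slot: this is a new group.
                    let idx = self.group_values.len();
                    self.group_values.push(value.to_owned());
                    self.slots[pos] = Some((hash, idx));
                    return idx;
                }
                Some((saved_hash, idx)) => {
                    // Cheap check first: compare the saved hash stored in the slot.
                    if saved_hash == hash {
                        // Only now read the group value to rule out a collision.
                        self.value_compares += 1;
                        if self.group_values[idx] == value {
                            return idx;
                        }
                    }
                }
            }
            pos = (pos + 1) & mask;
        }
    }

    /// Doubles the slot array, re-placing entries using their saved hashes
    /// (no group value needs to be read or re-hashed).
    fn grow(&mut self) {
        let new_cap = self.slots.len() * 2;
        let mask = new_cap - 1;
        let mut new_slots: Vec<Option<(u64, usize)>> = vec![None; new_cap];
        for &(hash, idx) in self.slots.iter().flatten() {
            let mut pos = (hash as usize) & mask;
            while new_slots[pos].is_some() {
                pos = (pos + 1) & mask;
            }
            new_slots[pos] = Some((hash, idx));
        }
        self.slots = new_slots;
    }
}
```

In this sketch, a probe that lands on a slot occupied by a different group skips the value comparison unless the two 64-bit hashes happen to be equal, so `value_compares` grows almost only on real matches.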

Describe alternatives you've considered

No response

Additional context

I ran ClickBench locally, and it seems to help in some cases.

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.81ms │           0.77ms │ +1.06x faster │
│ QQuery 1     │    69.11ms │          69.79ms │     no change │
│ QQuery 2     │   169.47ms │         160.89ms │ +1.05x faster │
│ QQuery 3     │   182.60ms │         180.05ms │     no change │
│ QQuery 4     │  1589.41ms │        1595.87ms │     no change │
│ QQuery 5     │  1597.02ms │        1563.24ms │     no change │
│ QQuery 6     │    57.92ms │          59.34ms │     no change │
│ QQuery 7     │    72.33ms │          71.02ms │     no change │
│ QQuery 8     │  2415.02ms │        2293.14ms │ +1.05x faster │
│ QQuery 9     │  1928.42ms │        1912.94ms │     no change │
│ QQuery 10    │   544.39ms │         539.45ms │     no change │
│ QQuery 11    │   605.79ms │         606.26ms │     no change │
│ QQuery 12    │  1767.57ms │        1748.26ms │     no change │
│ QQuery 13    │  4073.33ms │        3979.57ms │     no change │
│ QQuery 14    │  2583.14ms │        2518.41ms │     no change │
│ QQuery 15    │  1784.13ms │        1777.43ms │     no change │
│ QQuery 16    │  5028.55ms │        4898.04ms │     no change │
│ QQuery 17    │  4956.14ms │        4796.22ms │     no change │
│ QQuery 18    │ 10436.51ms │       10168.34ms │     no change │
│ QQuery 19    │   144.11ms │         147.18ms │     no change │
│ QQuery 20    │  3310.77ms │        3286.34ms │     no change │
│ QQuery 21    │  3887.09ms │        3867.43ms │     no change │
│ QQuery 22    │  9398.96ms │        9008.04ms │     no change │
│ QQuery 23    │ 23087.26ms │       22804.51ms │     no change │
│ QQuery 24    │  1168.15ms │        1139.59ms │     no change │
│ QQuery 25    │  1046.92ms │        1010.22ms │     no change │
│ QQuery 26    │  1352.80ms │        1317.86ms │     no change │
│ QQuery 27    │  4711.92ms │        4698.67ms │     no change │
│ QQuery 28    │ 21891.92ms │       22870.99ms │     no change │
│ QQuery 29    │   920.19ms │         901.89ms │     no change │
│ QQuery 30    │  2075.81ms │        2036.71ms │     no change │
│ QQuery 31    │  2961.03ms │        2844.67ms │     no change │
│ QQuery 32    │ 16167.05ms │       15106.28ms │ +1.07x faster │
│ QQuery 33    │  9418.20ms │        9429.24ms │     no change │
│ QQuery 34    │  9388.74ms │        9431.36ms │     no change │
│ QQuery 35    │  3108.34ms │        3021.89ms │     no change │
│ QQuery 36    │   270.02ms │         269.25ms │     no change │
│ QQuery 37    │   166.63ms │         156.78ms │ +1.06x faster │
│ QQuery 38    │   158.33ms │         157.94ms │     no change │
│ QQuery 39    │   834.47ms │         844.51ms │     no change │
│ QQuery 40    │    63.22ms │          62.05ms │     no change │
│ QQuery 41    │    59.97ms │          58.34ms │     no change │
│ QQuery 42    │    70.34ms │          72.41ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘
@Rachelint Rachelint added the enhancement New feature or request label Jul 30, 2024
@Rachelint
Contributor Author

take

@Rachelint Rachelint changed the title from "Check saved hash first during probing bucket in hash map" to "Check saved hash first during probing bucket in aggr hash table" Jul 30, 2024