Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check hashes first during probing the aggr hash table #11718

Merged
merged 2 commits into from
Jul 31, 2024

Conversation

Rachelint
Copy link
Contributor

Which issue does this PR close?

Closes #11717

Rationale for this change

See #11717

What changes are included in this PR?

See title.

Are these changes tested?

By exist tests.

Are there any user-facing changes?

No.

@Rachelint
Copy link
Contributor Author

The clickbench result

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.81ms │           0.77ms │ +1.06x faster │
│ QQuery 1     │    69.11ms │          69.79ms │     no change │
│ QQuery 2     │   169.47ms │         160.89ms │ +1.05x faster │
│ QQuery 3     │   182.60ms │         180.05ms │     no change │
│ QQuery 4     │  1589.41ms │        1595.87ms │     no change │
│ QQuery 5     │  1597.02ms │        1563.24ms │     no change │
│ QQuery 6     │    57.92ms │          59.34ms │     no change │
│ QQuery 7     │    72.33ms │          71.02ms │     no change │
│ QQuery 8     │  2415.02ms │        2293.14ms │ +1.05x faster │
│ QQuery 9     │  1928.42ms │        1912.94ms │     no change │
│ QQuery 10    │   544.39ms │         539.45ms │     no change │
│ QQuery 11    │   605.79ms │         606.26ms │     no change │
│ QQuery 12    │  1767.57ms │        1748.26ms │     no change │
│ QQuery 13    │  4073.33ms │        3979.57ms │     no change │
│ QQuery 14    │  2583.14ms │        2518.41ms │     no change │
│ QQuery 15    │  1784.13ms │        1777.43ms │     no change │
│ QQuery 16    │  5028.55ms │        4898.04ms │     no change │
│ QQuery 17    │  4956.14ms │        4796.22ms │     no change │
│ QQuery 18    │ 10436.51ms │       10168.34ms │     no change │
│ QQuery 19    │   144.11ms │         147.18ms │     no change │
│ QQuery 20    │  3310.77ms │        3286.34ms │     no change │
│ QQuery 21    │  3887.09ms │        3867.43ms │     no change │
│ QQuery 22    │  9398.96ms │        9008.04ms │     no change │
│ QQuery 23    │ 23087.26ms │       22804.51ms │     no change │
│ QQuery 24    │  1168.15ms │        1139.59ms │     no change │
│ QQuery 25    │  1046.92ms │        1010.22ms │     no change │
│ QQuery 26    │  1352.80ms │        1317.86ms │     no change │
│ QQuery 27    │  4711.92ms │        4698.67ms │     no change │
│ QQuery 28    │ 21891.92ms │       22870.99ms │     no change │
│ QQuery 29    │   920.19ms │         901.89ms │     no change │
│ QQuery 30    │  2075.81ms │        2036.71ms │     no change │
│ QQuery 31    │  2961.03ms │        2844.67ms │     no change │
│ QQuery 32    │ 16167.05ms │       15106.28ms │ +1.07x faster │
│ QQuery 33    │  9418.20ms │        9429.24ms │     no change │
│ QQuery 34    │  9388.74ms │        9431.36ms │     no change │
│ QQuery 35    │  3108.34ms │        3021.89ms │     no change │
│ QQuery 36    │   270.02ms │         269.25ms │     no change │
│ QQuery 37    │   166.63ms │         156.78ms │ +1.06x faster │
│ QQuery 38    │   158.33ms │         157.94ms │     no change │
│ QQuery 39    │   834.47ms │         844.51ms │     no change │
│ QQuery 40    │    63.22ms │          62.05ms │     no change │
│ QQuery 41    │    59.97ms │          58.34ms │     no change │
│ QQuery 42    │    70.34ms │          72.41ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘

@Rachelint Rachelint marked this pull request as ready for review July 30, 2024 08:06
Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Rachelint

// verify that a group that we are inserting with hash is
// actually the same key value as the group in
// existing_idx (aka group_values @ row)
group_rows.row(row) == group_values.row(*group_idx)
target_hash == *exist_hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat confused about how this makes things faster -- I thought that the check for equal hash was done as part of self.map.get_mut (aka the closure is only called when the hashes are equal, so I would expct this comparison to always be true)

However, if the benchmark results show it is an improvement wonderful.

I'll run the numbers as well to confirm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it either.
I thought the closure (row values equal check) is triggered only if the hash is matched 😕

Copy link
Contributor Author

@Rachelint Rachelint Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat confused about how this makes things faster -- I thought that the check for equal hash was done as part of self.map.get_mut (aka the closure is only called when the hashes are equal, so I would expct this comparison to always be true)

However, if the benchmark results show it is an improvement wonderful.

I'll run the numbers as well to confirm.

This is my thought about why faster.

I read the source code in hashbrown, and the get_mut procedure is like this:

  • Use the hash value to find the first bucket.
  • If the bucket is filled, it will use the eq function passed by us to check if it is the target, for example, the prev eq function.
|(exist_hash, group_idx)| {
                // verify that a group that we are inserting with hash is
                // actually the same key value as the group in
                // existing_idx  (aka group_values @ row)
                group_rows.row(row) == group_values.row(*group_idx)
            })
  • If eq return true, it is the target, otherwise we need to prob next and check.

In the high cardinality aggr scenario, the entry often actually not exist in the hash table.

And after the hash table grow too large(many buckets are filled), the prob will perform many times and finally find nothing...

In this sitution, check the hash first can reduce the random memory accesses compared to directy check the group value through group index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jayzhan211 , can see the guess above.

Copy link
Contributor

@Dandandan Dandandan Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rachelint
My understanding is the similar, collision will happen quite often in get_mut, equality of u64 hashes will be faster than retrieving / comparing rows.
Hash collisions are usually very low, even for high cardinality, but RawTable::get_mut doesn't check for equality itself, just finds a first match without guaranteeing hash values are the same (equality check should be in the provided equality function). In other implementations we also check for hash values to be equal first.

Copy link
Contributor Author

@Rachelint Rachelint Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious -- I think the code @Rachelint is referring to in hashbrown is here

https://github.com/rust-lang/hashbrown/blob/ac00a0bbef46f02f555e235f57ce263aefa361e0/src/raw/mod.rs#L2183-L2199

It does appear that collisions could happen (it is doing some sort of abbreviated check by condensing the actual hash value to a byte or something), though I don't fully understand how it works.

I wonder if there are other places we could use this observation 🤔

Good idea! I am trying to find it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea to check hash before expensive arrow row comparison makes sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious -- I think the code @Rachelint is referring to in hashbrown is here

https://github.com/rust-lang/hashbrown/blob/ac00a0bbef46f02f555e235f57ce263aefa361e0/src/raw/mod.rs#L2183-L2199

It does appear that collisions could happen (it is doing some sort of abbreviated check by condensing the actual hash value to a byte or something), though I don't fully understand how it works.

I wonder if there are other places we could use this observation 🤔

This talk explains the details of this one-byte abbreviation trick https://www.youtube.com/watch?v=ncHmEUmJZf4 , I vaguely remember they said when this 1 byte check is done, it's very likely to find the correct slot.
Looks like it's not working well when the hash table size grows over some threshold?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious -- I think the code @Rachelint is referring to in hashbrown is here
https://github.com/rust-lang/hashbrown/blob/ac00a0bbef46f02f555e235f57ce263aefa361e0/src/raw/mod.rs#L2183-L2199
It does appear that collisions could happen (it is doing some sort of abbreviated check by condensing the actual hash value to a byte or something), though I don't fully understand how it works.
I wonder if there are other places we could use this observation 🤔

This talk explains the details of this one-byte abbreviation trick https://www.youtube.com/watch?v=ncHmEUmJZf4 , I vaguely remember they said when this 1 byte check is done, it's very likely to find the correct slot. Looks like it's not working well when the hash table size grows over some threshold?

Seems make sense, maybe we should only add this check when found the hash table size is larger than a threshold 🤔 .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking this rationale is very non obvious, so I proposed adding some comments on th rationale here #11750

@alamb
Copy link
Contributor

alamb commented Jul 30, 2024

I am running benchmarks against this PR now

@alamb
Copy link
Contributor

alamb commented Jul 30, 2024

Interestingly my benchmarks against this PR showed more mixed results:

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.80ms │           0.81ms │     no change │
│ QQuery 1     │   102.55ms │         100.88ms │     no change │
│ QQuery 2     │   207.48ms │         213.19ms │     no change │
│ QQuery 3     │   207.76ms │         209.69ms │     no change │
│ QQuery 4     │  2222.82ms │        2289.69ms │     no change │
│ QQuery 5     │  2009.01ms │        2087.26ms │     no change │
│ QQuery 6     │    89.41ms │          87.13ms │     no change │
│ QQuery 7     │   103.94ms │         105.40ms │     no change │
│ QQuery 8     │  3246.71ms │        3017.22ms │ +1.08x faster │
│ QQuery 9     │  2435.85ms │        2449.21ms │     no change │
│ QQuery 10    │   854.22ms │         856.77ms │     no change │
│ QQuery 11    │   924.61ms │         936.51ms │     no change │
│ QQuery 12    │  2130.15ms │        2198.85ms │     no change │
│ QQuery 13    │  4641.38ms │        4552.21ms │     no change │
│ QQuery 14    │  2883.17ms │        2883.05ms │     no change │
│ QQuery 15    │  2488.04ms │        2461.62ms │     no change │
│ QQuery 16    │  5893.04ms │        5836.59ms │     no change │
│ QQuery 17    │  5863.02ms │        5676.28ms │     no change │
│ QQuery 18    │ 11877.60ms │       11578.52ms │     no change │
│ QQuery 19    │   168.32ms │         181.08ms │  1.08x slower │
│ QQuery 20    │  2734.34ms │        2747.95ms │     no change │
│ QQuery 21    │  3457.31ms │        3545.68ms │     no change │
│ QQuery 22    │  8679.75ms │        9621.59ms │  1.11x slower │
│ QQuery 23    │ 21872.12ms │       22592.60ms │     no change │
│ QQuery 24    │  1351.77ms │        1418.37ms │     no change │
│ QQuery 25    │  1119.32ms │        1201.18ms │  1.07x slower │
│ QQuery 26    │  1483.02ms │        1541.26ms │     no change │
│ QQuery 27    │  4027.53ms │        4104.40ms │     no change │
│ QQuery 28    │ 28326.59ms │       28429.20ms │     no change │
│ QQuery 29    │  1051.56ms │        1078.07ms │     no change │
│ QQuery 30    │  2557.22ms │        2551.44ms │     no change │
│ QQuery 31    │  3240.11ms │        3164.86ms │     no change │
│ QQuery 32    │ 17175.99ms │       16627.27ms │     no change │
│ QQuery 33    │  9793.02ms │        9651.29ms │     no change │
│ QQuery 34    │  9684.26ms │        9709.83ms │     no change │
│ QQuery 35    │  3850.71ms │        3729.01ms │     no change │
│ QQuery 36    │   353.62ms │         352.64ms │     no change │
│ QQuery 37    │   223.88ms │         235.19ms │  1.05x slower │
│ QQuery 38    │   202.00ms │         208.92ms │     no change │
│ QQuery 39    │  1167.41ms │        1173.25ms │     no change │
│ QQuery 40    │    94.37ms │          97.81ms │     no change │
│ QQuery 41    │    85.27ms │          86.30ms │     no change │
│ QQuery 42    │   101.90ms │         102.17ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (main_base)          │ 170982.96ms │
│ Total Time (check-hash-first)   │ 171692.24ms │
│ Average Time (main_base)        │   3976.35ms │
│ Average Time (check-hash-first) │   3992.84ms │
│ Queries Faster                  │           1 │
│ Queries Slower                  │           4 │
│ Queries with No Change          │          38 │
└─────────────────────────────────┴─────────────┘

@Rachelint
Copy link
Contributor Author

Rachelint commented Jul 30, 2024

@alamb Possible due to my main branch is not the latest?
I am running it again in local after pulling.

@alamb
Copy link
Contributor

alamb commented Jul 30, 2024

@alamb Possible due to my main branch is not the latest? I am running it again in local after pulling.

Thanks @Rachelint -- I am also trying again to see if I did something wrong -- I also filed #11722 to see if that could be related 🤔

@Rachelint
Copy link
Contributor Author

Rachelint commented Jul 30, 2024

@alamb Possible due to my main branch is not the latest? I am running it again in local after pulling.

Thanks @Rachelint -- I am also trying again to see if I did something wrong -- I also filed #11722 to see if that could be related 🤔

Seems so strange, I run it after pulling and rebasing, it still seems faster in the expected case q32

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │ 16194.66ms │       15067.19ms │ +1.07x faster │
└──────────────┴────────────┴──────────────────┴───────────────┘

@Rachelint
Copy link
Contributor Author

Rachelint commented Jul 30, 2024

@alamb Possible due to my main branch is not the latest? I am running it again in local after pulling.

Thanks @Rachelint -- I am also trying again to see if I did something wrong -- I also filed #11722 to see if that could be related 🤔

Another possibility I could think is that maybe it is due to cache size? This the cpu info about my benchmark machine.

CPU MHz:             993.713
CPU max MHz:           4400.0000
CPU min MHz:           800.0000

L1d cache:          32K
L1i cache:          32K
L2 cache:           256K
L3 cache:           9216K

@jayzhan211
Copy link
Contributor

I can run the benchmark too

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jul 30, 2024

release-nonlto mode

--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 00.48ms │           0.44ms │ +1.09x faster │
│ QQuery 142.21ms │          41.12ms │     no change │
│ QQuery 274.19ms │          81.59ms │  1.10x slower │
│ QQuery 362.76ms │          72.01ms │  1.15x slower │
│ QQuery 4421.24ms │         475.74ms │  1.13x slower │
│ QQuery 5702.16ms │         683.13ms │     no change │
│ QQuery 637.37ms │          40.15ms │  1.07x slower │
│ QQuery 740.82ms │          40.23ms │     no change │
│ QQuery 8765.87ms │         747.78ms │     no change │
│ QQuery 9663.02ms │         643.98ms │     no change │
│ QQuery 10197.35ms │         196.56ms │     no change │
│ QQuery 11220.60ms │         213.23ms │     no change │
│ QQuery 12726.93ms │         703.93ms │     no change │
│ QQuery 131419.25ms │        1464.85ms │     no change │
│ QQuery 14993.33ms │         993.17ms │     no change │
│ QQuery 15496.78ms │         483.43ms │     no change │
│ QQuery 162026.78ms │        1970.75ms │     no change │
│ QQuery 171843.15ms │        1886.30ms │     no change │
│ QQuery 184766.27ms │        4960.04ms │     no change │
│ QQuery 1958.23ms │          56.11ms │     no change │
│ QQuery 201518.53ms │        1480.85ms │     no change │
│ QQuery 211736.85ms │        1750.68ms │     no change │
│ QQuery 224067.41ms │        4129.24ms │     no change │
│ QQuery 238258.31ms │        8458.29ms │     no change │
│ QQuery 24483.89ms │         502.11ms │     no change │
│ QQuery 25492.77ms │         489.45ms │     no change │
│ QQuery 26549.74ms │         560.24ms │     no change │
│ QQuery 271317.31ms │        1382.42ms │     no change │
│ QQuery 2810165.79ms │       10165.27ms │     no change │
│ QQuery 29409.71ms │         399.79ms │     no change │
│ QQuery 30857.91ms │         847.23ms │     no change │
│ QQuery 31983.91ms │         947.09ms │     no change │
│ QQuery 329653.47ms │        9369.41ms │     no change │
│ QQuery 334134.36ms │        3266.48ms │ +1.27x faster │
│ QQuery 343872.37ms │        3991.76ms │     no change │
│ QQuery 351055.52ms │        1039.10ms │     no change │
│ QQuery 36144.10ms │         148.25ms │     no change │
│ QQuery 37100.81ms │         101.38ms │     no change │
│ QQuery 38105.72ms │         107.32ms │     no change │
│ QQuery 39387.52ms │         389.59ms │     no change │
│ QQuery 4034.77ms │          34.62ms │     no change │
│ QQuery 4132.97ms │          32.96ms │     no change │
│ QQuery 4243.08ms │          42.63ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)65965.62ms │
│ Total Time (check-hash-first)65390.71ms │
│ Average Time (main)1534.08ms │
│ Average Time (check-hash-first)1520.71ms │
│ Queries Faster2 │
│ Queries Slower4 │
│ Queries with No Change37 │
└─────────────────────────────────┴────────────┘

hmm...

@Rachelint
Copy link
Contributor Author

Rachelint commented Jul 30, 2024

@jayzhan211 interesting... q33 1.27x faster...
Maybe it is really related to hardware...

I run q32 in release mode and the result still same as release-nonlto mode...

I am running the whole queries now.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jul 30, 2024

release mode

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 00.40ms │           0.41ms │     no change │
│ QQuery 138.61ms │          37.93ms │     no change │
│ QQuery 273.86ms │          73.49ms │     no change │
│ QQuery 363.24ms │          64.05ms │     no change │
│ QQuery 4425.21ms │         390.92ms │ +1.09x faster │
│ QQuery 5694.98ms │         650.10ms │ +1.07x faster │
│ QQuery 637.20ms │          38.77ms │     no change │
│ QQuery 738.30ms │          37.54ms │     no change │
│ QQuery 8757.20ms │         663.71ms │ +1.14x faster │
│ QQuery 9654.19ms │         628.15ms │     no change │
│ QQuery 10193.07ms │         184.71ms │     no change │
│ QQuery 11215.94ms │         209.10ms │     no change │
│ QQuery 12728.18ms │         689.16ms │ +1.06x faster │
│ QQuery 131386.26ms │        1318.11ms │     no change │
│ QQuery 14975.30ms │         955.42ms │     no change │
│ QQuery 15487.11ms │         472.50ms │     no change │
│ QQuery 161799.89ms │        1582.73ms │ +1.14x faster │
│ QQuery 171673.83ms │        1550.87ms │ +1.08x faster │
│ QQuery 184557.05ms │        3911.50ms │ +1.17x faster │
│ QQuery 1957.09ms │          57.37ms │     no change │
│ QQuery 201540.04ms │        1538.60ms │     no change │
│ QQuery 211764.33ms │        1767.64ms │     no change │
│ QQuery 224006.80ms │        4007.95ms │     no change │
│ QQuery 238261.65ms │        7982.00ms │     no change │
│ QQuery 24492.90ms │         474.27ms │     no change │
│ QQuery 25472.54ms │         480.76ms │     no change │
│ QQuery 26539.63ms │         537.50ms │     no change │
│ QQuery 271291.92ms │        1291.64ms │     no change │
│ QQuery 2810082.99ms │        9922.57ms │     no change │
│ QQuery 29403.20ms │         403.61ms │     no change │
│ QQuery 30833.31ms │         816.01ms │     no change │
│ QQuery 31966.80ms │         905.63ms │ +1.07x faster │
│ QQuery 329874.88ms │        9189.33ms │ +1.07x faster │
│ QQuery 333504.33ms │        3956.20ms │  1.13x slower │
│ QQuery 343501.75ms │        3635.95ms │     no change │
│ QQuery 351069.16ms │         957.72ms │ +1.12x faster │
│ QQuery 36148.00ms │         140.89ms │     no change │
│ QQuery 37100.97ms │          99.02ms │     no change │
│ QQuery 38103.56ms │         103.36ms │     no change │
│ QQuery 39378.87ms │         373.10ms │     no change │
│ QQuery 4033.78ms │          33.72ms │     no change │
│ QQuery 4132.01ms │          32.04ms │     no change │
│ QQuery 4240.22ms │          40.03ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)64300.55ms │
│ Total Time (check-hash-first)62206.11ms │
│ Average Time (main)1495.36ms │
│ Average Time (check-hash-first)1446.65ms │
│ Queries Faster10 │
│ Queries Slower1 │
│ Queries with No Change32 │
└─────────────────────────────────┴────────────┘

I guess the change is not really related to those queries, we might need specialized query that could really benefit from it.
I think the query that has high cost on arrow::Row comparison could shows the difference.

@Rachelint
Copy link
Contributor Author

Rachelint commented Jul 30, 2024

@jayzhan211 Maybe hash table size should be considered, too.
If few collision exist, this pr added an extra u64 comparison actually 🤔 .

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran again with release mode (#11722) and it seems to be showing an improvement in Q32 as well and several of the other

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.69ms │           0.71ms │     no change │
│ QQuery 1     │    91.81ms │          89.52ms │     no change │
│ QQuery 2     │   195.10ms │         190.79ms │     no change │
│ QQuery 3     │   203.15ms │         198.03ms │     no change │
│ QQuery 4     │  2230.05ms │        2237.94ms │     no change │
│ QQuery 5     │  1984.18ms │        2032.57ms │     no change │
│ QQuery 6     │    78.88ms │          79.93ms │     no change │
│ QQuery 7     │    94.95ms │          91.81ms │     no change │
│ QQuery 8     │  3227.81ms │        3011.43ms │ +1.07x faster │
│ QQuery 9     │  2384.85ms │        2360.24ms │     no change │
│ QQuery 10    │   847.94ms │         847.07ms │     no change │
│ QQuery 11    │   910.10ms │         915.36ms │     no change │
│ QQuery 12    │  2139.05ms │        2156.18ms │     no change │
│ QQuery 13    │  4697.20ms │        4549.50ms │     no change │
│ QQuery 14    │  2952.47ms │        2902.65ms │     no change │
│ QQuery 15    │  2450.43ms │        2445.44ms │     no change │
│ QQuery 16    │  5974.80ms │        5777.67ms │     no change │
│ QQuery 17    │  5895.89ms │        5650.60ms │     no change │
│ QQuery 18    │ 12165.50ms │       11615.76ms │     no change │
│ QQuery 19    │   168.33ms │         169.17ms │     no change │
│ QQuery 20    │  2749.31ms │        2709.84ms │     no change │
│ QQuery 21    │  3535.36ms │        3495.37ms │     no change │
│ QQuery 22    │  9567.54ms │        9511.10ms │     no change │
│ QQuery 23    │ 22300.42ms │       22589.13ms │     no change │
│ QQuery 24    │  1365.25ms │        1376.12ms │     no change │
│ QQuery 25    │  1176.10ms │        1176.65ms │     no change │
│ QQuery 26    │  1487.44ms │        1506.23ms │     no change │
│ QQuery 27    │  4022.80ms │        3984.70ms │     no change │
│ QQuery 28    │ 30287.63ms │       29893.82ms │     no change │
│ QQuery 29    │  1059.72ms │        1045.76ms │     no change │
│ QQuery 30    │  2503.34ms │        2480.10ms │     no change │
│ QQuery 31    │  3208.06ms │        3070.93ms │     no change │
│ QQuery 32    │ 17948.66ms │       16576.11ms │ +1.08x faster │
│ QQuery 33    │  9600.79ms │        9552.98ms │     no change │
│ QQuery 34    │  9703.76ms │        9574.08ms │     no change │
│ QQuery 35    │  3786.65ms │        3743.23ms │     no change │
│ QQuery 36    │   345.94ms │         344.00ms │     no change │
│ QQuery 37    │   232.55ms │         229.18ms │     no change │
│ QQuery 38    │   193.74ms │         199.95ms │     no change │
│ QQuery 39    │  1149.55ms │        1174.42ms │     no change │
│ QQuery 40    │    87.70ms │          92.48ms │  1.05x slower │
│ QQuery 41    │    79.73ms │          81.89ms │     no change │
│ QQuery 42    │    95.45ms │          98.37ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (main_base)          │ 175180.68ms │
│ Total Time (check-hash-first)   │ 171828.82ms │
│ Average Time (main_base)        │   4073.97ms │
│ Average Time (check-hash-first) │   3996.02ms │
│ Queries Faster                  │           2 │
│ Queries Slower                  │           1 │
│ Queries with No Change          │          40 │
└─────────────────────────────────┴─────────────┘

So I think this looks good to me 👍

Thanks a lot @Rachelint and everyone

@alamb
Copy link
Contributor

alamb commented Jul 30, 2024

Here is another run showing improvement:

Details

Comparing main_base and check-hash-first
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.70ms │           0.70ms │     no change │
│ QQuery 1     │    91.56ms │          90.72ms │     no change │
│ QQuery 2     │   191.85ms │         186.74ms │     no change │
│ QQuery 3     │   200.63ms │         198.96ms │     no change │
│ QQuery 4     │  2211.34ms │        2238.22ms │     no change │
│ QQuery 5     │  2055.51ms │        2071.21ms │     no change │
│ QQuery 6     │    85.87ms │          85.66ms │     no change │
│ QQuery 7     │    93.26ms │          92.07ms │     no change │
│ QQuery 8     │  3213.88ms │        2974.12ms │ +1.08x faster │
│ QQuery 9     │  2332.27ms │        2351.33ms │     no change │
│ QQuery 10    │   860.69ms │         845.38ms │     no change │
│ QQuery 11    │   936.60ms │         909.28ms │     no change │
│ QQuery 12    │  2126.25ms │        2133.29ms │     no change │
│ QQuery 13    │  4685.59ms │        4528.49ms │     no change │
│ QQuery 14    │  2958.13ms │        2864.44ms │     no change │
│ QQuery 15    │  2508.42ms │        2440.92ms │     no change │
│ QQuery 16    │  6054.02ms │        5737.46ms │ +1.06x faster │
│ QQuery 17    │  5929.43ms │        5760.15ms │     no change │
│ QQuery 18    │ 12280.25ms │       11745.12ms │     no change │
│ QQuery 19    │   170.05ms │         165.38ms │     no change │
│ QQuery 20    │  2726.60ms │        2716.05ms │     no change │
│ QQuery 21    │  3501.18ms │        3519.52ms │     no change │
│ QQuery 22    │  9514.93ms │        9516.19ms │     no change │
│ QQuery 23    │ 22818.47ms │       22238.19ms │     no change │
│ QQuery 24    │  1380.11ms │        1375.95ms │     no change │
│ QQuery 25    │  1176.97ms │        1165.53ms │     no change │
│ QQuery 26    │  1494.63ms │        1493.36ms │     no change │
│ QQuery 27    │  4064.91ms │        4014.09ms │     no change │
│ QQuery 28    │ 31117.67ms │       30759.59ms │     no change │
│ QQuery 29    │  1045.69ms │        1047.90ms │     no change │
│ QQuery 30    │  2551.21ms │        2483.07ms │     no change │
│ QQuery 31    │  3252.34ms │        3092.13ms │     no change │
│ QQuery 32    │ 17868.98ms │       16585.30ms │ +1.08x faster │
│ QQuery 33    │  9423.88ms │        9830.09ms │     no change │
│ QQuery 34    │  9699.31ms │        9704.12ms │     no change │
│ QQuery 35    │  3835.63ms │        3776.00ms │     no change │
│ QQuery 36    │   350.32ms │         340.48ms │     no change │
│ QQuery 37    │   229.44ms │         230.08ms │     no change │
│ QQuery 38    │   198.22ms │         194.55ms │     no change │
│ QQuery 39    │  1190.97ms │        1166.81ms │     no change │
│ QQuery 40    │    94.33ms │          85.10ms │ +1.11x faster │
│ QQuery 41    │    83.11ms │          80.14ms │     no change │
│ QQuery 42    │    96.20ms │          98.17ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (main_base)          │ 176701.41ms │
│ Total Time (check-hash-first)   │ 172932.04ms │
│ Average Time (main_base)        │   4109.34ms │
│ Average Time (check-hash-first) │   4021.68ms │
│ Queries Faster                  │           4 │
│ Queries Slower                  │           0 │
│ Queries with No Change          │          39 │
└─────────────────────────────────┴─────────────┘

@alamb alamb merged commit 89677ae into apache:main Jul 31, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 31, 2024

Thanks again @Rachelint @Dandandan and @2010YOUY01

@Rachelint
Copy link
Contributor Author

Duckdb has the similar check(but just use u16 prefix of hash):
https://github.com/duckdb/duckdb/blob/f92559f42c118075baaa8daafc437954eb5c85ec/src/execution/aggregate_hashtable.cpp#L379-L386

And mentioned by @alamb , the related pr in duckdb is interesting to see
duckdb/duckdb#9575

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check saved hash first during probing bucket in aggr hash table
6 participants