[C++][Parquet] Some benchmarks report incorrect metrics #44081
github-actions bot added the Component: Parquet, Component: C++, and Component: Benchmarking labels on Sep 12, 2024
pitrou changed the title from "[C++][Parquet] Some benchmarks report incorrect bytes/s. metric" to "[C++][Parquet] Some benchmarks report incorrect metrics" on Sep 12, 2024
pitrou added a commit to pitrou/arrow that referenced this issue on Sep 12, 2024:

…reader-writer-benchmark

1. items/sec and bytes/sec were set to the same value in some benchmarks
2. bytes/sec was incorrectly computed for boolean columns
pitrou added a commit that referenced this issue on Sep 12, 2024:

…-writer-benchmark (#44082)

### Rationale for this change

1. items/sec and bytes/sec were set to the same value in some benchmarks
2. bytes/sec was incorrectly computed for boolean columns

### What changes are included in this PR?

Fix parquet-arrow-reader-writer-benchmark to report correct metrics.

#### Example (column writing)

Before:
```
--------------------------------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
BM_WriteColumn<false,Int32Type>      43138428 ns     43118609 ns           15 bytes_per_second=927.674Mi/s items_per_second=972.736M/s
BM_WriteColumn<true,Int32Type>      150528627 ns    150480597 ns            5 bytes_per_second=265.815Mi/s items_per_second=278.727M/s
BM_WriteColumn<false,Int64Type>      49243514 ns     49214955 ns           14 bytes_per_second=1.58742Gi/s items_per_second=1.70448G/s
BM_WriteColumn<true,Int64Type>      151526550 ns    151472832 ns            5 bytes_per_second=528.148Mi/s items_per_second=553.803M/s
BM_WriteColumn<false,DoubleType>     59101372 ns     59068058 ns           12 bytes_per_second=1.32263Gi/s items_per_second=1.42016G/s
BM_WriteColumn<true,DoubleType>     159944872 ns    159895095 ns            4 bytes_per_second=500.328Mi/s items_per_second=524.632M/s
BM_WriteColumn<false,BooleanType>    32855604 ns     32845322 ns           21 bytes_per_second=304.457Mi/s items_per_second=319.247M/s
BM_WriteColumn<true,BooleanType>    150566118 ns    150528329 ns            5 bytes_per_second=66.4327Mi/s items_per_second=69.6597M/s
```

After:
```
Benchmark                                Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
BM_WriteColumn<false,Int32Type>      43919180 ns     43895926 ns           16 bytes_per_second=911.246Mi/s items_per_second=238.878M/s
BM_WriteColumn<true,Int32Type>      153981290 ns    153929841 ns            5 bytes_per_second=259.859Mi/s items_per_second=68.1204M/s
BM_WriteColumn<false,Int64Type>      49906105 ns     49860098 ns           14 bytes_per_second=1.56688Gi/s items_per_second=210.304M/s
BM_WriteColumn<true,Int64Type>      154273499 ns    154202319 ns            5 bytes_per_second=518.799Mi/s items_per_second=68M/s
BM_WriteColumn<false,DoubleType>     59789490 ns     59733498 ns           12 bytes_per_second=1.30789Gi/s items_per_second=175.542M/s
BM_WriteColumn<true,DoubleType>     161235860 ns    161169670 ns            4 bytes_per_second=496.371Mi/s items_per_second=65.0604M/s
BM_WriteColumn<false,BooleanType>    32962097 ns     32950864 ns           21 bytes_per_second=37.9353Mi/s items_per_second=318.224M/s
BM_WriteColumn<true,BooleanType>    154103499 ns    154052873 ns            5 bytes_per_second=8.1141Mi/s items_per_second=68.066M/s
```

#### Example (column reading)

Before:
```
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_ReadColumn<false,BooleanType>/-1/0      6456731 ns      6453510 ns          108 bytes_per_second=1.51323Gi/s items_per_second=1.62482G/s
BM_ReadColumn<false,BooleanType>/1/20     19012505 ns     19006068 ns           36 bytes_per_second=526.148Mi/s items_per_second=551.706M/s
BM_ReadColumn<true,BooleanType>/-1/1      58365426 ns     58251529 ns           12 bytes_per_second=171.669Mi/s items_per_second=180.008M/s
BM_ReadColumn<true,BooleanType>/5/10      46498966 ns     46442191 ns           15 bytes_per_second=215.321Mi/s items_per_second=225.781M/s
BM_ReadIndividualRowGroups                29617575 ns     29600557 ns           24 bytes_per_second=2.63931Gi/s items_per_second=2.83394G/s
BM_ReadMultipleRowGroups                  47416980 ns     47288951 ns           15 bytes_per_second=1.65208Gi/s items_per_second=1.7739G/s
BM_ReadMultipleRowGroupsGenerator         29741012 ns     29722112 ns           24 bytes_per_second=2.62851Gi/s items_per_second=2.82235G/s
```

After:
```
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_ReadColumn<false,BooleanType>/-1/0      6438249 ns      6435159 ns          109 bytes_per_second=194.245Mi/s items_per_second=1.62945G/s
BM_ReadColumn<false,BooleanType>/1/20     19427495 ns     19419378 ns           37 bytes_per_second=64.3687Mi/s items_per_second=539.964M/s
BM_ReadColumn<true,BooleanType>/-1/1      58342877 ns     58298236 ns           12 bytes_per_second=21.4415Mi/s items_per_second=179.864M/s
BM_ReadColumn<true,BooleanType>/5/10      46591584 ns     46532288 ns           15 bytes_per_second=26.8631Mi/s items_per_second=225.344M/s
BM_ReadIndividualRowGroups                30039049 ns     30021676 ns           23 bytes_per_second=2.60229Gi/s items_per_second=349.273M/s
BM_ReadMultipleRowGroups                  47877663 ns     47650438 ns           15 bytes_per_second=1.63954Gi/s items_per_second=220.056M/s
BM_ReadMultipleRowGroupsGenerator         30377987 ns     30360019 ns           23 bytes_per_second=2.57329Gi/s items_per_second=345.381M/s
```

### Are these changes tested?

Manually by running benchmarks.

### Are there any user-facing changes?

No, but this breaks historical comparisons in continuous benchmarking.

* GitHub Issue: #44081

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Issue resolved by pull request 44082
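The two counters in the "After" output can be cross-checked by hand: multiplying items/s by the width of one value should give bytes/s. The sketch below is not the code from the pull request. `BytesForValues` and `ItemsPerSecToMiBPerSec` are hypothetical helper names, and the assumption that a boolean value counts as one bit is inferred from the reported figures (318.224M items/s next to 37.9353Mi/s), not stated in the issue.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Arrow: number of bytes needed to hold
// `num_values` values of `bits_per_value` bits each, rounded up to whole bytes.
// Assumption: booleans use 1 bit per value, Int32 uses 32, Int64/Double use 64.
inline int64_t BytesForValues(int64_t num_values, int64_t bits_per_value) {
  assert(num_values >= 0 && bits_per_value > 0);
  return (num_values * bits_per_value + 7) / 8;
}

// Hypothetical helper: convert an items/s rate into MiB/s for a given value
// width, mirroring the units printed in the benchmark output above.
inline double ItemsPerSecToMiBPerSec(double items_per_sec, int64_t bits_per_value) {
  const double bytes_per_sec = items_per_sec * static_cast<double>(bits_per_value) / 8.0;
  return bytes_per_sec / (1024.0 * 1024.0);
}
```

Feeding in the "After" rates, `ItemsPerSecToMiBPerSec(238.878e6, 32)` comes out at roughly 911.2 MiB/s for `BM_WriteColumn<false,Int32Type>` and `ItemsPerSecToMiBPerSec(318.224e6, 1)` at roughly 37.9 MiB/s for `BM_WriteColumn<false,BooleanType>`, in line with the reported `bytes_per_second` values. The "Before" figures do not satisfy this relationship: there, the two counters differ only by the M-versus-Mi unit scaling.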
khwilson pushed a commit to khwilson/arrow that referenced this issue on Sep 14, 2024:

…reader-writer-benchmark (apache#44082)
Describe the bug, including details regarding any error messages, version, and platform.
See e.g. for reading and writing booleans:
Component(s)
Benchmarking, C++, Parquet