Commit
GH-44081: [C++][Parquet] Fix reported metrics in parquet-arrow-reader-writer-benchmark (#44082)

### Rationale for this change

1. items/sec and bytes/sec were set to the same value in some benchmarks
2. bytes/sec was incorrectly computed for boolean columns

### What changes are included in this PR?

Fix parquet-arrow-reader-writer-benchmark to report correct metrics.

#### Example (column writing)

Before:
```
--------------------------------------------------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
BM_WriteColumn<false,Int32Type>       43138428 ns     43118609 ns           15 bytes_per_second=927.674Mi/s items_per_second=972.736M/s
BM_WriteColumn<true,Int32Type>       150528627 ns    150480597 ns            5 bytes_per_second=265.815Mi/s items_per_second=278.727M/s
BM_WriteColumn<false,Int64Type>       49243514 ns     49214955 ns           14 bytes_per_second=1.58742Gi/s items_per_second=1.70448G/s
BM_WriteColumn<true,Int64Type>       151526550 ns    151472832 ns            5 bytes_per_second=528.148Mi/s items_per_second=553.803M/s
BM_WriteColumn<false,DoubleType>      59101372 ns     59068058 ns           12 bytes_per_second=1.32263Gi/s items_per_second=1.42016G/s
BM_WriteColumn<true,DoubleType>      159944872 ns    159895095 ns            4 bytes_per_second=500.328Mi/s items_per_second=524.632M/s
BM_WriteColumn<false,BooleanType>     32855604 ns     32845322 ns           21 bytes_per_second=304.457Mi/s items_per_second=319.247M/s
BM_WriteColumn<true,BooleanType>     150566118 ns    150528329 ns            5 bytes_per_second=66.4327Mi/s items_per_second=69.6597M/s
```

After:
```
Benchmark                                 Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
BM_WriteColumn<false,Int32Type>       43919180 ns     43895926 ns           16 bytes_per_second=911.246Mi/s items_per_second=238.878M/s
BM_WriteColumn<true,Int32Type>       153981290 ns    153929841 ns            5 bytes_per_second=259.859Mi/s items_per_second=68.1204M/s
BM_WriteColumn<false,Int64Type>       49906105 ns     49860098 ns           14 bytes_per_second=1.56688Gi/s items_per_second=210.304M/s
BM_WriteColumn<true,Int64Type>       154273499 ns    154202319 ns            5 bytes_per_second=518.799Mi/s items_per_second=68M/s
BM_WriteColumn<false,DoubleType>      59789490 ns     59733498 ns           12 bytes_per_second=1.30789Gi/s items_per_second=175.542M/s
BM_WriteColumn<true,DoubleType>      161235860 ns    161169670 ns            4 bytes_per_second=496.371Mi/s items_per_second=65.0604M/s
BM_WriteColumn<false,BooleanType>     32962097 ns     32950864 ns           21 bytes_per_second=37.9353Mi/s items_per_second=318.224M/s
BM_WriteColumn<true,BooleanType>     154103499 ns    154052873 ns            5 bytes_per_second=8.1141Mi/s items_per_second=68.066M/s
```

#### Example (column reading)

Before:
```
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_ReadColumn<false,BooleanType>/-1/0      6456731 ns      6453510 ns          108 bytes_per_second=1.51323Gi/s items_per_second=1.62482G/s
BM_ReadColumn<false,BooleanType>/1/20     19012505 ns     19006068 ns           36 bytes_per_second=526.148Mi/s items_per_second=551.706M/s
BM_ReadColumn<true,BooleanType>/-1/1      58365426 ns     58251529 ns           12 bytes_per_second=171.669Mi/s items_per_second=180.008M/s
BM_ReadColumn<true,BooleanType>/5/10      46498966 ns     46442191 ns           15 bytes_per_second=215.321Mi/s items_per_second=225.781M/s
BM_ReadIndividualRowGroups                29617575 ns     29600557 ns           24 bytes_per_second=2.63931Gi/s items_per_second=2.83394G/s
BM_ReadMultipleRowGroups                  47416980 ns     47288951 ns           15 bytes_per_second=1.65208Gi/s items_per_second=1.7739G/s
BM_ReadMultipleRowGroupsGenerator         29741012 ns     29722112 ns           24 bytes_per_second=2.62851Gi/s items_per_second=2.82235G/s
```

After:
```
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_ReadColumn<false,BooleanType>/-1/0      6438249 ns      6435159 ns          109 bytes_per_second=194.245Mi/s items_per_second=1.62945G/s
BM_ReadColumn<false,BooleanType>/1/20     19427495 ns     19419378 ns           37 bytes_per_second=64.3687Mi/s items_per_second=539.964M/s
BM_ReadColumn<true,BooleanType>/-1/1      58342877 ns     58298236 ns           12 bytes_per_second=21.4415Mi/s items_per_second=179.864M/s
BM_ReadColumn<true,BooleanType>/5/10      46591584 ns     46532288 ns           15 bytes_per_second=26.8631Mi/s items_per_second=225.344M/s
BM_ReadIndividualRowGroups                30039049 ns     30021676 ns           23 bytes_per_second=2.60229Gi/s items_per_second=349.273M/s
BM_ReadMultipleRowGroups                  47877663 ns     47650438 ns           15 bytes_per_second=1.63954Gi/s items_per_second=220.056M/s
BM_ReadMultipleRowGroupsGenerator         30377987 ns     30360019 ns           23 bytes_per_second=2.57329Gi/s items_per_second=345.381M/s
```

### Are these changes tested?

Manually by running benchmarks.

### Are there any user-facing changes?

No, but this breaks historical comparisons in continuous benchmarking.

* GitHub Issue: #44081

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
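
#### Cross-checking the reported numbers

The relation between the two counters can be checked by hand. The sketch below is not taken from the benchmark sources. `MiBPerSecond` is a hypothetical helper that converts an item rate into MiB/s for an assumed value width in bits. Under that assumption, the "After" figures are consistent with 32 bits per Int32 value and 1 bit per boolean value, while the "Before" boolean figure is consistent with 8 bits per value.

```cpp
#include <cmath>

// Hypothetical helper for illustration only; it does not exist in Arrow.
// Converts a rate in values per second into MiB/s, given the assumed
// width of one value in bits.
double MiBPerSecond(double items_per_second, int bits_per_value) {
  const double bytes_per_second = items_per_second * bits_per_value / 8.0;
  return bytes_per_second / (1024.0 * 1024.0);
}

// True when two rates agree within a small tolerance that allows for
// the rounding of the figures printed by the benchmark.
bool RoughlyEqual(double a, double b, double tolerance = 0.01) {
  return std::fabs(a - b) < tolerance;
}
```

For example, `MiBPerSecond(238.878e6, 32)` lands within rounding of the 911.246Mi/s reported for `BM_WriteColumn<false,Int32Type>`, and `MiBPerSecond(318.224e6, 1)` within rounding of the 37.9353Mi/s reported for `BM_WriteColumn<false,BooleanType>`.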