Spark 3.5: Adapt DeleteFileIndexBenchmark for DVs #11529

aokolnychyi · 2024-11-12T20:32:40Z

This PR adapts our DeleteFileIndexBenchmark for deletion vectors (DVs).

Benchmark                                                           (type)  Mode  Cnt            Score         Error   Units
DeleteFileIndexBenchmark.buildIndexAndLookup                     partition    ss   10            0.475 ±       0.031    s/op
DeleteFileIndexBenchmark.buildIndexAndLookup                          file    ss   10            5.381 ±       0.224    s/op
DeleteFileIndexBenchmark.buildIndexAndLookup                            dv    ss   10            3.612 ±       0.201    s/op

Partition-scoped deletes are fastest because the benchmark sets up a table with a small number of deep partitions (50K data files per partition) and only 100 delete files per partition, so the number of delete files to index is dramatically smaller in that mode. We should probably make this benchmark more representative in the future. DVs are faster than file-scoped deletes (3.612 vs. 5.381 s/op) because they rely on referencedDataFile instead of reconstructing that value from bounds. I'd say the planning performance is acceptable for 2.5M DVs, but we may want to optimize it further.
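To illustrate the difference between reading referencedDataFile and reconstructing it from bounds, here is a minimal, self-contained sketch. It does not use Iceberg's classes: DeleteMeta, targetDataFile, and indexByDataFile are hypothetical names, and the rule "equal lower and upper path bounds identify a single data file" is an assumption about how the value is reconstructed, not a copy of the actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class DeleteLookupSketch {

  /** Hypothetical stand-in for delete file metadata; not Iceberg's DeleteFile interface. */
  static final class DeleteMeta {
    final String location;
    final String referencedDataFile; // set explicitly for DVs; may be null otherwise
    final String lowerPathBound;     // lower bound of the file path column, if tracked
    final String upperPathBound;     // upper bound of the file path column, if tracked

    DeleteMeta(String location, String referencedDataFile,
               String lowerPathBound, String upperPathBound) {
      this.location = location;
      this.referencedDataFile = referencedDataFile;
      this.lowerPathBound = lowerPathBound;
      this.upperPathBound = upperPathBound;
    }
  }

  /** Returns the single data file a delete applies to, if it can be determined. */
  static Optional<String> targetDataFile(DeleteMeta meta) {
    // Cheap path: the value is stored directly (the DV case).
    if (meta.referencedDataFile != null) {
      return Optional.of(meta.referencedDataFile);
    }
    // Assumed fallback: equal bounds mean every row points at one data file.
    if (meta.lowerPathBound != null && meta.lowerPathBound.equals(meta.upperPathBound)) {
      return Optional.of(meta.lowerPathBound);
    }
    // Otherwise the delete file may span several data files.
    return Optional.empty();
  }

  /** Groups deletes by target data file; deletes without a single target are skipped. */
  static Map<String, List<DeleteMeta>> indexByDataFile(List<DeleteMeta> deletes) {
    Map<String, List<DeleteMeta>> index = new HashMap<>();
    for (DeleteMeta delete : deletes) {
      targetDataFile(delete)
          .ifPresent(path -> index.computeIfAbsent(path, k -> new ArrayList<>()).add(delete));
    }
    return index;
  }

  public static void main(String[] args) {
    List<DeleteMeta> deletes = List.of(
        new DeleteMeta("dv-1.puffin", "data-a.parquet", null, null),
        new DeleteMeta("pos-1.parquet", null, "data-a.parquet", "data-a.parquet"),
        new DeleteMeta("pos-2.parquet", null, "data-a.parquet", "data-c.parquet"));

    Map<String, List<DeleteMeta>> index = indexByDataFile(deletes);
    System.out.println("deletes indexed for data-a.parquet: "
        + index.get("data-a.parquet").size());
  }
}
```

In this sketch the DV entry resolves with a single field read, while the file-scoped entry needs a comparison of two bounds. In a real table those bounds would also have to be decoded from serialized metrics first, which is the kind of extra work that could add up across millions of delete files.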

This work is part of #11122.

jbonofre

Maybe we can use an abstract class gathering the @Param and init() methods that we share across benchmark tests.

aokolnychyi · 2024-11-15T21:14:16Z

Thanks, @jbonofre @nastra!

We may look into refactoring some of the benchmark code, but experience shows it is rarely worth the time.

Spark 3.5: Adapt DeleteFileIndexBenchmark for DVs

f7ca2ea

github-actions bot added the spark label Nov 12, 2024

jbonofre approved these changes Nov 13, 2024

nastra approved these changes Nov 14, 2024

aokolnychyi merged commit 315e154 into apache:main Nov 15, 2024
31 checks passed
