Spark 3.5: Adapt DeleteFileIndexBenchmark for DVs #11529

Merged 1 commit on Nov 15, 2024

Contributor

@aokolnychyi aokolnychyi commented Nov 12, 2024

This PR adapts our DeleteFileIndexBenchmark for DVs.

Benchmark                                                           (type)  Mode  Cnt            Score         Error   Units
DeleteFileIndexBenchmark.buildIndexAndLookup                     partition    ss   10            0.475 ±       0.031    s/op
DeleteFileIndexBenchmark.buildIndexAndLookup                          file    ss   10            5.381 ±       0.224    s/op
DeleteFileIndexBenchmark.buildIndexAndLookup                            dv    ss   10            3.612 ±       0.201    s/op

Partition-scoped deletes are fastest because the benchmark sets up a table with a small number of deep partitions (50K data files per partition) and only 100 delete files per partition, so the number of delete files differs dramatically between the partition-scoped case and the other two. We should probably make this benchmark more representative in the future. DVs are faster than file-scoped deletes because they rely on referencedDataFile instead of reconstructing that value from bounds. I'd say the planning performance is acceptable for 2.5M DVs, but we may want to optimize it further.
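
To illustrate the difference between DVs and file-scoped deletes, here is a minimal sketch of the two resolution paths. It is not the actual Iceberg code: `DeleteMeta`, `resolve`, and `utf8` are hypothetical names, and the sketch assumes that a file-scoped position delete is recognized by equal lower and upper bounds on the file path column.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ReferencedFileSketch {

  // Hypothetical stand-in for delete file metadata; not Iceberg's DeleteFile interface.
  record DeleteMeta(
      String referencedDataFile, ByteBuffer pathLowerBound, ByteBuffer pathUpperBound) {}

  // Returns the single data file this delete applies to, or null if that cannot be proven.
  static String resolve(DeleteMeta meta) {
    // DV-style path: the referenced data file is stored explicitly, so no decoding is needed.
    if (meta.referencedDataFile() != null) {
      return meta.referencedDataFile();
    }

    // File-scoped path (assumption): reconstruct the value from the path column bounds.
    ByteBuffer lower = meta.pathLowerBound();
    ByteBuffer upper = meta.pathUpperBound();
    if (lower == null || upper == null || !lower.equals(upper)) {
      return null; // bounds are missing or span several paths, so not provably file-scoped
    }

    // Decode a duplicate so the caller's buffer position is left untouched.
    return StandardCharsets.UTF_8.decode(lower.duplicate()).toString();
  }

  // Hypothetical helper that encodes a path the way a string bound might be stored.
  static ByteBuffer utf8(String value) {
    return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    DeleteMeta dv = new DeleteMeta("s3://bucket/a.parquet", null, null);
    DeleteMeta fileScoped =
        new DeleteMeta(null, utf8("s3://bucket/b.parquet"), utf8("s3://bucket/b.parquet"));
    DeleteMeta partitionScoped =
        new DeleteMeta(null, utf8("s3://bucket/a.parquet"), utf8("s3://bucket/z.parquet"));

    System.out.println(resolve(dv));
    System.out.println(resolve(fileScoped));
    System.out.println(resolve(partitionScoped));
  }
}
```

In this sketch the explicit field skips both the bounds comparison and the string decoding, which is the kind of per-file work that could add up across millions of delete files.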

This work is part of #11122.

@github-actions github-actions bot added the spark label Nov 12, 2024
Member

@jbonofre jbonofre left a comment

Maybe we can use an abstract class that gathers the @Param fields and init() methods we share across the benchmark tests.
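
A rough sketch of what such a base class could look like, written in plain Java so that it runs without the JMH dependency. All class and method names here are hypothetical and are not taken from the Iceberg code base; the comments mark where the JMH annotations would go in a real benchmark.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedBenchmarkSetupSketch {

  // Hypothetical base class holding the parameter and setup shared by several benchmarks.
  abstract static class BenchmarkBase {
    // In a JMH benchmark this field would carry @Param({"partition", "file", "dv"}).
    protected String type = "partition";

    protected final List<String> deleteFiles = new ArrayList<>();

    // In a JMH benchmark this method would carry @Setup.
    public void init() {
      deleteFiles.clear();
      for (int i = 0; i < deleteFileCount(); i++) {
        deleteFiles.add(type + "-delete-" + i);
      }
    }

    // Each concrete benchmark decides how much test data it needs.
    protected abstract int deleteFileCount();
  }

  // Hypothetical concrete benchmark that only adds its own measured method.
  static final class IndexBenchmark extends BenchmarkBase {
    @Override
    protected int deleteFileCount() {
      return 3;
    }

    // In a JMH benchmark this method would carry @Benchmark.
    public int buildIndexAndLookup() {
      return deleteFiles.size();
    }
  }

  public static void main(String[] args) {
    IndexBenchmark benchmark = new IndexBenchmark();
    benchmark.type = "dv";
    benchmark.init();
    System.out.println(benchmark.buildIndexAndLookup() + " delete files of type " + benchmark.type);
  }
}
```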

@aokolnychyi aokolnychyi merged commit 315e154 into apache:main Nov 15, 2024
31 checks passed
@aokolnychyi
Contributor Author

Thanks, @jbonofre @nastra!

We may look into refactoring some of the benchmark code, but experience shows it is rarely worth the time.
