Add compare.py to compare the output of multiple benchmarks #5655

Merged
merged 5 commits into from
Mar 27, 2023


alamb
Contributor

@alamb commented Mar 20, 2023

Which issue does this PR close?

Closes #5561

Rationale for this change

See #5561

What changes are included in this PR?

  1. compare.py script from @Taza53, based on one from @isidentical (see the comment on Report and compare benchmark runs against two branches #5561)
  2. Updated documentation

Are these changes tested?

Not really.

Are there any user-facing changes?

No

@alamb added the development-process label (Related to development process of DataFusion) Mar 20, 2023
@alamb changed the title from Alamb/compare to Add compare.py to compare the output of multiple benchmarks Mar 20, 2023
@github-actions bot removed the development-process label (Related to development process of DataFusion) Mar 20, 2023
```shell
$ git checkout main
# generate an output script in /tmp/output_main
$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path /data --format parquet -o /tmp/output_main
```
Contributor

I think `--path /data` should be replaced with `--path ./data` in this line. We could also replace `--format parquet` with `--format tbl` (assuming the user doesn't run the conversion script; `tbl` is the format that `./tpch-gen.sh` outputs).
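Applied to the `main` run, the suggested flags would produce the command printed by the sketch below. It assumes the `tbl` files generated by `./tpch-gen.sh` are in `./data`; the `tpch_cmd` function and the `DRY_RUN` switch are hypothetical conveniences for illustration, not part of the DataFusion repository.

```shell
# Hypothetical helper (not in the repo): build the tpch benchmark command
# using the reviewer's suggested --path ./data and --format tbl.
# With DRY_RUN=1 the command is only printed; otherwise it is executed.
tpch_cmd() {
  out_dir="$1"
  # Replace the function's positional parameters with the full command
  set -- cargo run --release --bin tpch -- benchmark datafusion \
    --iterations 5 --path ./data --format tbl -o "$out_dir"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print (but do not run) the command for the main-branch output directory
DRY_RUN=1 tpch_cmd /tmp/output_main
```

Calling `tpch_cmd /tmp/output_main` without `DRY_RUN=1` would run the benchmark itself.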

```shell
$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path /data --format parquet -o /tmp/output_main
# generate an output script in /tmp/output_branch
$ git checkout my_branch
$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path /data --format parquet -o /tmp/output_my_branch
```
Contributor

The same changes as in the suggestion above apply here.

Contributor Author

@alamb Mar 27, 2023

Thank you for these suggestions; I have made them in dc5099d.

```shell
$ git checkout main
# generate an output script in /tmp/output_main
$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path /data --format parquet -o /tmp/output_main
```
Contributor

@mustafasrepo Mar 23, 2023

Also, when I run this command, I receive an IO error unless /tmp/output_main already exists. Is this expected? If so, I think we should add `mkdir /tmp/output_main` above this line.
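A minimal way to avoid the IO error described above is to create the output directories before either benchmark run. `mkdir -p` creates any missing parent directories and does not fail if the directory already exists, so the step is safe to rerun. The directory names are the ones used in the commands under review.

```shell
# Create both output directories up front.
# -p: create missing parents and succeed if the directory already exists.
mkdir -p /tmp/output_main /tmp/output_my_branch

# Running it a second time is harmless (no error for existing directories)
mkdir -p /tmp/output_main /tmp/output_my_branch
```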

@mustafasrepo
Contributor

I added some minor comments. Other than those, this PR LGTM! Thanks @alamb for this PR. It is very useful for comparing results with a friendly report.

@alamb
Contributor Author

alamb commented Mar 27, 2023

Thanks again for the review @mustafasrepo

@alamb merged commit b4dde57 into apache:main Mar 27, 2023
@alamb deleted the alamb/compare branch March 27, 2023 21:00