Update speedup factors for qualification tool in Dataproc 2.1 environments #509
Signed-off-by: cindyyuanjiang <[email protected]>
I cannot see which tests failed.
@cindyyuanjiang, can you please rerun the unit tests locally for Spark 341 to see what is going on?
Signed-off-by: cindyyuanjiang <[email protected]>
I guess the ML ops added to the T4/L4 files were copied and pasted from operatorsScore.csv. Was that done on purpose? AFAIK, the ML ops are not part of the benchmarks.
I am concerned that we update the file skipping some operators that were not actually part of the evaluation.
- If the ML functions are added to the file, then everyone will assume the speedups are up to date.
- If the speedups for ML-functions are not measured on t4/l4, then they should not be added to the file.
Finally, we should consider upgrading the default build to >= Spark-330 since:
- speedups are based on dataproc-2.1 which runs Spark330.
- user-tools downloads spark-3.3.1 by default.
I would prefer a second opinion from @nartal1.
I second that speedups for ML ops should not be added to t4/l4. It would imply that we got these numbers by testing and running on t4/l4. The speedups were obtained from databricks-aws.
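One way to check the concern above is to flag operators in a platform file that were not benchmarked and whose score is identical to the default file. The following is only a minimal sketch under stated assumptions: the `CPUOperator,Score` column names, the sample rows, the `SomeMLOp` operator, and the helper names are all hypothetical and are not taken from this PR.

```python
import csv
import io


def load_scores(csv_text):
    """Parse rows of an assumed 'CPUOperator,Score' layout into {operator: score}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["CPUOperator"]: float(row["Score"]) for row in reader}


def unmeasured_candidates(platform_scores, default_scores, benchmarked_ops):
    """Return operators that were not benchmarked and whose platform score
    equals the default score, which suggests a copied rather than measured value."""
    return sorted(
        op
        for op, score in platform_scores.items()
        if op not in benchmarked_ops and default_scores.get(op) == score
    )


# Hypothetical sample data standing in for the real score files.
default_csv = "CPUOperator,Score\nFilterExec,2.4\nProjectExec,2.1\nSomeMLOp,1.5\n"
t4_csv = "CPUOperator,Score\nFilterExec,3.0\nProjectExec,2.8\nSomeMLOp,1.5\n"
benchmarked = {"FilterExec", "ProjectExec"}  # operators assumed to be covered by the benchmark run

flagged = unmeasured_candidates(
    load_scores(t4_csv), load_scores(default_csv), benchmarked
)
print(flagged)
```

An identical score is only a hint of a copied value, so any flagged operator would still need a manual look.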
Signed-off-by: cindyyuanjiang <[email protected]>
@amahussein @nartal1
We reran the NDS benchmarks in Dataproc 2.1 environments with the latest versions and updated the speedup factors for the Dataproc 2.1 T4 and L4 environments.
Fixes #508