Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update speedup factors for qualification tool in Dataproc 2.1 environments #509

Merged
merged 3 commits into from
Aug 29, 2023

Conversation

cindyyuanjiang
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang commented Aug 23, 2023

We reran NDS benchmarks in Dataproc 2.1 environments with latest versions, and updated the speedup factors for Dataproc 2.1 T4 and L4 env.

Fixes #508

Signed-off-by: cindyyuanjiang <[email protected]>
@cindyyuanjiang cindyyuanjiang self-assigned this Aug 23, 2023
@cindyyuanjiang cindyyuanjiang added feature request New feature or request core_tools Scope the core module (scala) labels Aug 23, 2023
@amahussein amahussein self-requested a review August 24, 2023 15:19
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot see which tests failed.
@cindyyuanjiang can you please rerun the unit tests locally for spark 341 to see what is going on.

Signed-off-by: cindyyuanjiang <[email protected]>
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the ML ops added to the t4/l4 were copy/paste from the operatorsScore.csv. Was that done on purpose? AFAIK, the ML-ops are not parts of the benchmarks.
I am concerned that we update the file skipping some operators that were not actually part of the evaluation.

  • If the ML-functions is added to the file, then teveryone is assuming the speedups are up-to-date.
  • If the speedups for ML-functions are not measured on t4/l4, then they should not be added to the file.

Finally, we should consider upgrading the default build to >= Spark-330 since:

  1. speedups are based on dataproc-2.1 which runs Spark330.
  2. user-tools downloads spark-3.3.1 by default.

Prefer a second opinion from @nartal1

@nartal1
Copy link
Collaborator

nartal1 commented Aug 28, 2023

I second that speedups for ML ops should not be added to t4/l4. It would imply that we got these numbers by testing and running on t4/l4. The speedups were obtained from databricks-aws.

Signed-off-by: cindyyuanjiang <[email protected]>
@cindyyuanjiang
Copy link
Collaborator Author

@amahussein @nartal1
Thank you very much for pointing this out! The ML op scores were generated from the script automatically. I have removed them from t4/l4 operator score files.

@amahussein amahussein merged commit f70a4f7 into NVIDIA:dev Aug 29, 2023
8 checks passed
@cindyyuanjiang cindyyuanjiang deleted the dataproc-speedup-factor branch August 29, 2023 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core_tools Scope the core module (scala) feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Reset speedup factors for qualification tool in Dataproc 2.1 environment
3 participants