
[CI][microbenchmarks] Add onednn backend to GEMM with transposed matrices #2445

Merged
merged 15 commits into from
Oct 10, 2024

Conversation

Egor-Krivov
Contributor

@Egor-Krivov Egor-Krivov commented Oct 9, 2024

Without BENCHMARKING_METHOD="ELAPSED_TIME" I get ~6000 TFLOPS GeoMean for the onednn measurements.

Closes #2456

@Egor-Krivov
Contributor Author

@Egor-Krivov Egor-Krivov marked this pull request as ready for review October 9, 2024 16:14
@Egor-Krivov Egor-Krivov changed the title [CI][benchmarks] Add onednn backend to GEMM with transposed matrices [CI][microbenchmarks] Add onednn backend to GEMM with transposed matrices Oct 9, 2024
@whitneywhtsang
Contributor

@vlad-penkin vlad-penkin linked an issue Oct 10, 2024 that may be closed by this pull request
@Egor-Krivov
Contributor Author

f6a4dda removed fast_flush option from triton

@whitneywhtsang
Contributor

> Without BENCHMARKING_METHOD="ELAPSED_TIME" I get ~6000 TFLOPS GeoMean for onednn measurements.

Are you saying that the current default timing method, IPEX, would give incorrect results?

@Egor-Krivov
Contributor Author

> Without BENCHMARKING_METHOD="ELAPSED_TIME" I get ~6000 TFLOPS GeoMean for onednn measurements.

> Are you saying that the current default timing method, IPEX, would give incorrect results?

Yes, without ELAPSED_TIME the onednn backend gives incorrect results; probably the wrong kernel is picked.
But there is no issue in this PR, since we measure onednn with ELAPSED_TIME.
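For context, elapsed-time benchmarking times the whole call with a wall clock instead of matching kernel records in a profiler trace, which is why it sidesteps the wrong-kernel problem. A minimal CPU-only sketch (a hypothetical helper for illustration, not the repo's actual `do_bench_elapsed_time`; on a GPU backend a device synchronize would be needed around the timed call):

```python
import time


def do_bench_elapsed_time(fn, warmup=10, rep=10):
    """Hypothetical sketch: measure wall-clock time of fn in milliseconds."""
    for _ in range(warmup):
        fn()  # warm up caches / JIT / kernel selection before timing
    times_ms = []
    for _ in range(rep):
        start = time.perf_counter()
        fn()  # on XPU/CUDA, a device synchronize would go here before stopping the clock
        times_ms.append((time.perf_counter() - start) * 1e3)
    times_ms.sort()
    return times_ms[0], times_ms[-1], sum(times_ms) / len(times_ms)


min_ms, max_ms, mean_ms = do_bench_elapsed_time(lambda: sum(range(10_000)))
```

Because the clock brackets the entire call, the measurement cannot attribute time to the wrong kernel; the trade-off is that it also includes launch and host overhead.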

@Egor-Krivov Egor-Krivov enabled auto-merge (squash) October 10, 2024 12:40
@whitneywhtsang
Contributor

> Probably the wrong kernel is picked.

hmm... kernel_name is not used in do_bench_ipex.

@Egor-Krivov
Contributor Author

> Probably the wrong kernel is picked.

> hmm... kernel_name is not used in do_bench_ipex.

I'm not talking about the kernel_name variable. There is some logic extracting kernels on lines 114–120.
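To illustrate the failure mode being discussed: profiler-based timing has to pick the right records out of the trace, and if a cheap auxiliary kernel is selected instead of the GEMM, the reported time collapses and the computed TFLOPS explode. A hypothetical sketch of that kind of extraction logic (not the repo's actual code; record fields and exclusion names are assumptions):

```python
def extract_kernel_time_us(records, exclude=("memcpy", "barrier")):
    """Hypothetical sketch: keep device-side records that look like compute
    kernels and sum their durations in microseconds."""
    kernels = [
        r for r in records
        if r["device"] and not any(word in r["name"] for word in exclude)
    ]
    return sum(r["time_us"] for r in kernels)


records = [
    {"name": "gemm_kernel", "device": True, "time_us": 120.0},
    {"name": "memcpy_h2d", "device": True, "time_us": 15.0},   # excluded by name
    {"name": "host_launch", "device": False, "time_us": 5.0},  # not a device kernel
]
total = extract_kernel_time_us(records)
```

If the filter matched only `memcpy_h2d` here, the "kernel time" would be 15 µs instead of 120 µs, and a FLOPS figure derived from it would be inflated roughly 8x; mismatched kernel names for the onednn backend would produce exactly this kind of absurd ~6000 TFLOPS reading.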

@Egor-Krivov
Contributor Author

@whitneywhtsang
We'll probably need to remove the remaining usage of fast_flush from benchmark_testing.py in the future.

@whitneywhtsang
Contributor

> @whitneywhtsang We'll probably need to remove the remaining usage of fast_flush from benchmark_testing.py in the future.

Sure, I can add a PR for that.

@Egor-Krivov
Contributor Author

@whitneywhtsang Everything works. Could you approve it now?

@@ -162,21 +162,23 @@ jobs:
if: ${{ steps.install.outcome == 'success' && !cancelled() }}
run: |
cd benchmarks/triton_kernels_benchmark
TRANSPOSE_B=1 python gemm_benchmark.py --reports $REPORTS
BENCHMARKING_METHOD="ELAPSED_TIME" TRANSPOSE_B=1 python gemm_benchmark.py --reports $REPORTS
Contributor

@anmyachev anmyachev Oct 10, 2024

This will also change the default measurement method for xetla and triton, even if IPEX is set.

Contributor Author

Yes. Otherwise we get wrong measurements for onednn.

Contributor

> Yes. Otherwise we get wrong measurements for onednn.

For legacy profiler or upstream profiler?

Contributor Author

~6000 TFLOPS for legacy.
Last time I ran onednn with the upstream profiler, it just failed: no kernel was found.

Contributor

I see. BTW, you could use kernel_name="gemm_kernel" for the upstream profiler; it should work.

Contributor

> ~6000 TFLOPS for legacy

As for this, I suggest explicitly using do_bench_elapsed_time when the legacy profiler mode is set, with a comment explaining why the workaround is needed (the legacy profiler gives invalid numbers). Something like this:

    if provider == 'onednn':
        do_bench = benchmark_suit.do_bench
        if BENCHMARKING_METHOD == 'PYTORCH_LEGACY_PROFILER_USING_IPEX':
            # Workaround: the legacy IPEX profiler reports invalid numbers for onednn.
            do_bench = benchmark_suit.do_bench_elapsed_time
        _, min_ms, max_ms, mean_ms, cv = do_bench(lambda: torch.matmul(torch_a, torch_b), warmup=10,
                                                  rep=10, quantiles=quantiles, kernel_name='gemm_kernel')

This way you won't change the numbers that are used in dashboards.

Contributor Author

Ok, let me try that

@whitneywhtsang
Contributor

> @whitneywhtsang We'll probably need to remove the remaining usage of fast_flush from benchmark_testing.py in the future.

> Sure, I can add a PR for that.

Done in #2460.

Comment on lines 7 to 8
BENCHMARKING_METHOD = os.getenv(
"BENCHMARKING_METHOD", "PYTORCH_LEGACY_PROFILER_USING_IPEX" if USE_IPEX_OPTION else "UPSTREAM_PYTORCH_PROFILER")
Contributor

Please revert this

Contributor

The main idea is that USE_IPEX_OPTION should have priority over BENCHMARKING_METHOD.
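The priority the reviewer asks for can be sketched as a small selection function (hypothetical names; only BENCHMARKING_METHOD and the IPEX legacy/upstream method strings come from this PR): when IPEX is enabled, the legacy IPEX profiler always wins, and BENCHMARKING_METHOD can only override the non-IPEX default.

```python
def pick_benchmarking_method(use_ipex: bool, env: dict) -> str:
    """Hypothetical sketch: USE_IPEX_OPTION takes priority over the
    BENCHMARKING_METHOD environment variable."""
    if use_ipex:
        # IPEX mode pins the method regardless of any env override.
        return "PYTORCH_LEGACY_PROFILER_USING_IPEX"
    # Otherwise honor the env override, falling back to the upstream profiler.
    return env.get("BENCHMARKING_METHOD", "UPSTREAM_PYTORCH_PROFILER")
```

For example, `pick_benchmarking_method(False, {"BENCHMARKING_METHOD": "ELAPSED_TIME"})` yields the elapsed-time method, while the same env with `use_ipex=True` still selects the legacy IPEX profiler.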

Contributor

@anmyachev anmyachev left a comment

LGTM! @whitneywhtsang ?

@Egor-Krivov Egor-Krivov merged commit cf98f3a into main Oct 10, 2024
5 checks passed
@Egor-Krivov Egor-Krivov deleted the egor/gemm_onednn branch October 10, 2024 17:17
@whitneywhtsang
Contributor

LGTM! @whitneywhtsang ?

Thanks for asking! Mostly LGTM; one concern: since we are using oneDNN as the baseline to compare with Triton, would using two different benchmarking methods affect the comparison?
I assume we will soon change the default to use elapsed time with 2025.0? If so, the problem I mentioned would be temporary.

Successfully merging this pull request may close these issues.

[Benchmarks] Add onednn backend to GEMM with transposed matrices