
[CI][microbenchmarks] Add onednn backend to GEMM with transposed matrices #2445

Merged
merged 15 commits into from
Oct 10, 2024

Conversation

Egor-Krivov
Contributor

@Egor-Krivov Egor-Krivov commented Oct 9, 2024

Without BENCHMARKING_METHOD="ELAPSED_TIME" I get ~6000 TFLOPS GeoMean for the onednn measurements.

Closes #2456

@Egor-Krivov
Contributor Author

@Egor-Krivov Egor-Krivov marked this pull request as ready for review October 9, 2024 16:14
@Egor-Krivov Egor-Krivov changed the title [CI][benchmarks] Add onednn backend to GEMM with transposed matrices [CI][microbenchmarks] Add onednn backend to GEMM with transposed matrices Oct 9, 2024
@whitneywhtsang
Contributor

@vlad-penkin vlad-penkin linked an issue Oct 10, 2024 that may be closed by this pull request
@Egor-Krivov
Contributor Author

f6a4dda removed fast_flush option from triton

@whitneywhtsang
Contributor

> Without BENCHMARKING_METHOD="ELAPSED_TIME" I get ~6000 TFLOPS GeoMean for onednn measurements.

Are you saying that the current default timing method, IPEX, would give incorrect results?

@Egor-Krivov
Contributor Author

> Without BENCHMARKING_METHOD="ELAPSED_TIME" I get ~6000 TFLOPS GeoMean for onednn measurements.

> Are you saying that the current default timing method, IPEX, would give incorrect results?

Yes, without ELAPSED_TIME the onednn backend gives incorrect results; probably the wrong kernel is picked.
But there is no issue in this PR, since we measure onednn with ELAPSED_TIME.
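For context, elapsed-time benchmarking times the whole call with a wall clock instead of matching kernel records in a profiler trace, which is why it sidesteps the wrong-kernel problem. A minimal CPU-only sketch (a hypothetical helper for illustration, not the repo's actual `do_bench_elapsed_time`; on a GPU backend a device synchronize would be needed around the timed call):

```python
import time


def do_bench_elapsed_time(fn, warmup=10, rep=10):
    """Hypothetical sketch: measure wall-clock time of fn in milliseconds."""
    for _ in range(warmup):
        fn()  # warm up caches / JIT / kernel selection before timing
    times_ms = []
    for _ in range(rep):
        start = time.perf_counter()
        fn()  # on XPU/CUDA, a device synchronize would go here before stopping the clock
        times_ms.append((time.perf_counter() - start) * 1e3)
    times_ms.sort()
    return times_ms[0], times_ms[-1], sum(times_ms) / len(times_ms)


min_ms, max_ms, mean_ms = do_bench_elapsed_time(lambda: sum(range(10_000)))
```

Because the clock brackets the entire call, the measurement cannot attribute time to the wrong kernel; the trade-off is that it also includes launch and host overhead.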

@Egor-Krivov Egor-Krivov enabled auto-merge (squash) October 10, 2024 12:40
@whitneywhtsang
Contributor

> Probably the wrong kernel is picked.

hmm... kernel_name is not used in do_bench_ipex.

@Egor-Krivov
Contributor Author

> Probably the wrong kernel is picked.

> hmm... kernel_name is not used in do_bench_ipex.

I'm not talking about the kernel_name variable. There is some logic extracting kernels on lines 114–120.
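To illustrate the failure mode being discussed: profiler-based timing has to pick the right records out of the trace, and if a cheap auxiliary kernel is selected instead of the GEMM, the reported time collapses and the computed TFLOPS explode. A hypothetical sketch of that kind of extraction logic (not the repo's actual code; record fields and exclusion names are assumptions):

```python
def extract_kernel_time_us(records, exclude=("memcpy", "barrier")):
    """Hypothetical sketch: keep device-side records that look like compute
    kernels and sum their durations in microseconds."""
    kernels = [
        r for r in records
        if r["device"] and not any(word in r["name"] for word in exclude)
    ]
    return sum(r["time_us"] for r in kernels)


records = [
    {"name": "gemm_kernel", "device": True, "time_us": 120.0},
    {"name": "memcpy_h2d", "device": True, "time_us": 15.0},   # excluded by name
    {"name": "host_launch", "device": False, "time_us": 5.0},  # not a device kernel
]
total = extract_kernel_time_us(records)
```

If the filter matched only `memcpy_h2d` here, the "kernel time" would be 15 µs instead of 120 µs, and a FLOPS figure derived from it would be inflated roughly 8x; mismatched kernel names for the onednn backend would produce exactly this kind of absurd ~6000 TFLOPS reading.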

@Egor-Krivov
Contributor Author

@whitneywhtsang
We'll probably need to remove the remaining usage of fast_flush from benchmark_testing.py in the future.

@whitneywhtsang
Contributor

> @whitneywhtsang We'll probably need to remove the remaining usage of fast_flush from benchmark_testing.py in the future.

Sure, I can add a PR for that.

@Egor-Krivov
Contributor Author

@whitneywhtsang Everything works. Could you approve it now?

@@ -162,21 +162,23 @@ jobs:
if: ${{ steps.install.outcome == 'success' && !cancelled() }}
run: |
cd benchmarks/triton_kernels_benchmark
TRANSPOSE_B=1 python gemm_benchmark.py --reports $REPORTS
BENCHMARKING_METHOD="ELAPSED_TIME" TRANSPOSE_B=1 python gemm_benchmark.py --reports $REPORTS
Contributor

@anmyachev anmyachev Oct 10, 2024

This will also change the default measurement method for xetla and triton, even if IPEX is set.

Contributor Author

Yes. Otherwise we get wrong measurements for onednn.

Contributor

> Yes. Otherwise we get wrong measurements for onednn.

For legacy profiler or upstream profiler?

Contributor Author

~6000 TFLOPS for legacy.
Last time I ran onednn with the upstream profiler, it just failed: no kernel was found.

Contributor

I see. BTW, you could use kernel_name="gemm_kernel" for the upstream profiler; it should work.

Contributor

> ~6000 TFLOPS for legacy

As for this, I suggest explicitly using do_bench_elapsed_time when the legacy profiler mode is set, with a comment explaining why the workaround is needed (the legacy profiler gives invalid numbers). Something like this:

    if provider == 'onednn':
        do_bench = benchmark_suit.do_bench
        if BENCHMARKING_METHOD == 'PYTORCH_LEGACY_PROFILER_USING_IPEX':
            # Workaround: the legacy IPEX profiler reports invalid numbers for onednn.
            do_bench = benchmark_suit.do_bench_elapsed_time
        _, min_ms, max_ms, mean_ms, cv = do_bench(lambda: torch.matmul(torch_a, torch_b), warmup=10,
                                                  rep=10, quantiles=quantiles, kernel_name='gemm_kernel')

This way you won't change the numbers that are used in dashboards.

Contributor Author

Ok, let me try that

@whitneywhtsang
Contributor

> @whitneywhtsang We'll probably need to remove the remaining usage of fast_flush from benchmark_testing.py in the future.

> Sure, I can add a PR for that.

Done in #2460.

Comment on lines 7 to 8
BENCHMARKING_METHOD = os.getenv(
"BENCHMARKING_METHOD", "PYTORCH_LEGACY_PROFILER_USING_IPEX" if USE_IPEX_OPTION else "UPSTREAM_PYTORCH_PROFILER")
Contributor

Please revert this

Contributor

The main idea is that USE_IPEX_OPTION should have priority over BENCHMARKING_METHOD.
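The priority the reviewer asks for can be sketched as a small selection function (hypothetical names; only BENCHMARKING_METHOD and the IPEX legacy/upstream method strings come from this PR): when IPEX is enabled, the legacy IPEX profiler always wins, and BENCHMARKING_METHOD can only override the non-IPEX default.

```python
def pick_benchmarking_method(use_ipex: bool, env: dict) -> str:
    """Hypothetical sketch: USE_IPEX_OPTION takes priority over the
    BENCHMARKING_METHOD environment variable."""
    if use_ipex:
        # IPEX mode pins the method regardless of any env override.
        return "PYTORCH_LEGACY_PROFILER_USING_IPEX"
    # Otherwise honor the env override, falling back to the upstream profiler.
    return env.get("BENCHMARKING_METHOD", "UPSTREAM_PYTORCH_PROFILER")
```

For example, `pick_benchmarking_method(False, {"BENCHMARKING_METHOD": "ELAPSED_TIME"})` yields the elapsed-time method, while the same env with `use_ipex=True` still selects the legacy IPEX profiler.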

Contributor

@anmyachev anmyachev left a comment

LGTM! @whitneywhtsang ?

@Egor-Krivov Egor-Krivov merged commit cf98f3a into main Oct 10, 2024
5 checks passed
@Egor-Krivov Egor-Krivov deleted the egor/gemm_onednn branch October 10, 2024 17:17
@whitneywhtsang
Contributor

LGTM! @whitneywhtsang ?

Thanks for asking! Mostly LGTM; one concern: since we are using oneDNN as the baseline to compare with Triton, would using two different benchmarking methods affect the comparison?
I assume we will soon change the default to use elapsed time with 2025.0? If so, the problem I mentioned would be temporary.

Successfully merging this pull request may close these issues.

[Benchmarks] Add onednn backend to GEMM with transposed matrices