why kernel size is d6x8 in zen #253

Closed
VirtualEarth opened this issue Sep 19, 2018 · 8 comments

@VirtualEarth

Why is the kernel size d6x8 in zen? According to the paper, GEMM attains its best performance when the micro-kernel is GEBP, so shouldn't the kernel size be d8x6?

@devinamatthews
Member

In that paper, the GEBP part refers to the shape of the algorithm w.r.t. the L2 cache (e.g. the "macrokernel" in BLIS parlance). The part that corresponds to the microkernel in BLIS (a term not explicitly used in that paper) is sections 6.1-6.2. This paper presents a more detailed analysis of the block sizes as used in BLIS.
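
For what it's worth, here is a minimal sketch of that layering, written from scratch rather than taken from the BLIS source, with illustrative (roughly Haswell-like) double-precision blocksizes: three outer cache-blocking loops, the macrokernel (the GEBP-shaped part with respect to the L2 cache), and an MR x NR microkernel innermost.

    #include <stddef.h>

    /* Illustrative blocksizes only; the real values live in each BLIS configuration. */
    #define MC 72
    #define NC 4080
    #define KC 256
    #define MR 6
    #define NR 8

    /* Microkernel analogue: C[MR x NR] += A[MR x kc] * B[kc x NR], row-major,
     * lda/ldb/ldc = leading dimensions. In BLIS this is the hand-written SIMD
     * kernel that keeps the MR x NR tile of C in registers. */
    static void microkernel(size_t kc,
                            const double *A, size_t lda,
                            const double *B, size_t ldb,
                            double *C, size_t ldc)
    {
        for (size_t p = 0; p < kc; ++p)
            for (size_t i = 0; i < MR; ++i)
                for (size_t j = 0; j < NR; ++j)
                    C[i*ldc + j] += A[i*lda + p] * B[p*ldb + j];
    }

    /* C[m x n] += A[m x k] * B[k x n], all row-major. Packing and edge-case
     * handling are omitted, so m must be a multiple of MR and n of NR. */
    void gemm_blocked(size_t m, size_t n, size_t k,
                      const double *A, const double *B, double *C)
    {
        for (size_t jc = 0; jc < n; jc += NC)          /* NC-wide panel of B and C      */
            for (size_t pc = 0; pc < k; pc += KC)      /* KC-deep block: the GEBP shape */
            {
                size_t kc = (k - pc < KC) ? k - pc : KC;
                for (size_t ic = 0; ic < m; ic += MC)  /* MC x KC block of A (L2 cache) */
                    /* The two loops below plus the microkernel form the "macrokernel". */
                    for (size_t jr = jc; jr < jc + NC && jr < n; jr += NR)
                        for (size_t ir = ic; ir < ic + MC && ir < m; ir += MR)
                            microkernel(kc,
                                        &A[ir*k + pc], k,
                                        &B[pc*n + jr], n,
                                        &C[ir*n + jr], n);
            }
    }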

@VirtualEarth
Author

Thanks, but I still don't understand. On Haswell the latency and throughput of FMA are 5 and 0.5, and Nvec is 4.

So mr is 4 and nr is 4?

@fgvanzee
Member

@VirtualEarth Turn your attention to Eq. 1:

  mr nr >= Nvec Lvfma Nvfma

On Intel Haswell/Broadwell/Skylake/Kabylake, Nvec is 4, Lvfma is 5, and Nvfma is 2. Thus, the register blocksize product mr nr must be at least 40 in order to overcome the floating-point latency. 6 x 8 = 48 more than satisfies this, but 8 x 6 would also be fine. The former is biased toward row-oriented SIMD output while the latter is better for column-oriented SIMD output. In BLIS, it almost never actually matters which one you pick because high-level logic will transpose the entire operation and make the output matrix C appear to be row- or column-stored, depending on the SIMD output "preference" of the microkernel.
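
Plugging those numbers in explicitly:

  Nvec * Lvfma * Nvfma = 4 * 5 * 2 = 40
  mr * nr = 6 * 8 = 48 >= 40    (and 8 * 6 = 48 satisfies it equally)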

@VirtualEarth
Author

Thanks, I got it.

@devinamatthews
Member

@VirtualEarth Thanks for your interest in BLIS; I hope all of your questions got answered. I'll close the issue now. If you do have more questions, this would make a great thread on the blis-discuss group.

@fgvanzee
Member

@VirtualEarth One more thing: I realized that your issue asks "why kernel size is d6x8 in zen?" A microkernel for Intel Haswell-like architectures happens to work on AMD Zen-based architectures, though not the other way around. (Zen can only execute one 256-bit FMA per cycle, whereas Haswell can execute two, and therefore the microtile size for Zen can be smaller.) I plan to eventually rename the zen kernel set to haswell. Then, we can (optionally) experiment with kernels that specifically target Ryzen/Epyc.
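
To make that concrete (assuming, for the sake of the arithmetic, that the FMA latency on Zen is comparable to Haswell's, which the comment above does not state): with Nvfma = 1 instead of 2, Eq. 1 only requires

  mr * nr >= Nvec * Lvfma * Nvfma = 4 * 5 * 1 = 20

so a smaller microtile already covers the FMA latency on Zen.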

@mert-kurttutan

Hi @devinamatthews,
Here you mentioned that Lvfma is 5 for Intel Haswell/Broadwell/Skylake/Kabylake, but according to the Intel intrinsics guide it seems that Lvfma is 4. Am I missing something?

@devinamatthews
Member

It was 5 on Haswell, which is the architecture the kernel was originally written for. The latency dropped to 4 in SKL (I think?), but this doesn't change the basic design of the kernel. Note that the math using Lvfma gives a minimum kernel size; it's almost always best to pick a kernel as large as possible to reduce the bandwidth required from cache.
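
Concretely, taking Lvfma = 4 together with the Skylake parameters quoted earlier in the thread (Nvec = 4, Nvfma = 2), the bound becomes

  mr * nr >= 4 * 4 * 2 = 32

which the existing 6 x 8 = 48 microtile still satisfies with room to spare, consistent with the kernel design not needing to change.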
