Why is the kernel size d6x8 on Zen? #253
In that paper, the GEBP part refers to the shape of the algorithm with respect to the L2 cache (i.e., the "macrokernel" in BLIS parlance). The part that corresponds to the BLIS microkernel (a term not used explicitly in that paper) is Sections 6.1-6.2. This paper presents a more detailed analysis of the block sizes as used in BLIS.
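To make the macrokernel/microkernel distinction concrete, here is a toy sketch in pure Python (not BLIS code) of a blocked GEMM in a BLIS-style loop order. The macrokernel sweeps one mc x nc block of C in m_r x n_r microtiles (the GEBP-shaped part), and the microkernel performs the rank-1 updates on a single microtile. The function names and block sizes (mc, nc, kc, mr, nr) are illustrative assumptions, and the packing of A and B into contiguous buffers that BLIS performs is deliberately omitted.

```python
def microkernel(A, B, C, i0, i1, j0, j1, p0, p1):
    """C[i0:i1, j0:j1] += A[i0:i1, p0:p1] @ B[p0:p1, j0:j1], one rank-1 update per p."""
    for p in range(p0, p1):
        for i in range(i0, i1):
            a = A[i][p]  # one element of A, reused across the row of the tile
            for j in range(j0, j1):
                C[i][j] += a * B[p][j]


def macrokernel(A, B, C, ic0, ic1, jc0, jc1, pc0, pc1, mr, nr):
    """GEBP-shaped sweep over one block of C, one mr x nr microtile at a time."""
    for jr in range(jc0, jc1, nr):
        for ir in range(ic0, ic1, mr):
            microkernel(A, B, C, ir, min(ir + mr, ic1),
                        jr, min(jr + nr, jc1), pc0, pc1)


def blocked_gemm(A, B, mc=12, nc=16, kc=8, mr=6, nr=8):
    """Return C = A @ B using a BLIS-style loop order (packing omitted)."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):          # blocks of columns of B and C
        for pc in range(0, k, kc):      # blocks along the shared k dimension
            for ic in range(0, m, mc):  # blocks of rows of A and C
                macrokernel(A, B, C, ic, min(ic + mc, m), jc, min(jc + nc, n),
                            pc, min(pc + kc, k), mr, nr)
    return C


A = [[i + p for p in range(9)] for i in range(13)]   # 13 x 9
B = [[p - j for j in range(17)] for p in range(9)]   # 9 x 17
C = blocked_gemm(A, B)
```

The edge handling via `min(...)` lets the sketch accept sizes that are not multiples of the block sizes; a real microkernel typically handles such fringe tiles separately.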
Thanks, but I don't understand it. On Haswell the FMA latency is 5 and the throughput is 0.5, and N_vec is 4, so are m_r and n_r both 4?
@VirtualEarth Turn your attention to Eq. 1:
On Intel Haswell/Broadwell/Skylake/Kabylake,
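The minimum-size argument can be checked with a little arithmetic. This sketch assumes that Eq. 1 in the paper linked above is the constraint m_r n_r ≥ N_vec · L_vfma · N_vfma (N_vec elements per vector register, L_vfma cycles of FMA latency, N_vfma FMAs issued per cycle), i.e., enough independent accumulators to keep every FMA unit busy for the full latency. The Haswell values come from this thread: 4 doubles per 256-bit register, a 5-cycle latency, and two FMAs per cycle. The function names are hypothetical.

```python
def min_microtile_elements(n_vec, l_vfma, n_vfma):
    """Assumed Eq. 1 lower bound on m_r * n_r: enough independent FMAs
    in flight to cover the latency of every FMA unit."""
    return n_vec * l_vfma * n_vfma


def satisfies_bound(mr, nr, n_vec, l_vfma, n_vfma):
    """True if an mr x nr microtile meets the assumed lower bound."""
    return mr * nr >= min_microtile_elements(n_vec, l_vfma, n_vfma)


# Haswell, double precision: 256-bit ymm = 4 doubles, 5-cycle FMA latency, 2 FMA units
HASWELL = dict(n_vec=4, l_vfma=5, n_vfma=2)

print(min_microtile_elements(**HASWELL))  # 40
print(satisfies_bound(4, 4, **HASWELL))   # False: 16 < 40
print(satisfies_bound(6, 8, **HASWELL))   # True: 48 >= 40
```

Under this reading, a 4x4 tile cannot hide the FMA latency on Haswell, while 6x8 clears the bound with some margin.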
Thanks, I got it.
@VirtualEarth Thanks for your interest in BLIS; I hope all of your questions were answered. I'll close the issue now. If you do have more questions, they would make a great thread on the blis-discuss group.
@VirtualEarth One more thing: I realized that your issue asks "why kernel size is d6x8 in zen?" A microkernel for Intel Haswell-like architectures happens to work on AMD Zen-based architectures, though not the other way around. (Zen can only execute one 256-bit FMA per cycle, whereas Haswell can execute two, and therefore the microtile size for Zen can be smaller.) I plan to eventually rename the
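The "works on Zen, but not the other way around" point follows from the same kind of bound: halving the number of FMAs issued per cycle halves the minimum microtile size. This sketch assumes the constraint m_r n_r ≥ N_vec · L_vfma · N_vfma and, for illustration only, keeps a 5-cycle FMA latency for both cores so that only the issue rate differs. The 4x6 "Zen-sized" tile is a hypothetical example, not a BLIS kernel.

```python
def meets_latency_bound(mr, nr, n_vec, l_vfma, n_vfma):
    """True if an mr x nr microtile keeps enough independent FMAs in flight."""
    return mr * nr >= n_vec * l_vfma * n_vfma


# Double precision with 256-bit vectors (4 doubles). The 5-cycle latency is an
# illustrative assumption for both cores; only the FMA issue rate differs.
HASWELL = dict(n_vec=4, l_vfma=5, n_vfma=2)  # two 256-bit FMAs per cycle
ZEN = dict(n_vec=4, l_vfma=5, n_vfma=1)      # one 256-bit FMA per cycle

tiles = {"Haswell d6x8": (6, 8), "hypothetical 4x6": (4, 6)}
for name, (mr, nr) in tiles.items():
    print(name,
          "Haswell:", meets_latency_bound(mr, nr, **HASWELL),
          "Zen:", meets_latency_bound(mr, nr, **ZEN))
```

The 6x8 tile (48 elements) clears both minimums (40 and 20), while the smaller tile (24 elements) clears Zen's minimum but not Haswell's.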
Hi @devinamatthews,
It was 5 on Haswell, which is the architecture the kernel was originally written for. The latency dropped to 4 in SKL (I think?), but this doesn't change the basic design of the kernel. Note that the math using L_vfma gives only a minimum kernel size; it's almost always best to pick a kernel as large as possible to reduce bandwidth from cache.
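The tension between the minimum size and "as large as possible" can be sketched by counting registers and loads. In each rank-1 update, a microkernel loads m_r elements of A and n_r elements of B and performs m_r · n_r multiply-adds, so larger tiles do more arithmetic per element fetched from cache. The register model here is an assumption for illustration: n_r is held in vector registers, the C tile occupies m_r · n_r / N_vec accumulators, the current row of B takes n_r / N_vec registers, and one register holds a broadcast element of A, out of the 16 ymm registers available with AVX2.

```python
def registers_needed(mr, nr, n_vec):
    """Vector registers for an assumed broadcast-style kernel that vectorizes n_r:
    accumulators for the C tile, one row of B, and one broadcast of A."""
    assert nr % n_vec == 0, "n_r must be a multiple of the vector length"
    accumulators = mr * (nr // n_vec)
    b_row = nr // n_vec
    a_broadcast = 1
    return accumulators + b_row + a_broadcast


def fmas_per_load(mr, nr):
    """Scalar multiply-adds per element loaded in one rank-1 update."""
    return (mr * nr) / (mr + nr)


N_VEC = 4      # doubles per 256-bit ymm register
NUM_REGS = 16  # ymm0-ymm15 with AVX2

for mr, nr in [(4, 4), (6, 8), (8, 8)]:
    regs = registers_needed(mr, nr, N_VEC)
    print(f"{mr}x{nr}: {regs} registers, fits={regs <= NUM_REGS}, "
          f"{fmas_per_load(mr, nr):.2f} FMAs per load")
```

In this model, 6x8 uses 15 of 16 registers and does about 3.43 FMAs per load versus 2.00 for 4x4, while 8x8 would need 19 registers and no longer fits.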
Why is the kernel size d6x8 on Zen? According to the paper, GEMM attains the best performance when the microkernel is GEBP, so shouldn't the kernel size be d8x6?
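As a hedged illustration of why the orientation matters less than the product m_r · n_r: reading d6x8 as m_r = 6, n_r = 8, a 6x8 tile and an 8x6 tile have the same element count (and thus the same standing against a latency bound of the form m_r n_r ≥ N_vec · L_vfma · N_vfma), the same FMAs per load, and the same number of accumulator registers. What changes is which dimension is loaded as full vectors and which is broadcast. The model below (vectorize one dimension, broadcast the other) and the function name are assumptions for illustration.

```python
def tile_metrics(mr, nr, n_vec, vectorized):
    """Return (elements, FMAs per load, accumulator registers) for an mr x nr
    microtile whose `vectorized` dimension ('m' or 'n') lives in vector
    registers while the other dimension is broadcast one element at a time."""
    vec_dim, bcast_dim = (mr, nr) if vectorized == "m" else (nr, mr)
    assert vec_dim % n_vec == 0, "vectorized dimension must be a multiple of n_vec"
    accumulators = (vec_dim // n_vec) * bcast_dim
    return mr * nr, (mr * nr) / (mr + nr), accumulators


N_VEC = 4  # doubles per 256-bit register
print(tile_metrics(6, 8, N_VEC, vectorized="n"))  # vectorize n_r = 8, broadcast A
print(tile_metrics(8, 6, N_VEC, vectorized="m"))  # vectorize m_r = 8, broadcast B
```

Both calls print the same tuple, so under this model the choice between 6x8 and 8x6 comes down to which operand is vector-loaded versus broadcast, not to the latency or bandwidth arithmetic.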