Attendees
- Victor Lu
- Dave Murray (Imagination Technologies)
- En Shao (Institute of Computing Technology)
- Chao Chen (Intel)
- Tao Lv (Intel)
- Leping Wang
- Ruyman Reyes Castro (Codeplay)
- Keeran Rothenfusser (Rivos)
- Hengyu Meng (Intel)
- Denis Samoylov (Intel)
- Penporn Koanantakool (Google)
- Rod Burns (Codeplay)
- Chris Reed (GE HealthCare)
- Ed Tuke
- Jose Fonseca
- Lixian Ma
- Jiangang Duan (Intel)
- Mehdi Goli (Codeplay)
- Jianhui Li (Intel)
- Gilberto Rodriguez (OpenChip)
- Mourad Gouicem (Intel)
- Alison Richards (Intel)
- Melissa Aranzamendez (Linux Foundation)
- Yong Li (Intel)
Agenda
- “SYCL backend for llama.cpp” by Hengyu Meng, Intel (slides)
- “oneDNN for Generic GPU” by Denis Samoylov, Intel (slides)
Question: llama.cpp uses low-level primitives such as WMMA to write flash attention kernels. Can SYCL support that?
Answer: Yes. The joint_matrix API supports the same level of abstraction as WMMA, so it would be good to know which specific feature is missing. If you are referring to raw mma instructions, you may have to use inline assembly, which SYCL also supports.
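For context, here is a minimal sketch of what the joint_matrix abstraction looks like in a SYCL kernel. This example is illustrative, not from the talk: it assumes a device that supports an 8x16x16 half/float tile combination, which real code should verify through the extension's device queries before launching.

```cpp
// Minimal sketch of the sycl_ext_oneapi_matrix extension computing
// C = A * B + C for a single tile. Shapes M, N, K and the sub-group
// size of 16 are illustrative assumptions, not guaranteed on all devices.
#include <sycl/sycl.hpp>

namespace jm = sycl::ext::oneapi::experimental::matrix;

constexpr size_t M = 8, N = 16, K = 16;

void tile_mad(sycl::queue &q, sycl::half *A, sycl::half *B, float *C) {
  q.submit([&](sycl::handler &h) {
     h.parallel_for(
         // One sub-group of 16 work-items cooperatively owns the tile.
         sycl::nd_range<2>({1, 16}, {1, 16}),
         [=](sycl::nd_item<2> it) [[sycl::reqd_sub_group_size(16)]] {
           auto sg = it.get_sub_group();
           jm::joint_matrix<sycl::sub_group, sycl::half, jm::use::a,
                            M, K, jm::layout::row_major> tA;
           jm::joint_matrix<sycl::sub_group, sycl::half, jm::use::b,
                            K, N, jm::layout::row_major> tB;
           jm::joint_matrix<sycl::sub_group, float, jm::use::accumulator,
                            M, N> tC;
           jm::joint_matrix_fill(sg, tC, 0.0f);
           jm::joint_matrix_load(
               sg, tA,
               sycl::address_space_cast<
                   sycl::access::address_space::global_space,
                   sycl::access::decorated::no>(A), K);
           jm::joint_matrix_load(
               sg, tB,
               sycl::address_space_cast<
                   sycl::access::address_space::global_space,
                   sycl::access::decorated::no>(B), N);
           // Maps to the hardware matrix-multiply-accumulate instruction.
           jm::joint_matrix_mad(sg, tC, tA, tB, tC);
           jm::joint_matrix_store(
               sg, tC,
               sycl::address_space_cast<
                   sycl::access::address_space::global_space,
                   sycl::access::decorated::no>(C), N,
               jm::layout::row_major);
         });
   }).wait();
}
```

As with WMMA, the whole sub-group cooperatively owns each tile, and the compiler selects the actual mma instructions for the target hardware.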
Question: If a vendor has a performance library for its AI hardware and would like to expose it through the oneDNN API, what would be the best way to contribute?
Answer: oneDNN is an open-source project with contribution guidelines and a defined RFC process. Please refer to slide 6 for the details of the contribution process.