GitHub - pedrovalerolara/TF32-TF64: TF32-TF64

Abstract

With the introduction of high-performance AI cores in recent hardware, such as NVIDIA's Tensor Cores, AMD's Matrix Cores, and Arm SME, AI applications have seen large performance gains from accelerated GEMM. We want to extend the success of AI cores to high-performance computing for scientific applications. We present DSGEMM, an extension of NVIDIA's TF32 mixed-precision framework to double-precision GEMM. Both the TF32 and DSGEMM frameworks work by reducing the precision of the input data. The TF32 framework in particular achieves an 8x performance increase over SGEMM. Combining the DSGEMM framework with the mixed-precision method described in [1], we can achieve near-double-precision accuracy. There is currently no hardware support for our proposed framework, but we expect similar performance results: a potential 2.6x speedup while maintaining near-double-precision accuracy.
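To make the idea of recovering accuracy from reduced-precision inputs concrete, the following is a minimal sketch of two-word (multiword) matrix multiplication in the spirit of [1]. It is not the code in this repository. It assumes binary32 as a stand-in for the reduced-precision input format and binary64 for accumulation, and the helper names (`to_f32`, `split_two_words`, `demo`) are hypothetical, introduced only for this illustration.

```python
import random
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split_two_words(m):
    """Split each entry x into a high word and a low word, both binary32,
    so that x is approximately hi + lo."""
    hi = [[to_f32(x) for x in row] for row in m]
    lo = [[to_f32(x - h) for x, h in zip(row, hrow)]
          for row, hrow in zip(m, hi)]
    return hi, lo


def matmul(a, b):
    """Plain triple-loop product; sums are accumulated in binary64."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[l] * b[l][j] for l in range(inner)) for j in range(cols)]
            for row in a]


def add(x, y):
    """Entrywise sum of two matrices of equal shape."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def max_abs_err(x, ref):
    """Largest entrywise absolute difference between x and ref."""
    return max(abs(p - q) for rx, rr in zip(x, ref) for p, q in zip(rx, rr))


def demo(n=16, seed=0):
    """Return (one-word error, two-word error) against a binary64 reference."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    ref = matmul(a, b)

    a_hi, a_lo = split_two_words(a)
    b_hi, b_lo = split_two_words(b)

    # One word per input: only the rounded (high) parts are multiplied.
    one_word = matmul(a_hi, b_hi)

    # Two words per input: three products; the lo*lo term is dropped
    # because it is far smaller than the other contributions.
    two_word = add(add(matmul(a_hi, b_hi), matmul(a_hi, b_lo)),
                   matmul(a_lo, b_hi))

    return max_abs_err(one_word, ref), max_abs_err(two_word, ref)


if __name__ == "__main__":
    err_one, err_two = demo()
    print("one-word max abs error:", err_one)
    print("two-word max abs error:", err_two)
```

Under these assumptions, the two-word error should be several orders of magnitude smaller than the one-word error, at the cost of three reduced-precision products instead of one. That cost is why the achievable speedup depends on how much faster the reduced-precision hardware path is.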

References

[1] M. Fasi, N. J. Higham, F. Lopez, T. Mary, and M. Mikaitis, “Matrix multiplication in multiword arithmetic: Error analysis and application to GPU tensor cores,” 2022, MIMS Preprint. [Online]. Available: http://eprints.maths.manchester.ac.uk/id/eprint/286

Folders and files

data
plots/gemm
scripts
src
README.md
