Options/opinions for reduced precision FP types #572

abmerop · 2023-11-18T20:01:25Z

abmerop
Nov 18, 2023
Maintainer

For the next-next release (i.e., v24.0) I am attempting to bump the GPU ISA to support the new instructions up to and including MI200, and maybe MI300 if the spec comes out in the next few weeks. As part of that, there are some float16/bfloat16 instructions that would need software support in gem5 to execute, since those are not native C++ types. (See section 12.10 here: https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/instinct-mi200-cdna2-instruction-set-architecture.pdf)

There are a few options here, so I wanted to start a discussion on what others think before I get too much further. This could potentially be beneficial outside the GPU, such as for x86 AVX512-xyz instructions, and I'm sure ARM / RISC-V folks have or are considering instructions which support reduced precision FP types.

At a high level the options look like this:

Roll our own: I don't like this idea at all really, but it avoids potential license issues
C++23 fixed-width floating-point types (std::float16_t, std::bfloat16_t): I also don't like this considering we are still on C++17 and we want to support older systems
Use a 3rd party library: I won't list the many available, but a question for probably @powerjg would be what kind of licenses could be included in gem5? The libraries I am looking at so far have MIT and LGPL licenses
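To give a sense of the scale of the "roll our own" option, here is a minimal sketch of one small piece: decoding an IEEE 754 binary16 (fp16) bit pattern into a float. The function name `halfBitsToFloat` is hypothetical and is not an existing gem5 API. Arithmetic, float-to-half conversion, and rounding are not covered, and those are where most of the effort would go.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of gem5): decode an IEEE 754 binary16
// bit pattern into a float. The conversion is exact because every
// binary16 value is representable in binary32.
inline float
halfBitsToFloat(uint16_t h)
{
    const int sign = (h >> 15) & 0x1;   // 1 sign bit
    const int exp = (h >> 10) & 0x1F;   // 5 exponent bits, bias 15
    const int frac = h & 0x3FF;         // 10 fraction bits

    float mag;
    if (exp == 0) {
        // Zero or subnormal: frac * 2^-24
        mag = std::ldexp(static_cast<float>(frac), -24);
    } else if (exp == 0x1F) {
        // Infinity or NaN (the NaN payload is not preserved here)
        mag = frac ? std::numeric_limits<float>::quiet_NaN()
                   : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + frac) * 2^(exp - 15 - 10)
        mag = std::ldexp(static_cast<float>(0x400 + frac), exp - 25);
    }
    return sign ? -mag : mag;
}
```

For example, `halfBitsToFloat(0x3C00)` yields 1.0f and `halfBitsToFloat(0x7BFF)` yields 65504.0f, the largest finite fp16 value.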

Another question would be where to put a 3rd party library if we go that route. Traditionally we place this kind of thing in ext/, or, like the recent Capstone contribution, we check for an install in scons (KConfig?) and optionally build it in. However, some options are C++ header-only libraries which would have to be compiled in.

Lastly, I don't want to limit this discussion to only fp16/bf16. Since gem5 is a simulator we could enable exploration of custom data types and new instructions that don't yet exist. For example, the recent Open Compute Project Microscaling Formats could be a good baseline: (https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf). Some 3rd party libraries already support this.
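As an illustration of what supporting such a format might involve, here is a rough sketch of decoding a single MXFP4 element: an FP4 (E2M1) value multiplied by a shared E8M0 block scale. The bit layouts below reflect my reading of the MX v1.0 spec (E2M1 with exponent bias 1; E8M0 as a power-of-two scale with bias 127 and 0xFF reserved for NaN) and should be checked against the spec before being relied on. All function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Assumed layout: FP4 E2M1 = 1 sign bit, 2 exponent bits (bias 1),
// 1 mantissa bit. Representable magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6.
inline float
decodeE2M1(uint8_t nibble)
{
    const int sign = (nibble >> 3) & 0x1;
    const int exp = (nibble >> 1) & 0x3;
    const int man = nibble & 0x1;

    // exp == 0 is the subnormal range: value = man * 0.5
    const float mag = (exp == 0)
        ? 0.5f * static_cast<float>(man)
        : std::ldexp(1.0f + 0.5f * static_cast<float>(man), exp - 1);
    return sign ? -mag : mag;
}

// Assumed layout: E8M0 shared scale = 2^(e - 127), with 0xFF meaning NaN.
inline float
decodeE8M0(uint8_t e)
{
    if (e == 0xFF)
        return std::numeric_limits<float>::quiet_NaN();
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// One element of an MX block: shared scale times the private element.
inline float
decodeMxFp4(uint8_t sharedScale, uint8_t nibble)
{
    return decodeE8M0(sharedScale) * decodeE2M1(nibble);
}
```

In a full implementation the scale would be decoded once per block rather than once per element; the per-element form is used here only to keep the sketch short.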

I am not planning to commit to anything immediately, but wanted to start collecting thoughts before doing another code dump :).

mattsinc · 2023-11-18T23:36:15Z

mattsinc
Nov 18, 2023
Maintainer

Maybe I'm missing something, but why would we need libraries or to roll our own? I assume a language (e.g., HIP) would have support for reduced precision instructions, either natively or in libraries like rocBLAS. So what is the extra step needed for?

0 replies

abmerop · 2023-11-19T02:04:08Z

abmerop
Nov 19, 2023
Maintainer Author

This would be part of the gem5 build, so we wouldn't want gem5 users to need ROCm installed locally*. Basically I am looking for something that would do the math in the instructions.cc file.

Using HIP's __half2, etc., is another potential option, but it would require #include'ing the hip_fp16.h file from HIP, which would need a ROCm install.

*Since ROCm 5.x the ROCm team has decoupled the packages a bit, so you no longer need the ROCk dkms package simply to build GPU apps (slight nuance in util: Bump GPUFS build docker to 5.4.2 #571). In other words, you do not need an AMD GPU to build AMD GPU apps with this apt package.

0 replies

andysan · 2023-11-22T18:11:25Z

andysan
Nov 22, 2023
Maintainer

There is already a small library that implements bit-accurate Arm FP support in src/arch/arm/insts/fplib.cc. It supports at least one FP16 format. We could probably use that as a basis for a more general FP library.

3 replies

abmerop Nov 24, 2023
Maintainer Author

Thanks, I did a cursory search through the gem5 source but did not find this. For now it contains everything I am looking for. I was able to build just that file from the ARM directory and validate that the GPU programs I am running get the same result as hardware when using it.

We could potentially use it as a basis, but my main concern is that it could be a hugely redundant effort given there are some open source alternatives available, for example for implementing fp6, fp4, eXmY formats and so on. I do understand that licensing might be a concern though.

andysan Nov 27, 2023
Maintainer

The go-to solution for software floating point in this type of software seems to be SoftFloat. One of the reasons that we didn't include SoftFloat in gem5 before was indeed licensing. However, that changed in version 3e which seems to use a standard BSD-3-Clause license. I don't think it supports any of the new ML floating point proposals though.

abmerop Dec 10, 2023
Maintainer Author

Makes sense not to rewrite too much. One thing about using the ARM code is the usage of the FPSCR register outside of ARM. It has been working for my use case as seen here: abmerop@f1590a7

I haven't PR'd yet as this would be better reviewed for a v24.0 release and I don't want to interrupt the current staging. However, there might be a way to make my use case more generic. In particular I am concerned about the rounding type. I could change the FPSCR bits to deal with that, which may work. I am currently ignoring it. Example here: abmerop@32e6189#diff-db81d5e277013c88aee5f4857fca190944d88782d03460ddae34401150fed2b6R283
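To make the rounding concern concrete, here is a small sketch of a float-to-bfloat16 conversion where the rounding mode is an explicit argument rather than something read from an ISA-specific register. This is not the fplib interface and does not use FPSCR; the `RoundMode` enum and `floatToBf16` are hypothetical names, and only two modes are shown.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical ISA-neutral rounding selector (not gem5's FPSCR encoding).
enum class RoundMode { NearestEven, TowardZero };

// Convert a float to a bfloat16 bit pattern (the upper 16 bits of binary32)
// using the requested rounding mode.
inline uint16_t
floatToBf16(float f, RoundMode mode)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));

    // NaN: truncate the payload but force a mantissa bit so the result
    // stays a (quiet) NaN instead of collapsing to infinity.
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0)
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);

    if (mode == RoundMode::NearestEven) {
        // Add just under half an output ULP, plus one more if the kept
        // LSB is odd, so that exact ties round to the even neighbour.
        const uint32_t lsb = (bits >> 16) & 0x1u;
        bits += 0x7FFFu + lsb;
    }
    // TowardZero: simply drop the low 16 bits.
    return static_cast<uint16_t>(bits >> 16);
}
```

With this shape, a GPU instruction could pass whichever mode its ISA specifies, and an ARM caller could map its FPSCR rounding bits onto the same enum. Whether that fits cleanly on top of the existing fplib code is an open question.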
