Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocmPackages: extend ISA compatibility #298388

Merged
merged 15 commits into from
Apr 29, 2024

Conversation

GZGavinZhao
Copy link
Contributor

@GZGavinZhao GZGavinZhao commented Mar 23, 2024

  • rocmPackages: extend rocm-runtime ISA compatibility
  • rocmPackages: extend clr ISA compatibility
  • rocmPackages: extend tensile ISA compatibility
  • rocmPackages: extend rocblas ISA compatibility
  • rocmPackages: extend miopen ISA compatibility

Description of changes

This will eliminate the need for users to use HSA_OVERRIDE_GFX_VERSION to emulate their GPUs as a supported GPU model.

For more info, see this mailing thread.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@mschwaig
Copy link
Member

I think having that artificial barrier removed is a great change.

I just want to give you a heads up after quickly looking over that linked email:

It seems those patches are originally provided as git commits. Unless they need to be adapted somehow form the original commits, it would be better to integrate them by using fetchpatch instead of checking them in, like you can see for example in this change : 9a4f48b

@GZGavinZhao
Copy link
Contributor Author

@mschwaig Updated. Thank you for your comment!

@GZGavinZhao
Copy link
Contributor Author

GZGavinZhao commented Mar 24, 2024

I will leave nixpkgs-review running through the night, and if nothing fails I'll rebase the commits and mark this as ready for review tomorrow.

@GZGavinZhao
Copy link
Contributor Author

Well I guess that had to be 2 nights because it ran out of space building rocblas... Going to finish the rest tonight and hopefully I can update the PR tomorrow.

@GZGavinZhao GZGavinZhao force-pushed the rocm-gfx-compat branch 3 times, most recently from 366adb5 to fa1628b Compare March 27, 2024 02:46
@mschwaig
Copy link
Member

Going by the output of nixpkgs-review it seems like rocblas is still not building yet.

Result of nixpkgs-review pr 298388 run on x86_64-linux 1

14 packages marked as broken and skipped:
  • rocmPackages.llvm.flang
  • rocmPackages.llvm.flang.doc
  • rocmPackages.llvm.flang.info
  • rocmPackages.llvm.flang.man
  • rocmPackages.migraphx
  • rocmPackages.rdc
  • rocmPackages.rdc.doc
  • rocmPackages_6.llvm.flang
  • rocmPackages_6.llvm.flang.doc
  • rocmPackages_6.llvm.flang.info
  • rocmPackages_6.llvm.flang.man
  • rocmPackages_6.migraphx
  • rocmPackages_6.rdc
  • rocmPackages_6.rdc.doc
6 packages failed to build:
  • rocmPackages.hipblas (rocmPackages_6.hipblas)
  • rocmPackages.hipsolver (rocmPackages_6.hipsolver)
  • rocmPackages.miopen (rocmPackages.miopen-hip ,rocmPackages_6.miopen ,rocmPackages_6.miopen-hip)
  • rocmPackages.rocalution (rocmPackages_6.rocalution)
  • rocmPackages.rocblas (rocmPackages_6.rocblas)
  • rocmPackages.rocsolver (rocmPackages_6.rocsolver)
34 packages built:
  • blender-hip
  • rocmPackages.clr (rocmPackages_6.clr)
  • rocmPackages.clr.icd (rocmPackages_6.clr.icd)
  • rocmPackages.composable_kernel (rocmPackages_6.composable_kernel)
  • rocmPackages.hipcub (rocmPackages_6.hipcub)
  • rocmPackages.hipfft (rocmPackages_6.hipfft)
  • rocmPackages.hiprand (rocmPackages.rocrand ,rocmPackages_6.hiprand ,rocmPackages_6.rocrand)
  • rocmPackages.hipsparse (rocmPackages_6.hipsparse)
  • rocmPackages.llvm.mlir (rocmPackages_6.llvm.mlir)
  • rocmPackages.llvm.openmp (rocmPackages_6.llvm.openmp)
  • rocmPackages.llvm.openmp.doc (rocmPackages_6.llvm.openmp.doc)
  • rocmPackages.llvm.openmp.info (rocmPackages_6.llvm.openmp.info)
  • rocmPackages.llvm.openmp.man (rocmPackages_6.llvm.openmp.man)
  • rocmPackages.rccl (rocmPackages_6.rccl)
  • rocmPackages.rocdbgapi (rocmPackages_6.rocdbgapi)
  • rocmPackages.rocdbgapi.doc (rocmPackages_6.rocdbgapi.doc)
  • rocmPackages.rocfft (rocmPackages_6.rocfft)
  • rocmPackages.rocm-runtime (rocmPackages_6.rocm-runtime)
  • rocmPackages.rocminfo (rocmPackages_6.rocminfo)
  • rocmPackages.rocmlir (rocmPackages_6.rocmlir)
  • rocmPackages.rocmlir-rock (rocmPackages_6.rocmlir-rock)
  • rocmPackages.rocmlir.external (rocmPackages_6.rocmlir.external)
  • rocmPackages.rocprim (rocmPackages_6.rocprim)
  • rocmPackages.rocprofiler (rocmPackages_6.rocprofiler)
  • rocmPackages.rocr-debug-agent (rocmPackages_6.rocr-debug-agent)
  • rocmPackages.rocsparse (rocmPackages_6.rocsparse)
  • rocmPackages.rocthrust (rocmPackages_6.rocthrust)
  • rocmPackages.roctracer (rocmPackages_6.roctracer)
  • rocmPackages.rocwmma (rocmPackages_6.rocwmma)
  • rocmPackages.rpp (rocmPackages.rpp-hip ,rocmPackages_6.rpp ,rocmPackages_6.rpp-hip)
  • rocmPackages.rpp-cpu (rocmPackages_6.rpp-cpu)
  • rocmPackages.rpp-opencl (rocmPackages_6.rpp-opencl)
  • rocmPackages.tensile (rocmPackages_6.tensile)
  • rocmPackages.tensile.dist (rocmPackages_6.tensile.dist)

Here is the build log:

@nix { "action": "setPhase", "phase": "unpackPhase" }
Running phase: unpackPhase
unpacking source archive /nix/store/ihrai11dd7zmmvj6j9g1kyj4337f85hh-source
source root is source
@nix { "action": "setPhase", "phase": "patchPhase" }
Running phase: patchPhase
applying patch /nix/store/04x9sfq554jgzhafdkpyznpk0azglwp0-Extend-rocBLAS-HIP-ISA-compatibility.patch
patching file library/src/handle.cpp
patching file library/src/rocblas_auxiliary.cpp
patching file library/src/tensile_host.cpp
ln: failed to create symbolic link 'build/Tensile/library/TensileLibrary.dat': File exists
/nix/store/hylcibqj87y8flsj51d2dpc7fiy6g1gx-stdenv-linux/setup: line 131: pop_var_context: head of shell_variables not a function context

Did you manage to run nixpkgs-review or are you having trouble with it?
My own experience is that ROCm is quite a pain to build with its huge hardware requirements.

@GZGavinZhao
Copy link
Contributor Author

GZGavinZhao commented Mar 28, 2024

@mschwaig Thank you for the run! I haven't run nixpkgs-review yet because I'm still fixing rocblas. I realized I could set gpuTargets to only my GPU and I had been doing that to reduce the build time/memory requirement.

I had a fix for rocBLAS but it was really messy. I think I figured out a clean solution that requires just one short patch and no more postPatch. Currently building locally to verify.

@GZGavinZhao
Copy link
Contributor Author

Quick question: I want to test rocBLAS through llama.cpp. How do I build a llama-cpp that has rocmSupport set to true?

@mschwaig
Copy link
Member

mschwaig commented Mar 28, 2024

Quick question: I want to test rocBLAS through llama.cpp. How do I build a llama-cpp that has rocmSupport set to true?

Should be llama-cpp.override { rocmSupport = true; }.

That works as part of some shell or system config. I don't know by without looking it up myself how to do it directly on the command line.
If you do not want touch some file to do this, you can also use the nix repl:

~/nixpkgs (fix-torch-with-rocm) [1]> nix repl
Welcome to Nix 2.18.1. Type :? for help.

nix-repl> :lf .
nix-repl> legacyPackages.x86_64-linux.llama-cpp.override { rocmSupport = true; }
«derivation /nix/store/p3743v86jviafpvgi53nwrs2bc3gzppn-llama-cpp-2481.drv»

nix-repl> :b legacyPackages.x86_64-linux.llama-cpp.override { rocmSupport = true; }

@GZGavinZhao
Copy link
Contributor Author

Hi @mschwaig would you mind running nixpkgs-review on your end? rocblas now compiles fine, but for some reason even 64GB of RAM is not enough to run nixpkgs-review and I still frequently get out-of-disk-space errors :(

@mschwaig
Copy link
Member

Hi @mschwaig would you mind running nixpkgs-review on your end? rocblas now compiles fine, but for some reason even 64GB of RAM is not enough to run nixpkgs-review and I still frequently get out-of-disk-space errors :(

Great to hear that it compiles.

I'm running nixpkgs-review now and I will post the results when it completes.

@Tungsten842
Copy link
Member

I was reading the llvm documentation for the upcoming generic targets and it doesn't look like that this patch will always work, for example:
image
It looks like that some targets do not support some instructions, so they are not really compatible. So for example for rocblas with this patch if you are using a gfx1012 by default you will be using gfx1010 code, that includes some instructions that are not supported by gfx1012. It looks like this might cause some problems.

@mschwaig
Copy link
Member

Result of nixpkgs-review pr 298388 run on x86_64-linux 1

14 packages marked as broken and skipped:
  • rocmPackages.llvm.flang
  • rocmPackages.llvm.flang.doc
  • rocmPackages.llvm.flang.info
  • rocmPackages.llvm.flang.man
  • rocmPackages.migraphx
  • rocmPackages.rdc
  • rocmPackages.rdc.doc
  • rocmPackages_6.llvm.flang
  • rocmPackages_6.llvm.flang.doc
  • rocmPackages_6.llvm.flang.info
  • rocmPackages_6.llvm.flang.man
  • rocmPackages_6.migraphx
  • rocmPackages_6.rdc
  • rocmPackages_6.rdc.doc
5 packages failed to build:
  • rocmPackages.hipblas (rocmPackages_6.hipblas)
  • rocmPackages.hipsolver (rocmPackages_6.hipsolver)
  • rocmPackages.miopen (rocmPackages.miopen-hip ,rocmPackages_6.miopen ,rocmPackages_6.miopen-hip)
  • rocmPackages.rocalution (rocmPackages_6.rocalution)
  • rocmPackages.rocsolver (rocmPackages_6.rocsolver)
35 packages built:
  • blender-hip
  • rocmPackages.clr (rocmPackages_6.clr)
  • rocmPackages.clr.icd (rocmPackages_6.clr.icd)
  • rocmPackages.composable_kernel (rocmPackages_6.composable_kernel)
  • rocmPackages.hipcub (rocmPackages_6.hipcub)
  • rocmPackages.hipfft (rocmPackages_6.hipfft)
  • rocmPackages.hiprand (rocmPackages.rocrand ,rocmPackages_6.hiprand ,rocmPackages_6.rocrand)
  • rocmPackages.hipsparse (rocmPackages_6.hipsparse)
  • rocmPackages.llvm.mlir (rocmPackages_6.llvm.mlir)
  • rocmPackages.llvm.openmp (rocmPackages_6.llvm.openmp)
  • rocmPackages.llvm.openmp.doc (rocmPackages_6.llvm.openmp.doc)
  • rocmPackages.llvm.openmp.info (rocmPackages_6.llvm.openmp.info)
  • rocmPackages.llvm.openmp.man (rocmPackages_6.llvm.openmp.man)
  • rocmPackages.rccl (rocmPackages_6.rccl)
  • rocmPackages.rocblas (rocmPackages_6.rocblas)
  • rocmPackages.rocdbgapi (rocmPackages_6.rocdbgapi)
  • rocmPackages.rocdbgapi.doc (rocmPackages_6.rocdbgapi.doc)
  • rocmPackages.rocfft (rocmPackages_6.rocfft)
  • rocmPackages.rocm-runtime (rocmPackages_6.rocm-runtime)
  • rocmPackages.rocminfo (rocmPackages_6.rocminfo)
  • rocmPackages.rocmlir (rocmPackages_6.rocmlir)
  • rocmPackages.rocmlir-rock (rocmPackages_6.rocmlir-rock)
  • rocmPackages.rocmlir.external (rocmPackages_6.rocmlir.external)
  • rocmPackages.rocprim (rocmPackages_6.rocprim)
  • rocmPackages.rocprofiler (rocmPackages_6.rocprofiler)
  • rocmPackages.rocr-debug-agent (rocmPackages_6.rocr-debug-agent)
  • rocmPackages.rocsparse (rocmPackages_6.rocsparse)
  • rocmPackages.rocthrust (rocmPackages_6.rocthrust)
  • rocmPackages.roctracer (rocmPackages_6.roctracer)
  • rocmPackages.rocwmma (rocmPackages_6.rocwmma)
  • rocmPackages.rpp (rocmPackages.rpp-hip ,rocmPackages_6.rpp ,rocmPackages_6.rpp-hip)
  • rocmPackages.rpp-cpu (rocmPackages_6.rpp-cpu)
  • rocmPackages.rpp-opencl (rocmPackages_6.rpp-opencl)
  • rocmPackages.tensile (rocmPackages_6.tensile)
  • rocmPackages.tensile.dist (rocmPackages_6.tensile.dist)

These are the logs of the failing packages:
rocmPackages.rocsolver.log
rocmPackages.rocalution.log
rocmPackages.miopen.log

The logs for hipblas and hipsolver were empty since their dependencies failed to build.

@GZGavinZhao
Copy link
Contributor Author

GZGavinZhao commented Mar 29, 2024

Hi @Tungsten842, I believe it's actually the reverse. The document is saying that because we're compiling against gfx1010, when you're running an executable compiled against gfx1010 on gfx1012 hardware, the instruction v_dot4_i32_i8 would not be available for you to use (the instruction would work properly if you try to call it, but your executable would never use this instruction) since the compiler has to assume you may be running on gfx1010. For example, you can see from the gfx1010 instruction docs that v_dot4_i32_i8 is not present, but this instruction is present in the gfx1011/gfx1012 instruction docs.

Therefore, gfx1010 is indeed the greatest common denominator for RDNA1 GPUs and all RDNA1 GPUs can use gfx1010 instructions as the basis ;)

@GZGavinZhao
Copy link
Contributor Author

@mschwaig I believe I have fixed the issue (the header file rocblas.h is now being installed properly). Would you mind running nixpkgs-review when you get a chance? Thanks!

@Tungsten842
Copy link
Member

Hi @Tungsten842, I believe it's actually the reverse. The document is saying that because we're compiling against gfx1010, when you're running an executable compiled against gfx1010 on gfx1012 hardware, the instruction v_dot4_i32_i8 would not be available for you to use (the instruction would work properly if you try to call it, but your executable would never use this instruction) since the compiler has to assume you may be running on gfx1010. For example, you can see from the gfx1010 instruction docs that v_dot4_i32_i8 is not present, but this instruction is present in the gfx1011/gfx1012 instruction docs.

Therefore, gfx1010 is indeed the greatest common denominator for RDNA1 GPUs and all RDNA1 GPUs can use gfx1010 instructions as the basis ;)

Makes sense. But there is still one problem, it looks like that gfx900 supports some v_mad_mix instructions. But (gfx906, gfx904...) do not.
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX900.html
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX904.html
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX906.html
Also gfx1100 seems to have more VGPRs than other targets, not sure about the impact of this: https://reviews.llvm.org/D134522.

@GZGavinZhao
Copy link
Contributor Author

@Tungsten842 You're right, though this PR doesn't make those changes, only the following:

  • gfx902, gfx909, gfx90c -> gfx900
  • gfx101* -> gfx1010
  • gfx103* -> gfx1030

So I believe we should be in the safe zone.

@GZGavinZhao
Copy link
Contributor Author

@mschwaig Updated according to 89ab15f

@ofborg ofborg bot requested a review from mschwaig April 25, 2024 02:54
@cgmb
Copy link

cgmb commented Apr 25, 2024

@mschwaig Updated according to 89ab15f

You've already patched rocBLAS to run gfx1010 kernels on gfx1012 hardware, so you can further reduce the size by dropping gfx1012 from that list.

@mschwaig
Copy link
Member

mschwaig commented Apr 25, 2024

@mschwaig Updated according to 89ab15f

Great! I already ran nixpkgs-review again with that change.

Result of nixpkgs-review run on x86_64-linux 1

10 packages marked as broken and skipped:
  • rocmPackages.llvm.flang
  • rocmPackages.llvm.flang.doc
  • rocmPackages.llvm.flang.info
  • rocmPackages.llvm.flang.man
  • rocmPackages.migraphx
  • rocmPackages.mivisionx
  • rocmPackages.mivisionx-cpu
  • rocmPackages.mivisionx-hip
  • rocmPackages.rdc
  • rocmPackages.rdc.doc
42 packages built:
  • blender-hip
  • rocmPackages.clr
  • rocmPackages.clr.icd
  • rocmPackages.composable_kernel
  • rocmPackages.hipblas
  • rocmPackages.hipcub
  • rocmPackages.hipfft
  • rocmPackages.hiprand
  • rocmPackages.hipsolver
  • rocmPackages.hipsparse
  • rocmPackages.llvm.mlir
  • rocmPackages.llvm.openmp
  • rocmPackages.llvm.openmp.doc
  • rocmPackages.llvm.openmp.info
  • rocmPackages.llvm.openmp.man
  • rocmPackages.miopen
  • rocmPackages.rccl
  • rocmPackages.rocalution
  • rocmPackages.rocblas
  • rocmPackages.rocdbgapi
  • rocmPackages.rocdbgapi.doc
  • rocmPackages.rocfft
  • rocmPackages.rocm-runtime
  • rocmPackages.rocminfo
  • rocmPackages.rocmlir
  • rocmPackages.rocmlir-rock
  • rocmPackages.rocmlir.external
  • rocmPackages.rocprim
  • rocmPackages.rocprofiler
  • rocmPackages.rocr-debug-agent
  • rocmPackages.rocrand
  • rocmPackages.rocsolver
  • rocmPackages.rocsparse
  • rocmPackages.rocthrust
  • rocmPackages.roctracer
  • rocmPackages.rocwmma
  • rocmPackages.rpp (rocmPackages.rpp-hip)
  • rocmPackages.rpp-cpu
  • rocmPackages.rpp-opencl
  • rocmPackages.tensile
  • rocmPackages.tensile.dist
  • zluda

The sizes are looking good as well:

du -h --apparent-size --max-depth 0 */ | grep G
3.7G	rocmPackages.composable_kernel/
1.6G	rocmPackages.miopen/
2.7G	rocmPackages.rocblas/
2.1G	rocmPackages.rocfft/
1.5G	rocmPackages.rocsolver/
1.2G	rocmPackages.rocsparse/

EDIT: llama-cpp also still runs on my RX 6600

@GZGavinZhao
Copy link
Contributor Author

You've already patched rocBLAS to run gfx1010 kernels on gfx1012 hardware, so you can further reduce the size by dropping gfx1012 from that list.

@cgmb Good call, thanks for pointing that out! I added gfx1012 to enable some optimization/instructions unavailable in gfx1010, but forgot my patch causes gfx101* GPUs to run as gfx1010, so building for gfx1012 is indeed useless for now 😅

@mschwaig I've removed gfx1012 just to save a bit more space. I don't think this requires another nixpkgs-review, so I think this is good for merge.

Copy link
Member

@mschwaig mschwaig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used git range-diff master 062f308b9601f3fec588c8df0225eff7731f9834 32b9f37d57826364db0bc579ee75fa82af1fd623 to verify the contents of that last force-push and it looks good.

I would prefer if we had more info about what targets are missing in that list and why.
Maybe we should have a comment above that gpuTarget simply stating that
these are the defaults with gfx940 and gfx941 removed because they are early engineering samples and gfx1012 removed, because of the patch that makes it unused.

Otherwise it will be difficult to get this list right on future ROCm updates.

@GZGavinZhao
Copy link
Contributor Author

@mschwaig Comments added. Ideally I think we should be using clr.gpuTargets in all places, though that would be for another PR. Solus uses gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102, and the only package we had to patch was composable_kernel.

@ofborg ofborg bot requested a review from mschwaig April 25, 2024 16:12
@ulrikstrid
Copy link
Member

I will take a closer look at this tomorrow and see if we can land this. Thanks a lot for working on this!

@ulrikstrid
Copy link
Member

The changes looks good. I'm just running a last nixpkgs-review and then we can merge this.

Again thank you @GZGavinZhao and @mschwaig for making this happen!

@wegank wegank added 12.approvals: 2 This PR was reviewed and approved by two reputable people 12.approved-by: package-maintainer This PR was reviewed and approved by a maintainer listed in the package labels Apr 27, 2024
@ulrikstrid ulrikstrid merged commit cd711ad into NixOS:master Apr 29, 2024
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: rocm 10.rebuild-darwin: 1-10 10.rebuild-linux: 11-100 12.approvals: 2 This PR was reviewed and approved by two reputable people 12.approved-by: package-maintainer This PR was reviewed and approved by a maintainer listed in the package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants