
Two small fixes to AMDCPU codegen for LLVM 10+ and ROCm 3.5+ #5920

Merged (1 commit) on Jun 25, 2020

Conversation

@t-vi (Contributor) commented Jun 24, 2020

  • LLVM 10+ got stricter and fails hard when aligning to 0 bytes, so we don't.
  • ROCm 3.5 apparently deprecates code object v2. So if we have the new ROCm, we use the (default) v3.
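The two guards above can be sketched roughly as follows (function names are illustrative only, not TVM's actual internals):

```python
def safe_alignment(align_bytes):
    """LLVM 10+ asserts when constructing llvm::Align(0); treat 0 as
    'no explicit alignment' instead of passing it through."""
    return None if align_bytes == 0 else align_bytes

def code_object_version(rocm_version):
    """ROCm 3.5 deprecates code object v2, so use v3 (the LLVM 9+
    default) on 3.5 and later; keep v2 for older ROCm."""
    return 3 if rocm_version >= (3, 5) else 2
```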

@t-vi force-pushed the fix_rocm_305 branch 6 times, most recently from 3206394 to 4f6e499 on June 25, 2020 07:10
@t-vi (Contributor, Author) commented Jun 25, 2020

@masahi This is needed for the latest ROCm and also for recent LLVM.
Without it, execution fails with a "symbol not found" error, which is probably hard to decipher.

I'm not particularly fond of needing to go through the device API, but I haven't found a better way, as the codegen needs to stay compilable without HIP installed.

- For LLVM 10+ we need to avoid calling Align with 0, or else
  we get a crash.
- For ROCm 3.5+ we need to use code object 3 (the default in LLVM 9+),
  but for ROCm < 3.5 we want code object 2.
- As we want to separate codegen from the API, we need to add
  a device API query for the version. Everyone else now wants one, too
  (but I only filled it in for CUDA for now).
- I'm throwing in an addition of kMaxRegistersPerBlock for ROCm.
  This was introduced for CUDA in apache#5898.
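The device-API version query mentioned above could look roughly like this; the class and method names are hypothetical, not TVM's real interface. The point is that codegen stays free of a hard HIP dependency and only asks the runtime when it is present:

```python
class DeviceAPI:
    """Hypothetical base interface: by default the driver version is unknown,
    e.g. when the ROCm/HIP runtime is not installed at build time."""
    def runtime_version(self):
        return None

class ROCmDeviceAPI(DeviceAPI):
    """Hypothetical ROCm implementation that reports the installed version."""
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def runtime_version(self):
        return (self.major, self.minor)

def pick_code_object(api):
    """Choose the code object version from the queried runtime version."""
    ver = api.runtime_version()
    if ver is None:
        return 2  # conservative fallback when the runtime can't be probed
    return 3 if ver >= (3, 5) else 2
```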
@t-vi (Contributor, Author) commented Jun 25, 2020

Sorry, another small thing:

@tqchen (Member) commented Jun 25, 2020

NOTE: runtime detection of ROCm features only works when we build on the same machine; it won't work for cross-compilation. While that is OK for now, let us keep it in mind: once we land https://discuss.tvm.ai/t/rfc-tvm-target-specification/6844, we might want to let the user explicitly specify the attribute and only auto-detect when it is not specified (or when march=native is used).
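The precedence tqchen describes could be sketched like this (the attribute name is made up for illustration):

```python
def resolve_rocm_version(target_attrs, detect):
    """Prefer an explicitly specified target attribute; fall back to runtime
    auto-detection (which breaks under cross-compilation) only when the
    attribute is absent."""
    explicit = target_attrs.get("rocm_version")  # hypothetical attr name
    if explicit is not None:
        return explicit  # cross-compilation: trust the user's target spec
    return detect()      # local build: probe the installed ROCm
```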

@t-vi (Contributor, Author) commented Jun 25, 2020

@tqchen Yeah, so the background to this is that the recent release of ROCm 3.5 brings rather sweeping changes (among other things, it changes the compiler backend used for HIP compilation).

My conclusion from this would be:

  • I'd have some sympathy for people not switching immediately (I was one of them 2 days ago),
  • In half a year, serious users will have switched; then we can just bump the requirement to ROCm >= 3.5 (IMHO).

@tqchen tqchen merged commit 6d59ed4 into apache:master Jun 25, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 30, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Jul 2, 2020