Two small fixes to AMDGPU codegen for LLVM 10+ and ROCm 3.5+ #5920

t-vi · 2020-06-24T19:42:38Z

LLVM 10+ got stricter and fails hard when aligning to 0 bytes, so we no longer do that.
ROCm 3.5 apparently deprecates code object v2, so if we have the new ROCm, we use the (default) v3.
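To illustrate the first fix, here is a minimal sketch of guarding an alignment update so that a zero request never reaches a strict alignment type. `StrictAlign` is a stand-in written for this example (it mimics the way `llvm::Align` asserts on zero and non-power-of-two values in LLVM 10+), and `UpdatedAlignment` is a hypothetical helper, not the code from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a strict alignment wrapper. Like llvm::Align in LLVM 10+,
// it refuses zero and non-power-of-two values (here via assert).
struct StrictAlign {
  uint64_t value;
  explicit StrictAlign(uint64_t v) : value(v) {
    assert(v > 0 && "alignment must not be 0");
    assert((v & (v - 1)) == 0 && "alignment must be a power of 2");
  }
};

// Hypothetical helper: return the alignment to use for an allocation.
// A request of 0 means "no preference", so the current alignment is kept
// and StrictAlign is never constructed with 0.
inline uint64_t UpdatedAlignment(uint64_t current, uint64_t requested) {
  if (requested == 0 || requested <= current) {
    return current;
  }
  return StrictAlign(requested).value;
}
```

With this guard, `UpdatedAlignment(4, 0)` leaves the alignment at 4 instead of tripping the assertion, while `UpdatedAlignment(4, 16)` raises it to 16.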

t-vi · 2020-06-25T12:08:17Z

@masahi This is needed for the latest ROCm and also recent LLVM.
Without it, execution will fail with a "symbol not found" error, which is probably hard to decipher.

I'm not particularly fond of needing to go through the device API, but I haven't found a better way, as the codegen wants to be compilable without having HIP installed.

- For LLVM 10+ we need to avoid calling Align with 0, or else we get a crash.
- For ROCm 3.5+ we need to use code object 3 (the default in LLVM 9+), but for ROCm < 3.5 we want code object 2.
- As we want to separate codegen from the API, we need to add a device API query for the version. But now everyone else wants one, too. (I only filled it in for CUDA for now.)
- I'm throwing in an addition of kMaxRegistersPerBlock for ROCm. This was introduced for CUDA in apache#5898.
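The code object choice in the list above can be sketched as a small version check. Everything named here is an assumption made for illustration: the `major * 100 + minor` encoding (so ROCm 3.5 becomes 305), the helper names, and the use of the `-mattr=-code-object-v3` feature string to turn off v3 on older ROCm. The actual PR may encode the version and pass the flag differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical version encoding: major * 100 + minor, e.g. ROCm 3.5 -> 305.
inline int EncodeRocmVersion(int major, int minor) {
  assert(major >= 0 && minor >= 0 && minor < 100);
  return major * 100 + minor;
}

// Hypothetical helper: pick the extra LLVM target attribute string.
// - ROCm >= 3.5: keep the LLVM default (code object v3), so nothing is added.
// - ROCm <  3.5: disable the v3 feature so that code object v2 is emitted.
inline std::string CodeObjectAttr(int rocm_version) {
  const int kFirstV3OnlyRelease = EncodeRocmVersion(3, 5);
  if (rocm_version < kFirstV3OnlyRelease) {
    return "-mattr=-code-object-v3";
  }
  return "";
}
```

For example, `CodeObjectAttr(EncodeRocmVersion(3, 3))` yields the flag that disables v3, while `CodeObjectAttr(EncodeRocmVersion(3, 5))` yields an empty string and leaves the default in place.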

t-vi · 2020-06-25T13:38:44Z

Sorry, another small thing:

I'm throwing in an addition of kMaxRegistersPerBlock for ROCm. This was introduced for CUDA in "CUDA device API & VerifyGPUCode pass update" (#5898).

tqchen · 2020-06-25T15:21:36Z

NOTE: using runtime detection of ROCm features will only work if we are building on the same machine; it won't work for cross compilation. While that is OK for now, let us keep it in mind. Once we land https://discuss.tvm.ai/t/rfc-tvm-target-specification/6844 we might want to allow users to explicitly specify the attr and only use auto-detection if the attr is not specified (or march=native is used).
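One way to read this suggestion is as a resolution order: an explicitly specified target attribute wins, runtime detection is consulted only when no attribute is given, and a fixed default covers the case where neither is available (for example when cross compiling on a machine without a ROCm device). The sketch below is only an illustration of that order. The names, the `-1` sentinel for "unspecified" or "not detected", and the encoded version values are all hypothetical and not part of TVM.

```cpp
#include <cassert>

// Hypothetical sentinel meaning "attribute not given" / "detection failed".
constexpr int kRocmVersionUnknown = -1;

// Hypothetical helper: resolve the ROCm version the codegen should target.
//   explicit_attr - value from the target specification, or kRocmVersionUnknown
//   detected      - value queried from the local device API, or kRocmVersionUnknown
//   fallback      - default used when neither source is available
inline int ResolveRocmVersion(int explicit_attr, int detected, int fallback) {
  assert(fallback != kRocmVersionUnknown);
  if (explicit_attr != kRocmVersionUnknown) {
    return explicit_attr;  // user-specified value always takes priority
  }
  if (detected != kRocmVersionUnknown) {
    return detected;  // auto-detect only when nothing was specified
  }
  return fallback;  // e.g. cross compilation without a local device
}
```

With this order, a cross-compiling user who passes an explicit version gets the same code object choice regardless of what is installed on the build machine.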

t-vi · 2020-06-25T16:05:27Z

@tqchen Yeah, so the background to this is that the recent release of ROCm 3.5 brings rather sweeping changes (changing the compiler backend for the HIP compilation among other things).

My conclusion from this would be:

- I'd have some sympathy for people not switching immediately (I was one of them 2 days ago).
- In half a year, serious users will have switched. We can then just bump the requirement to ROCm >= 3.5 (IMHO).

t-vi force-pushed the fix_rocm_305 branch 6 times, most recently from 3206394 to 4f6e499 Compare June 25, 2020 07:10

t-vi force-pushed the fix_rocm_305 branch from 4f6e499 to 969d7ac Compare June 25, 2020 13:37

tqchen approved these changes Jun 25, 2020

tqchen merged commit 6d59ed4 into apache:master Jun 25, 2020

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed
