Two small fixes to AMDGPU codegen for LLVM 10+ and ROCm 3.5+ #5920
t-vi commented Jun 24, 2020
- LLVM 10+ got stricter and fails hard when aligning to 0 bytes, so we no longer do that.
- ROCm 3.5 apparently deprecates code object v2, so with the new ROCm we use the (default) v3.
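The two fixes above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: `RocmVersion`, `ShouldSetAlignment`, and `CodeObjectFeature` are hypothetical names, and the `-code-object-v3` feature string is an assumption about how v3 would be switched off for older ROCm.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a ROCm version; not a TVM or HIP type.
struct RocmVersion {
  int major;
  int minor;
};

// LLVM 10+ fails hard on a 0-byte alignment, so an explicit alignment is
// only requested when a positive value was actually computed.
inline bool ShouldSetAlignment(int alignment_bytes) {
  assert(alignment_bytes >= 0);
  return alignment_bytes > 0;
}

// ROCm 3.5+ uses code object v3 (the LLVM default), so no extra feature is
// needed there. For older ROCm, v3 is disabled to fall back to v2.
// The feature string is an assumption made for this sketch.
inline std::string CodeObjectFeature(RocmVersion v) {
  assert(v.major >= 0 && v.minor >= 0);
  const bool at_least_3_5 = v.major > 3 || (v.major == 3 && v.minor >= 5);
  return at_least_3_5 ? "" : "-code-object-v3";
}
```

In this sketch the caller would skip the alignment call entirely when `ShouldSetAlignment` returns false, and append the returned feature string to the target attributes only when it is non-empty.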
3206394 to 4f6e499
@masahi This is needed for the latest ROCm and also for recent LLVM. I'm not particularly fond of needing to go through the device API, but I haven't found a better way, as the codegen needs to be compilable without HIP installed.
- For LLVM 10+ we need to avoid calling Align with 0, or else we get a crash.
- For ROCm 3.5+ we need to use code object 3 (the default in LLVM 9+), but for ROCm < 3.5 we want code object 2.
- As we want to separate codegen from the API, we need to add a device API query for the version. Every other device API now needs one, too (but I only filled it in for CUDA for now).
- I'm throwing in an addition of kMaxRegistersPerBlock for ROCm. This was introduced for CUDA in apache#5898.
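One way to picture the separation between codegen and the device API is a lookup through a registry, so the codegen never includes HIP headers. The sketch below is an illustration under assumptions, not the mechanism in this PR: `QueryRegistry`, `DetectApiVersion`, and the integer version encoding (305 standing for 3.5) are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical registry mapping a device name to a version query.
// A runtime built with HIP would register an entry; the codegen only reads it.
using VersionQuery = std::function<int()>;

inline std::map<std::string, VersionQuery>& QueryRegistry() {
  static std::map<std::string, VersionQuery> registry;
  return registry;
}

// Codegen side: needs no HIP headers. If no runtime registered a query for
// the device, the caller-supplied fallback version is returned instead.
inline int DetectApiVersion(const std::string& device, int fallback) {
  assert(!device.empty());
  auto it = QueryRegistry().find(device);
  if (it == QueryRegistry().end()) return fallback;
  return it->second();
}
```

The fallback keeps the codegen usable when the runtime is absent, at the cost of silently assuming a version; which default is right is a judgment call this sketch does not settle.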
Sorry, another small thing:
NOTE: runtime detection of ROCm features only works if we build on the same machine that runs the code; it won't work for cross compilation. While that is OK for now, let us keep it in mind. Once we land https://discuss.tvm.ai/t/rfc-tvm-target-specification/6844, we might want to allow users to specify the attr explicitly and only use auto-detection if the attr is not specified (or march=native is used).
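The suggestion in this note, an explicit attribute with auto-detection as the fallback, could look roughly like this. It is a sketch only: `TargetAttrs`, the `"rocm_version"` key, and `ResolveRocmVersion` are hypothetical and do not come from the target specification RFC.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical target attributes; the key "rocm_version" is illustrative only.
using TargetAttrs = std::map<std::string, int>;

// An explicitly specified attribute wins, which is what cross compilation
// needs. Runtime detection is consulted only when the attribute is absent.
inline int ResolveRocmVersion(const TargetAttrs& attrs,
                              const std::function<int()>& auto_detect) {
  auto it = attrs.find("rocm_version");
  if (it != attrs.end()) return it->second;
  assert(auto_detect);  // a detector must be supplied when no attr is given
  return auto_detect();
}
```

Because detection is never invoked when the attribute is present, a build machine without ROCm installed could still target a specific ROCm version in this scheme.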
@tqchen Yeah, so the background to this is that the recent release of ROCm 3.5 brings rather sweeping changes (changing the compiler backend for the HIP compilation among other things). My conclusion from this would be