[Codegen][CUDA] Fix make_int4x cuda codegen vectorize #8137

Merged
merged 1 commit into apache:main from fix-int4 on May 26, 2021

Conversation

wyc-ruiker
Contributor

Added support for int4x32, int4x16, and int4x4 in BroadcastNode.

In the int4x4 test case, the IR is:

primfn(compute_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {compute: Buffer(compute_2: Pointer(int4), int4, [64, 4], [])}
  buffer_map = {compute_1: compute} {
  attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 64;
  compute_2[ramp((blockIdx.x*4), 1, 4)] = broadcast(1i4, 4)
}
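
For reference, here is a minimal sketch (not the exact test from the PR) of how such an int4x4 broadcast kernel can be built with TVM's TE schedule API; the names and shapes are illustrative, and building for "cuda" requires a CUDA toolchain:

import tvm
from tvm import te

# Hypothetical reconstruction of the int4x4 test case: a (64, 4) buffer of
# int4 constants, with the inner axis vectorized (broadcast(1i4, 4)) and the
# outer axis bound to blockIdx.x, matching the IR above.
n, lanes, value, dtype = 64, 4, 1, "int4"
A = te.compute((n, lanes), lambda i, j: tvm.tir.const(value, dtype=dtype), name="compute")
s = te.create_schedule(A.op)
y, x = s[A].op.axis
s[A].vectorize(x)
s[A].bind(y, te.thread_axis("blockIdx.x"))
fun = tvm.build(s, [A], "cuda", name="make_int4x4")
print(fun.imported_modules[0].get_source())  # inspect the generated CUDA kernel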

Before the fix in codegen_c.cc, the generated CUDA code is:

extern "C" __global__ void make_int4x4_kernel0(int* __restrict__ compute) {
  ((int16_t*)(compute + ((((int)blockIdx.x) * 4)) / 8))[0] = (int16_t)4369;
}

For an int16_t access, the index ((((int)blockIdx.x) * 4)) / 8 is a bug: each int16_t packs four int4 lanes, so the int4 element offset should be divided by 4 and applied after the cast to int16_t*.
After the fix in codegen_c.cc, the generated CUDA code is:

extern "C" __global__ void make_int4x4_kernel0(int* __restrict__ compute) {
  ((int16_t*)(compute) + ((((int)blockIdx.x) * 4)) / 4)[0] = (int16_t)4369;
}
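
As a quick sanity check on the fixed index (a small arithmetic sketch, not code from the PR), the constant 4369 and the divide-by-4 can be verified in Python:

# Each int16_t holds 16 bits, i.e. four 4-bit int4 lanes.
LANES_PER_INT16 = 16 // 4  # = 4

# broadcast(1i4, 4): four int4 lanes of value 1, packed as nibbles.
packed = 0
for lane in range(4):
    packed |= (1 & 0xF) << (4 * lane)
assert packed == 0x1111 == 4369  # matches the (int16_t)4369 store above

# blockIdx.x * 4 counts int4 elements; the int16_t index must divide that
# offset by 4 (lanes per int16_t), not 8, so block b writes slot b.
for block_idx in range(8):
    int4_offset = block_idx * 4
    assert int4_offset // LANES_PER_INT16 == block_idx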

Could you please help review this fix? @vinx13 @Hzfengsy

tqchen (Member) commented May 26, 2021

@vinx13 please help to manage this PR

@vinx13 vinx13 merged commit f4dce24 into apache:main May 26, 2021
@wyc-ruiker wyc-ruiker deleted the fix-int4 branch May 27, 2021 02:27
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 17, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 17, 2021