[Bug] Inconsistency caused by 65535f16*0f16 after using compute_inline #12377

Closed
cxx122 opened this issue Aug 11, 2022 · 5 comments
cxx122 commented Aug 11, 2022

TENSOR_0 = te.compute([14], lambda rck: te.max_value("float16") * te.min_value("uint16"), name="TENSOR_1")
TENSOR_1 = te.compute([11], lambda oco: te.max_value("uint16") * TENSOR_0[oco], name="TENSOR_2")
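For context, a worked sketch of the arithmetic involved (assuming te.max_value and te.min_value return the dtype's extreme values, i.e. 65504 for float16, and 0 and 65535 for uint16): TENSOR_0 evaluates to 65504 * 0 = 0, and TENSOR_1 multiplies it by 65535, which has no float16 representation and overflows to inf, so the runtime product is inf * 0 = nan:

import numpy as np

print(np.float16(65535))                  # inf: 65535 exceeds fp16's largest finite value, 65504
print(np.float16(65535) * np.float16(0))  # nan: IEEE 754 defines inf * 0 as nan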

The TIR program before compute_inline:

@main = primfn(TENSOR_1_1: handle, TENSOR_2_1: handle) -> ()
  attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
  buffers = {TENSOR_1: Buffer(TENSOR_1_2: Pointer(float16), float16, [14], []),
             TENSOR_2: Buffer(TENSOR_2_2: Pointer(float16), float16, [11], [])}
  buffer_map = {TENSOR_1_1: TENSOR_1, TENSOR_2_1: TENSOR_2}
  preflattened_buffer_map = {TENSOR_1_1: TENSOR_1_3: Buffer(TENSOR_1_2, float16, [14], []), TENSOR_2_1: TENSOR_2_3: Buffer(TENSOR_2_2, float16, [11], [])} {
  for (rck: int32, 0, 11) {
    TENSOR_1[rck] = 0f16
  }
  for (oco: int32, 0, 11) {
    TENSOR_2[oco] = (65535f16*TENSOR_1[oco])
  }
}

The TIR program after compute_inline (note that the multiplication 65535f16*TENSOR_1[oco] has been constant-folded away, leaving a plain 0f16 store):

@main = primfn(TENSOR_1_1: handle, TENSOR_2_1: handle) -> ()
  attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
  buffers = {TENSOR_1: Buffer(TENSOR_1_2: Pointer(float16), float16, [14], []),
             TENSOR_2: Buffer(TENSOR_2_2: Pointer(float16), float16, [11], [])}
  buffer_map = {TENSOR_1_1: TENSOR_1, TENSOR_2_1: TENSOR_2}
  preflattened_buffer_map = {TENSOR_1_1: TENSOR_1_3: Buffer(TENSOR_1_2, float16, [14], []), TENSOR_2_1: TENSOR_2_3: Buffer(TENSOR_2_2, float16, [11], [])} {
  for (oco: int32, 0, 11) {
    TENSOR_2[oco] = 0f16
  }
}

Actual behavior

AssertionError: 
Not equal to tolerance rtol=1e-05, atol=1e-07

x and y nan location mismatch:
 x: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
      dtype=float16)
 y: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float16)

Environment

Operating System: Ubuntu 18.04; TVM version: tag 0.9.0 [d361585]

Steps to reproduce

import numpy as np
import tvm
from tvm import te
import tvm.testing

TENSOR_0 = te.compute([14], lambda rck: te.max_value("float16") * te.min_value("uint16"), name="TENSOR_1")
TENSOR_1 = te.compute([11], lambda oco: te.max_value("uint16") * TENSOR_0[oco], name="TENSOR_2")
s = te.create_schedule(TENSOR_1.op)
tensor_list = [TENSOR_0, TENSOR_1]

dev = tvm.cpu(0)
pre_list = []
after_list = []
for tensor in tensor_list:
    # Materialize each tensor's static shape; fall back to 1 for symbolic dims.
    shape = [x.value if 'value' in dir(x) and isinstance(x.value, int) else 1 for x in tensor.shape]
    params = (5 * np.random.uniform(size=shape)).astype(tensor.dtype)
    pre_list.append(tvm.nd.array(params.copy(), dev))
    after_list.append(tvm.nd.array(params.copy(), dev))

# Build and run before compute_inline.
pre_mod = tvm.lower(s, tensor_list, simple_mode=True)
with tvm.transform.PassContext(opt_level=4):
    f = tvm.build(pre_mod)
f(*pre_list)

s[TENSOR_0].compute_inline()

# Build and run after compute_inline.
now_mod = tvm.lower(s, tensor_list, simple_mode=True)
with tvm.transform.PassContext(opt_level=4):
    f = tvm.build(now_mod)
f(*after_list)

tvm.testing.assert_allclose(pre_list[1].numpy(), after_list[1].numpy(), rtol=1e-5)
cxx122 changed the title [Bug] Inconsistent caused by 65535f16*0f16 after using compute_inline [Bug] Inconsistency caused by 65535f16*0f16 after using compute_inline Aug 11, 2022

ganler commented Aug 17, 2022

@cxx122 Thanks for the report. It seems you are trying to compute "65535f16 * 0f16", which returns "nan" as an undefined result.


Since its output is "nan", and according to IEEE 754 "nan" is not comparable, I don't think it is suitable to regard this as an inconsistency bug, since the computation itself is ill-formed and undefined. From a fuzzing perspective, IMO, these should be regarded as false alarms, and the algorithm should try to avoid synthesizing programs with undefined behaviors (as CSmith does).
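To make the incomparability point concrete, here is a minimal NumPy sketch (not TVM-specific):

import numpy as np

a = np.float16(65535) * np.float16(0)  # inf * 0 -> nan
print(a == a)       # False: nan compares unequal to everything, itself included
print(np.isnan(a))  # True: the only reliable way to test for nan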


ganler commented Aug 17, 2022

Similarly, in many of the remaining bug reports, since opt_level=4 is specified, which enables fast-math optimizations, such numerical "inconsistencies" are highly likely when the computation is not well-formed.
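As a sketch of the opt_level point, the build step from the reproduction script could be compared across levels (pre_mod is the module produced by tvm.lower() above; the exact set of passes enabled per level depends on the TVM version):

import tvm

# Hypothetical comparison: the same lowered module built at two levels.
with tvm.transform.PassContext(opt_level=2):
    f_default = tvm.build(pre_mod)  # fewer aggressive rewrites
with tvm.transform.PassContext(opt_level=4):
    f_fast = tvm.build(pre_mod)     # may fold/reorder float arithmetic more freely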


cxx122 commented Aug 18, 2022

Thanks. When I submitted this bug, I also considered that it might be due to this problem. This may not be a bug in the strict sense.

wrongtest-intellif self-assigned this Aug 20, 2022
wrongtest-intellif commented

Actually there is no "65535f16"; it should be nan because it exceeds the maximum of fp16. This seems to be an issue in literal construction and constant folding.
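A small sketch of the literal-construction point (using tvm.tir.const to build the typed constant): 65535 has no float16 representation, so the printed literal already denotes a value the type cannot hold:

import numpy as np
import tvm

c = tvm.tir.const(65535, "float16")
print(c)                  # printed as a float16 literal, though fp16 cannot represent 65535
print(np.float16(65535))  # inf: rounding 65535 to the nearest fp16 overflows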


ganler commented Aug 20, 2022

@wrongtest-intellif Good point. "65535f16" is actually inf. But inf * 0 gets us a nan. :-)

cxx122 closed this as completed Sep 5, 2022