
[Bug]: bulkinsert f16 and bf16 data in csv will cause accuracy issue #36632

Closed
1 task done
OxalisCu opened this issue Sep 30, 2024 · 4 comments
@OxalisCu (Contributor)

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

In the code, json.Unmarshal([]byte(obj), &vec) is used to parse the array, which loses float precision:

obj: [0.89990234375, 0.2763671875, 0.172119140625, 0.453125, 0.338134765625, 0.187255859375, 0.4765625, 0.78662109375]
vec: [0.89990234, 0.2763672, 0.17211914, 0.453125, 0.33813477, 0.18725586, 0.4765625, 0.7866211]
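The effect can be illustrated outside of Go. A minimal Python/numpy sketch (not the Milvus code path) shows that narrowing the parsed float64 values to float32 only changes the shortest decimal string that gets printed; values that originated as float16 are still represented exactly:

```python
import json
import numpy as np

# The issue's input: float16 values that pymilvus serialized as float64 decimals.
obj = "[0.89990234375, 0.2763671875, 0.172119140625, 0.453125]"

# Python's json parses into float64, so the long decimals survive exactly.
vec64 = json.loads(obj)
print(vec64[0])                 # 0.89990234375

# Narrowing to float32 (analogous to Go unmarshalling into a []float32)
# changes the shortest decimal representation that is printed back...
vec32 = np.array(vec64, dtype=np.float32)
print(vec32[0])                 # 0.89990234

# ...but the float32 value is still bit-exact here, because every float16
# value is exactly representable in float32.
print(float(vec32[0]) == vec64[0])   # True
```

So for data that is genuinely float16, the shorter printed form is a formatting difference, not an accuracy loss.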

Expected Behavior

obj: [0.89990234375, 0.2763671875, 0.172119140625, 0.453125, 0.338134765625, 0.187255859375, 0.4765625, 0.78662109375]
vec: [0.89990234375, 0.2763671875, 0.172119140625, 0.453125, 0.338134765625, 0.187255859375, 0.4765625, 0.78662109375]

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@OxalisCu OxalisCu added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 30, 2024
@OxalisCu (Contributor, Author)

/assign OxalisCu

@xiaofan-luan (Collaborator)

0.89990234375

Correct. With FP16 (half precision) you get less accuracy than with FP32. Specifically:

FP16 (half precision) has an 11-bit significand, roughly 3–4 decimal digits of precision.
FP32 (single precision) has a 24-bit significand, roughly 7 decimal digits of precision.
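As an illustration of the digit counts above (a numpy sketch, not taken from the issue), rounding a value with many significant digits to each type shows the difference:

```python
import numpy as np

x = 0.123456789  # more digits than either type can hold

# float16 keeps roughly 3-4 significant decimal digits.
print(np.float16(x))   # 0.1235

# float32 keeps roughly 7 significant decimal digits.
print(np.float32(x))   # 0.12345679
```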

@xiaofan-luan (Collaborator)

So I don't think there is any way to maintain that much detail with fp16. Most likely your data is fp64 (double), which Milvus doesn't support for now. For vectors there is usually no need to keep full precision.

@yanliang567 yanliang567 added help wanted Extra attention is needed and removed kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 8, 2024
@yanliang567 yanliang567 removed their assignment Oct 8, 2024
@OxalisCu (Contributor, Author) commented Oct 16, 2024

I understand now that json.Unmarshal parses the floating point numbers into float32 here (the element type of the target slice), which loses precision for f64 data but has no effect on f16-type data.

The value 0.89990234375 is generated in pymilvus: before json.dumps, np.array(dtype=np.float16).tolist() converts the ndarray into a list, and since Python's native types do not support float16, the elements are stored as float64. In fact, only float16 precision is needed.
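The round trip described here can be sketched with numpy (a minimal illustrative example, not the pymilvus code):

```python
import json
import numpy as np

# float16 -> .tolist() widens each element to a Python float (float64),
# so json.dumps emits the long, exact decimal of the float16 value.
v = np.array([0.9], dtype=np.float16)
payload = json.dumps(v.tolist())
print(payload)                  # [0.89990234375]

# Casting the parsed values back to float16 recovers the original bits,
# so nothing beyond float16 precision was ever needed.
back = np.array(json.loads(payload), dtype=np.float16)
print(bool(back[0] == v[0]))    # True
```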

In short, my submission has no practical significance, so I am going to close this PR.

Labels
help wanted Extra attention is needed

3 participants