
[Bug]: bulkinsert f16 and bf16 data in csv will cause accuracy issue #36632

Closed
1 task done
OxalisCu opened this issue Sep 30, 2024 · 4 comments
@OxalisCu (Contributor)

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

In the code, json.Unmarshal([]byte(obj), &vec) is used to parse the array, which loses float precision:

obj: [0.89990234375, 0.2763671875, 0.172119140625, 0.453125, 0.338134765625, 0.187255859375, 0.4765625, 0.78662109375]
vec: [0.89990234, 0.2763672, 0.17211914, 0.453125, 0.33813477, 0.18725586, 0.4765625, 0.7866211]
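The effect can be illustrated outside of Go. A minimal Python/numpy sketch (not the Milvus code path) shows that narrowing the parsed float64 values to float32 only changes the shortest decimal string that gets printed; values that originated as float16 are still represented exactly:

```python
import json
import numpy as np

# The issue's input: float16 values that pymilvus serialized as float64 decimals.
obj = "[0.89990234375, 0.2763671875, 0.172119140625, 0.453125]"

# Python's json parses into float64, so the long decimals survive exactly.
vec64 = json.loads(obj)
print(vec64[0])                 # 0.89990234375

# Narrowing to float32 (analogous to Go unmarshalling into a []float32)
# changes the shortest decimal representation that is printed back...
vec32 = np.array(vec64, dtype=np.float32)
print(vec32[0])                 # 0.89990234

# ...but the float32 value is still bit-exact here, because every float16
# value is exactly representable in float32.
print(float(vec32[0]) == vec64[0])   # True
```

So for data that is genuinely float16, the shorter printed form is a formatting difference, not an accuracy loss.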

Expected Behavior

obj: [0.89990234375, 0.2763671875, 0.172119140625, 0.453125, 0.338134765625, 0.187255859375, 0.4765625, 0.78662109375]
vec: [0.89990234375, 0.2763671875, 0.172119140625, 0.453125, 0.338134765625, 0.187255859375, 0.4765625, 0.78662109375]

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@OxalisCu OxalisCu added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 30, 2024
@OxalisCu (Contributor, Author)

/assign OxalisCu

@xiaofan-luan (Collaborator)

0.89990234375

Correct. With FP16 (half precision) you get less accuracy than with FP32. Specifically:

FP16 (half precision) has an 11-bit significand, roughly 3–4 decimal digits of precision.
FP32 (single precision) has a 24-bit significand, roughly 7 decimal digits of precision.
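As an illustration of the digit counts above (a numpy sketch, not taken from the issue), rounding a value with many significant digits to each type shows the difference:

```python
import numpy as np

x = 0.123456789  # more digits than either type can hold

# float16 keeps roughly 3-4 significant decimal digits.
print(np.float16(x))   # 0.1235

# float32 keeps roughly 7 significant decimal digits.
print(np.float32(x))   # 0.12345679
```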

@xiaofan-luan (Collaborator)

So I don't think there is any way to maintain that much detail with fp16. Most likely your data is fp64 (double), which Milvus doesn't support for now. For vectors there is usually no need to keep full precision.

@yanliang567 yanliang567 added help wanted Extra attention is needed and removed kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 8, 2024
@yanliang567 yanliang567 removed their assignment Oct 8, 2024
@OxalisCu (Contributor, Author) commented Oct 16, 2024

I understand now that json.Unmarshal parses the floating point numbers into float32 here (the element type of the target slice), which loses precision for f64 data but has no effect on f16-type data.

The value 0.89990234375 is generated in pymilvus: before json.dumps, np.array(dtype=np.float16).tolist() converts the ndarray into a list, and since Python's native types do not support float16, the elements are stored as float64. In fact, only float16 precision is needed.
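The round trip described here can be sketched with numpy (a minimal illustrative example, not the pymilvus code):

```python
import json
import numpy as np

# float16 -> .tolist() widens each element to a Python float (float64),
# so json.dumps emits the long, exact decimal of the float16 value.
v = np.array([0.9], dtype=np.float16)
payload = json.dumps(v.tolist())
print(payload)                  # [0.89990234375]

# Casting the parsed values back to float16 recovers the original bits,
# so nothing beyond float16 precision was ever needed.
back = np.array(json.loads(payload), dtype=np.float16)
print(bool(back[0] == v[0]))    # True
```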

In short, my submission has no practical significance, so I am going to close this PR.

Labels
help wanted Extra attention is needed

3 participants