[Bug]: [Nightly] Hybrid search results are different from expected #36273

Closed
1 task done
NicoYuan1986 opened this issue Sep 14, 2024 · 6 comments
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@NicoYuan1986
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 34e5f99
- Deployment mode(standalone or cluster): cluster & standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Hybrid search returns a different number of results than expected: in the failing case below it returns 10 hits, while the test expects 4.

____ TestCollectionHybridSearchValid.test_hybrid_search_min_limit[IP-int64] ____
[gw4] linux -- Python 3.8.17 /usr/local/bin/python3

self = <test_search.TestCollectionHybridSearchValid object at 0x7fd4b73f7820>
primary_field = 'int64', metric_type = 'IP'

    @pytest.mark.tags(CaseLabel.L2)
    @pytest.mark.parametrize("primary_field", [ct.default_int64_field_name, ct.default_string_field_name])
    def test_hybrid_search_min_limit(self, primary_field, metric_type):
        """
        target: test hybrid search with minimum limit params
        method: create connection, collection, insert and search
        expected: hybrid search successfully with limit(topK)
        """
        # 1. initialize collection with data
        dim = 99
        multiple_dim_array = [dim + dim, dim - 10]
        collection_w, _, _, insert_ids, time_stamp = \
            self.init_collection_general(prefix, True, dim=dim, is_index=False, primary_field=primary_field,
                                         enable_dynamic_field=False, multiple_dim_array=multiple_dim_array)[0:5]
        # 2. extract vector field name
        vector_name_list = cf.extract_vector_field_name_list(collection_w)
        flat_index = {"index_type": "FLAT", "params": {}, "metric_type": metric_type}
        for vector_name in vector_name_list:
            collection_w.create_index(vector_name, flat_index)
        collection_w.create_index(ct.default_float_vec_field_name, flat_index)
        collection_w.load()
        # 3. prepare search params
        req_list = []
        id_list = []
        for i in range(len(vector_name_list)):
            vectors = [[random.random() for _ in range(multiple_dim_array[i])] for _ in range(1)]
            search_params = {"metric_type": metric_type, "offset": 0}
            search_param = {
                "data": vectors,
                "anns_field": vector_name_list[i],
                "param": search_params,
                "limit": min_dim,
                "expr": default_search_exp}
            req = AnnSearchRequest(**search_param)
            req_list.append(req)
            search_res = collection_w.search(vectors[:1], vector_name_list[i],
                                             search_params, min_dim,
                                             default_search_exp,
                                             check_task=CheckTasks.check_search_results,
                                             check_items={"nq": 1,
                                                          "ids": insert_ids,
                                                          "limit": min_dim})[0]
            id_list.extend(search_res[0].ids)
        # 4. hybrid search
        hybrid_search = collection_w.hybrid_search(req_list, WeightedRanker(0.1, 0.9), default_limit)[0]
        assert len(hybrid_search) == 1
>       assert len(hybrid_search[0].ids) == len(list(set(id_list)))
E       AssertionError: assert 10 == 4
E        +  where 10 = len([980, 1430, 1240, 1247, 1360, 1516, ...])
E        +    where [980, 1430, 1240, 1247, 1360, 1516, ...] = ['id: 980, distance: 0.8404409289360046, entity: {}', 'id: 1430, distance: 0.83946293592453, entity: {}', 'id: 1240, distance: 0.8389506936073303, entity: {}', 'id: 1247, distance: 0.8389191627502441, entity: {}', 'id: 1360, distance: 0.8387582898139954, entity: {}', 'id: 1516, distance: 0.8387090563774109, entity: {}', 'id: 1932, distance: 0.8386892676353455, entity: {}', 'id: 184, distance: 0.8385480642318726, entity: {}', 'id: 1825, distance: 0.8385159969329834, entity: {}', 'id: 653, distance: 0.8384677171707153, entity: {}'].ids
E        +  and   4 = len([1430, 427, 980, 1422])
E        +    where [1430, 427, 980, 1422] = list({427, 980, 1422, 1430})
E        +      where {427, 980, 1422, 1430} = set([427, 1422, 980, 1430])
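The failing assertion can be restated as a standalone check on the IDs printed in the log. This is a minimal sketch, not part of the test suite: `check_hybrid_ids` is a hypothetical helper, and the hybrid limit of 10 is an assumption inferred from the 10 returned hits. The second check (every hybrid hit should appear in some per-field search) is an extra invariant the test does not assert; it assumes the separate per-field searches, run on a FLAT index with the same expression, see the same candidates as the hybrid search's sub-requests.

```python
# IDs copied from the assertion message in the log above.
hybrid_ids = [980, 1430, 1240, 1247, 1360, 1516, 1932, 184, 1825, 653]
per_field_ids = [427, 1422, 980, 1430]  # id_list: combined hits of the per-field searches
HYBRID_LIMIT = 10  # assumption: default_limit is 10, matching the 10 returned hits


def check_hybrid_ids(hybrid_ids, per_field_ids, hybrid_limit):
    """Hypothetical helper: list violations of the invariants the test relies on."""
    candidates = set(per_field_ids)
    problems = []
    # The hybrid result should hold every distinct candidate, capped at the hybrid limit.
    expected_len = min(hybrid_limit, len(candidates))
    if len(hybrid_ids) != expected_len:
        problems.append(f"expected {expected_len} hits, got {len(hybrid_ids)}")
    # Every hybrid hit should come from one of the per-field searches.
    unseen = [i for i in hybrid_ids if i not in candidates]
    if unseen:
        problems.append(f"not seen in per-field searches: {unseen}")
    return problems


for problem in check_hybrid_ids(hybrid_ids, per_field_ids, HYBRID_LIMIT):
    print(problem)
```

Run against the logged values, it reports both the count mismatch (10 vs. 4) and that eight of the ten hybrid hits (all except 980 and 1430) never appeared in the per-field searches.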

Expected Behavior

The test cases pass.

Steps To Reproduce

No response

Milvus Log

https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI(new)/detail/2.4/21/pipeline/118/

Failing cases:
- test_hybrid_search_overall_limit_larger_sum_each_limit
- test_hybrid_search_min_limit

Anything else?

No response

@NicoYuan1986 NicoYuan1986 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 14, 2024
@NicoYuan1986 NicoYuan1986 added this to the 2.4.12 milestone Sep 14, 2024
@NicoYuan1986
Contributor Author

@binbinlv Can you help take a look at this error? Is it a test case error or a kernel error?

@binbinlv
Contributor

binbinlv commented Sep 14, 2024

> @binbinlv Can you help take a look at this error? Is it a test case error or a kernel error?

I think this is a bug: the hybrid search returns more results than the sum of the limits of the individual search requests.

Here each search request has a limit of 2 and there are 2 search requests, so the summed limit is 4. Even though the hybrid search limit is 10, it should return only 4 results.
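This reasoning assumes that each sub-request returns at most its own limit of hits and that the ranker only reorders the union of those hits. The following is a minimal sketch of that bound, not Milvus's implementation: `weighted_fuse` is a hypothetical helper that combines raw scores with a plain weighted sum, whereas Milvus's `WeightedRanker` may normalize scores per metric before weighting. The hit lists and scores below are made up for illustration.

```python
from collections import defaultdict


def weighted_fuse(sub_results, weights, limit):
    """Hypothetical weighted-sum fusion of several ANN result lists.

    sub_results: one list of (id, score) pairs per search request,
                 each already truncated to that request's own limit.
    weights:     one weight per request (like WeightedRanker(0.1, 0.9)).
    limit:       the hybrid search limit.
    """
    fused = defaultdict(float)
    for hits, weight in zip(sub_results, weights):
        for entity_id, score in hits:
            fused[entity_id] += weight * score
    # Only IDs present in some sub-result get a fused score, so the output
    # size is min(limit, number of distinct candidate IDs).
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return [entity_id for entity_id, _ in ranked[:limit]]


req_a = [(1, 0.9), (2, 0.8)]  # hypothetical hits, limit=2
req_b = [(3, 0.7), (4, 0.6)]  # hypothetical hits, limit=2
print(weighted_fuse([req_a, req_b], [0.1, 0.9], limit=10))  # → [3, 4, 1, 2]
```

With two requests of limit 2 and a hybrid limit of 10, the fused list holds at most 4 IDs, which is the count the test expects.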

@binbinlv
Contributor

@czs007

Could you please take a look? Thanks.

/assign @czs007

@binbinlv binbinlv added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 14, 2024
@czs007
Collaborator

czs007 commented Sep 14, 2024

Working on it.

@yanliang567
Contributor

/assign @NicoYuan1986
Please help verify the fix above.

@NicoYuan1986
Contributor Author

Fixed in master-20241106-8275e40f.

4 participants