[Bug]: [Nightly] Hybrid search results are different from expected #36273

NicoYuan1986 · 2024-09-14T02:38:55Z

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version: 34e5f99
- Deployment mode(standalone or cluster):cluster & standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Hybrid search results are different from expected.

[pytest : test] ____ TestCollectionHybridSearchValid.test_hybrid_search_min_limit[IP-int64] ____
[pytest : test] [gw4] linux -- Python 3.8.17 /usr/local/bin/python3
[pytest : test] 
[pytest : test] self = <test_search.TestCollectionHybridSearchValid object at 0x7fd4b73f7820>
[pytest : test] primary_field = 'int64', metric_type = 'IP'
[pytest : test] 
[pytest : test]     @pytest.mark.tags(CaseLabel.L2)
[pytest : test]     @pytest.mark.parametrize("primary_field", [ct.default_int64_field_name, ct.default_string_field_name])
[pytest : test]     def test_hybrid_search_min_limit(self, primary_field, metric_type):
[pytest : test]         """
[pytest : test]         target: test hybrid search with minimum limit params
[pytest : test]         method: create connection, collection, insert and search
[pytest : test]         expected: hybrid search successfully with limit(topK)
[pytest : test]         """
[pytest : test]         # 1. initialize collection with data
[pytest : test]         dim = 99
[pytest : test]         multiple_dim_array = [dim + dim, dim - 10]
[pytest : test]         collection_w, _, _, insert_ids, time_stamp = \
[pytest : test]             self.init_collection_general(prefix, True, dim=dim, is_index=False, primary_field=primary_field,
[pytest : test]                                          enable_dynamic_field=False, multiple_dim_array=multiple_dim_array)[0:5]
[pytest : test]         # 2. extract vector field name
[pytest : test]         vector_name_list = cf.extract_vector_field_name_list(collection_w)
[pytest : test]         flat_index = {"index_type": "FLAT", "params": {}, "metric_type": metric_type}
[pytest : test]         for vector_name in vector_name_list:
[pytest : test]             collection_w.create_index(vector_name, flat_index)
[pytest : test]         collection_w.create_index(ct.default_float_vec_field_name, flat_index)
[pytest : test]         collection_w.load()
[pytest : test]         # 3. prepare search params
[pytest : test]         req_list = []
[pytest : test]         id_list = []
[pytest : test]         for i in range(len(vector_name_list)):
[pytest : test]             vectors = [[random.random() for _ in range(multiple_dim_array[i])] for _ in range(1)]
[pytest : test]             search_params = {"metric_type": metric_type, "offset": 0}
[pytest : test]             search_param = {
[pytest : test]                 "data": vectors,
[pytest : test]                 "anns_field": vector_name_list[i],
[pytest : test]                 "param": search_params,
[pytest : test]                 "limit": min_dim,
[pytest : test]                 "expr": default_search_exp}
[pytest : test]             req = AnnSearchRequest(**search_param)
[pytest : test]             req_list.append(req)
[pytest : test]             search_res = collection_w.search(vectors[:1], vector_name_list[i],
[pytest : test]                                              search_params, min_dim,
[pytest : test]                                              default_search_exp,
[pytest : test]                                              check_task=CheckTasks.check_search_results,
[pytest : test]                                              check_items={"nq": 1,
[pytest : test]                                                           "ids": insert_ids,
[pytest : test]                                                           "limit": min_dim})[0]
[pytest : test]             id_list.extend(search_res[0].ids)
[pytest : test]         # 4. hybrid search
[pytest : test]         hybrid_search = collection_w.hybrid_search(req_list, WeightedRanker(0.1, 0.9), default_limit)[0]
[pytest : test]         assert len(hybrid_search) == 1
[pytest : test] >       assert len(hybrid_search[0].ids) == len(list(set(id_list)))
[pytest : test] E       AssertionError: assert 10 == 4
[pytest : test] E        +  where 10 = len([980, 1430, 1240, 1247, 1360, 1516, ...])
[pytest : test] E        +    where [980, 1430, 1240, 1247, 1360, 1516, ...] = ['id: 980, distance: 0.8404409289360046, entity: {}', 'id: 1430, distance: 0.83946293592453, entity: {}', 'id: 1240, distance: 0.8389506936073303, entity: {}', 'id: 1247, distance: 0.8389191627502441, entity: {}', 'id: 1360, distance: 0.8387582898139954, entity: {}', 'id: 1516, distance: 0.8387090563774109, entity: {}', 'id: 1932, distance: 0.8386892676353455, entity: {}', 'id: 184, distance: 0.8385480642318726, entity: {}', 'id: 1825, distance: 0.8385159969329834, entity: {}', 'id: 653, distance: 0.8384677171707153, entity: {}'].ids
[pytest : test] E        +  and   4 = len([1430, 427, 980, 1422])
[pytest : test] E        +    where [1430, 427, 980, 1422] = list({427, 980, 1422, 1430})
[pytest : test] E        +      where {427, 980, 1422, 1430} = set([427, 1422, 980, 1430])

Expected Behavior

pass

Steps To Reproduce

No response

Milvus Log

https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI(new)/detail/2.4/21/pipeline/118/

case:
test_hybrid_search_overall_limit_larger_sum_each_limit
test_hybrid_search_min_limit

Anything else?

No response

NicoYuan1986 · 2024-09-14T02:44:21Z

@binbinlv Can you help look at this error? Is it a test case error or a kernel error?

binbinlv · 2024-09-14T03:34:02Z

I think this is a bug: the hybrid search returns more results than the sum of the limits of the individual search requests.

Here each search request has a limit of 2 and there are 2 search requests, so the limits sum to 4. Even though the limit of the hybrid search is 10, it should return only 4 results.
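To make the expectation above concrete, here is a minimal sketch of the count the test asserts. It is not Milvus code: `expected_hybrid_count` is a hypothetical helper, and the split of ids between the two requests is an assumption based on the order of `id_list` in the log and a per-request limit of 2. The ids themselves are copied from the assertion output.

```python
from typing import Sequence


def expected_hybrid_count(per_request_ids: Sequence[Sequence[int]], overall_limit: int) -> int:
    """Hypothetical helper: hybrid search can only rerank the candidates
    returned by the individual requests, so the hit count is bounded by the
    number of unique candidate ids, capped by the overall limit."""
    unique_ids = {pk for ids in per_request_ids for pk in ids}
    return min(overall_limit, len(unique_ids))


# Candidate ids from the log; which request returned which pair is assumed.
per_request_ids = [[427, 1422], [980, 1430]]

# Ids actually returned by hybrid_search in the failing run (from the log).
observed_ids = [980, 1430, 1240, 1247, 1360, 1516, 1932, 184, 1825, 653]

candidates = {pk for ids in per_request_ids for pk in ids}
# Ids that no individual search request returned, yet appear in the result.
unexpected = [pk for pk in observed_ids if pk not in candidates]

print(expected_hybrid_count(per_request_ids, overall_limit=10))
print(len(observed_ids), unexpected)
```

With the values from the log, the expected count is 4 while 10 ids came back, and 8 of them were not returned by either individual search request.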

binbinlv · 2024-09-14T03:34:34Z

@czs007

Could you please take a look at this? Thanks.

/assign @czs007

czs007 · 2024-09-14T06:57:45Z

working

yanliang567 · 2024-10-09T02:12:32Z

/assign @NicoYuan1986
Please help verify the fix above.

NicoYuan1986 · 2024-11-07T03:07:01Z

fixed. master-20241106-8275e40f

NicoYuan1986 added the kind/bug and needs-triage labels Sep 14, 2024

NicoYuan1986 added this to the 2.4.12 milestone Sep 14, 2024

NicoYuan1986 assigned yanliang567 Sep 14, 2024

sre-ci-robot assigned czs007 Sep 14, 2024

binbinlv unassigned yanliang567 Sep 14, 2024

binbinlv added the triage/accepted label and removed the needs-triage label Sep 14, 2024

This was referenced Sep 14, 2024

fix: keep inner topK to avoid exceeding efSearch #36287

Merged

fix: keep inner topK to avoid exceeding efSearch #36284

Merged

yanliang567 modified the milestones: 2.4.12, 2.4.13 Sep 27, 2024

sre-ci-robot assigned NicoYuan1986 Oct 9, 2024

yanliang567 modified the milestones: 2.4.13, 2.4.14 Oct 15, 2024

NicoYuan1986 closed this as completed Nov 7, 2024
