[Bug]: query failed: Assert "index < this->counter_" => index out of range, index=0, counter_=0 #36871

Open · 1 task done
ThreadDao opened this issue Oct 15, 2024 · 7 comments
Labels:
kind/bug: Issues or changes related a bug
severity/critical: Critical, lead to crash, data missing, wrong result, function totally doesn't work.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

@ThreadDao
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4-20241013-44564f04-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

milvus server

deploy a cluster with image 2.4-20241013-44564f04-amd64

  • with config
      config:
        queryNode:
          levelZeroForwardPolicy: RemoteLoad
        rootCoord:
          enableActiveStandby: true
        queryCoord:
          enableActiveStandby: true
        indexCoord:
          enableActiveStandby: true
        dataCoord:
          enableActiveStandby: true
          segment:
            enableLevelZero: true
        log:
          level: debug
        trace:
          exporter: jaeger
          sampleFraction: 1
          jaeger:
            url: http://tempo-distributor.tempo:14268/api/traces

test steps

  1. create a collection with an int64 primary-key field and a 128-dim vector field
  2. create an HNSW index
  3. insert 10M entities with random pk values
  4. flush
  5. index again -> load
  6. concurrent search + query + upsert
'concurrent_params': {'concurrent_number': 30, 'during_time': '5h', 'interval': 60, 'spawn_rate': None},
'concurrent_tasks': [{'type': 'search',
                      'weight': 15,
                      'params': {'nq': 100,
                                 'top_k': 100,
                                 'search_param': {'ef': 120},
                                 'expr': 'id >= 10',
                                 'guarantee_timestamp': None,
                                 'partition_names': None,
                                 'output_fields': ['id'],
                                 'ignore_growing': False,
                                 'group_by_field': None,
                                 'timeout': 120,
                                 'random_data': True,
                                 'check_task': 'check_search_output',
                                 'check_items': {'nq': 100}}},
                     {'type': 'query',
                      'weight': 10,
                      'params': {'ids': None,
                                 'expr': '',
                                 'output_fields': None,
                                 'offset': None,
                                 'limit': None,
                                 'ignore_growing': False,
                                 'partition_names': None,
                                 'timeout': 120,
                                 'consistency_level': None,
                                 'random_data': True,
                                 'random_count': 500,
                                 'random_range': [0, 20000000],
                                 'field_name': 'id',
                                 'field_type': 'int64',
                                 'check_task': 'check_response',
                                 'check_items': None}},
                     {'type': 'upsert',
                      'weight': 5,
                      'params': {'nb': 300,
                                 'timeout': 60,
                                 'random_id': True,
                                 'random_vector': True,
                                 'varchar_filled': False,
                                 'start_id': 0,
                                 'shuffle_id': True,
                                 'check_task': 'check_response',
                                 'check_items': None}}]},
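For readers unfamiliar with the harness, here is a rough sketch of how the three task weights might translate into a request mix. It assumes that `weight` means weight-proportional task selection, which the issue does not state; the helper name `expected_share` is hypothetical and not part of the test framework.

```python
from fractions import Fraction

# Values copied from the issue's 'concurrent_params' / 'concurrent_tasks'.
CONCURRENT_NUMBER = 30
WEIGHTS = {"search": 15, "query": 10, "upsert": 5}


def expected_share(weights):
    """Return each task's share of the workload.

    Assumption: tasks are picked in proportion to their weight.
    """
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


shares = expected_share(WEIGHTS)
for name, share in shares.items():
    workers = float(share * CONCURRENT_NUMBER)
    print(f"{name}: {float(share):.1%} of requests, ~{workers:.0f} of {CONCURRENT_NUMBER} workers")
```

Under that assumption, roughly half of the load is search traffic running alongside a steady stream of upserts into the growing segment, which is the combination the later analysis points at.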

results

Only one search request failed, with: query failed: Assert "index < this->counter_" => index out of range, index=0, counter_=0 at /workspace/source/internal/core/src/mmap/ChunkVector.h:12

[2024-10-14 12:57:08,548 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=65535, message=fail to search on QueryNode 2: worker(2) query failed: Assert "index < this->counter_"  => index out of range, index=0, counter_=0 at /workspace/source/internal/core/src/mmap/ChunkVector.h:126
)>, <Time:{'RPC start': '2024-10-14 12:57:07.696279', 'RPC error': '2024-10-14 12:57:08.547992'}> (decorators.py:147)
[2024-10-14 12:57:08,549 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=65535, message=fail to search on QueryNode 2: worker(2) query failed: Assert "index < this->counter_"  => index out of range, index=0, counter_=0 at /workspace/source/internal/core/src/mmap/ChunkVector.h:126
)>, [requestId: d19f9322-8a2b-11ef-9efb-caae83df8f3a] (api_request.py:57)
[2024-10-14 12:57:08,549 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=65535, message=fail to search on QueryNode 2: worker(2) query failed: Assert "index < this->counter_"  => index out of range, index=0, counter_=0 at /workspace/source/internal/core/src/mmap/ChunkVector.h:126
)> (func_check.py:106)
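When scanning long client logs for this failure, a small parser can pull out the interesting numbers. This is a minimal sketch: the regular expression is derived only from the three log lines above, and `parse_assert` is a hypothetical helper, not part of pymilvus or the test framework.

```python
import re

# Pattern for the assertion text seen in the logs above
# (assumed format, inferred from those lines only).
ASSERT_RE = re.compile(
    r'Assert "(?P<cond>[^"]+)"\s*=>\s*.*?, index=(?P<index>\d+), '
    r"counter_=(?P<counter>\d+) at (?P<file>\S+):(?P<line>\d+)"
)


def parse_assert(message):
    """Extract condition, index, counter and source location, or None if absent."""
    m = ASSERT_RE.search(message)
    if m is None:
        return None
    return {
        "condition": m.group("cond"),
        "index": int(m.group("index")),
        "counter": int(m.group("counter")),
        "file": m.group("file"),
        "line": int(m.group("line")),
    }


sample = (
    "fail to search on QueryNode 2: worker(2) query failed: "
    'Assert "index < this->counter_"  => index out of range, index=0, counter_=0 '
    "at /workspace/source/internal/core/src/mmap/ChunkVector.h:126"
)
info = parse_assert(sample)
print(info)
```

A parsed `counter` of 0 means the container held no chunks at all when it was read, rather than the index being off by one.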

Expected Behavior

No response

Steps To Reproduce

- [argo workflow](https://argo-workflows.zilliz.cc/archived-workflows/qa/be3623e6-1769-4fdb-8585-3411613ca407?nodeId=zong-rolling-upgrade-all-hjwcd-205957643)

Milvus Log

pods:

zong-rolling-upgrade-all-hjwcd-milvus-datanode-6864db75b5-6vkdc     Running     0            1m      10.104.34.9       4am-node37     
zong-rolling-upgrade-all-hjwcd-milvus-datanode-6864db75b5-wp9sp     Running     0            1m      10.104.25.168     4am-node30     
zong-rolling-upgrade-all-hjwcd-milvus-indexnode-75d6c7f47-p5kjx     Running     0            1m      10.104.21.250     4am-node24     
zong-rolling-upgrade-all-hjwcd-milvus-indexnode-75d6c7f47-ts6ch     Running     0            1m      10.104.6.34       4am-node13     
zong-rolling-upgrade-all-hjwcd-milvus-mixcoord-579dcdd8fc-lrnn5     Running     0            1m      10.104.25.167     4am-node30     
zong-rolling-upgrade-all-hjwcd-milvus-proxy-b799c74d5-gbkgt         Running     0            1m      10.104.5.49       4am-node12     
zong-rolling-upgrade-all-hjwcd-milvus-querynode-0-55f689648nkv9     Running     0            1m      10.104.34.10      4am-node37     
zong-rolling-upgrade-all-hjwcd-milvus-querynode-0-55f68964ptv29     Running     0            1m      10.104.30.29      4am-node38     
zong-rolling-upgrade-all-hjwcd-milvus-querynode-0-55f68964q7b9v     Running     0            1m      10.104.5.50       4am-node12

Anything else?

No response

@ThreadDao ThreadDao added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 15, 2024
@ThreadDao ThreadDao added the severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. label Oct 15, 2024
@ThreadDao ThreadDao added this to the 2.4.13 milestone Oct 15, 2024
@yanliang567 yanliang567 added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 15, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.13, 2.4.14 Oct 15, 2024
@yanliang567
Contributor

/assign @weiliu1031
/unassign

@ThreadDao
Contributor Author

/assign @cqy123456
/unassign @weiliu1031

@sre-ci-robot sre-ci-robot assigned cqy123456 and unassigned weiliu1031 Oct 15, 2024
@xiaofan-luan
Collaborator

/assign @sunby

@xiaofan-luan
Collaborator

/assign @cqy123456

@xiaofan-luan
Collaborator

This seems to be a growing-segment mmap issue.

@cqy123456
Contributor

Growing insert is not locked. During the insert process, the vector chunks are cleared (chunk counter = 0) after the growing index is built. The growing segment uses indexing_record_.SyncDataWithIndex to check whether the growing index has been built successfully.
[image: img_v3_02fn_9d7b2423-21fd-4ab9-af93-1043b3e8d6bg]
The log shows the chunks being cleared while the vector BF (brute-force) search is running, so access to indexing_record_.SyncDataWithIndex has a consistency problem. The failing sequence is:
SyncDataWithIndex = false -> jump to the BF search logic -> SyncDataWithIndex = true -> try_remove_chunks -> BF search.
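The sequence above is a check-then-act race. The following toy model replays it deterministically in Python. It is only an illustration: the classes, the `sync_data_with_index` attribute, and the lock-based variant are hypothetical stand-ins, not the Milvus C++ code and not necessarily what the linked PRs do.

```python
import threading


class ChunkVector:
    """Toy chunk container; only the bounds check matters here."""

    def __init__(self):
        self._chunks = []

    def append(self, chunk):
        self._chunks.append(chunk)

    def clear(self):
        self._chunks.clear()

    def get_chunk(self, index):
        counter = len(self._chunks)
        if index >= counter:
            # Same wording as the assertion message in the issue.
            raise IndexError(f"index out of range, index={index}, counter_={counter}")
        return self._chunks[index]


class GrowingSegment:
    def __init__(self):
        self.chunks = ChunkVector()
        self.chunks.append([0.1, 0.2])
        self.sync_data_with_index = False  # hypothetical flag name
        self.lock = threading.Lock()  # used only by the guarded variants

    def finish_index_build(self):
        # Index is ready: flip the flag, then drop the raw vector chunks.
        self.sync_data_with_index = True
        self.chunks.clear()

    def racy_search(self, between_check_and_use=lambda: None):
        if self.sync_data_with_index:
            return "index search"
        # Stand-in for the insert thread being scheduled right here.
        between_check_and_use()
        return ("bf search", self.chunks.get_chunk(0))

    def guarded_search(self):
        # Flag check and chunk access happen under one lock.
        with self.lock:
            if self.sync_data_with_index:
                return "index search"
            return ("bf search", self.chunks.get_chunk(0))

    def guarded_finish_index_build(self):
        with self.lock:
            self.finish_index_build()


error = None
seg = GrowingSegment()
try:
    seg.racy_search(between_check_and_use=seg.finish_index_build)
except IndexError as exc:
    error = str(exc)
    print("racy path failed:", error)

safe = GrowingSegment()
first = safe.guarded_search()
safe.guarded_finish_index_build()
second = safe.guarded_search()
print(first, second)
```

In the guarded variant the flag check and the chunk access form a single critical section, so the chunks cannot be removed between them. Any scheme that makes those two steps atomic with respect to chunk removal would close the same window.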

sre-ci-robot pushed a commit that referenced this issue Oct 17, 2024
… index removes the vec chunks. (#36938)

issue: #36871
related pr: #36939

Signed-off-by: cqy123456 <[email protected]>
sre-ci-robot pushed a commit that referenced this issue Oct 18, 2024
…x removes the vec chunks. (#36939)

issue: #36871
related pr: #36938

Signed-off-by: cqy123456 <[email protected]>
@ThreadDao ThreadDao removed the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Oct 23, 2024
@yanliang567
Contributor

@cqy123456 @ThreadDao any updates for this issue?
