Describe the bug
Inserting vectors is extremely slow when using non-contiguous keys (Python SDK).
Steps to reproduce
Run the following code, which tests index insertion with contiguous and then non-contiguous keys:
from usearch.index import Index
from random import random
import numpy as np

vectors = np.random.rand(600000, 256)
keys = np.arange(len(vectors))  # contiguous keys: 0 .. 599999

# Non-contiguous keys: 12 blocks of 50000 keys, each block placed at a random base
offset = 1_000_000
keys_non_contiguous = []
for _ in range(0, len(vectors), 50000):
    fileIndex = int(random()*10)
    batch = int(random()*256)
    # 8-bit batch id followed by 32 zero bits, i.e. batch << 32
    batchIndex = int('0b' + bin(batch).removeprefix('0b').zfill(8) + '0'*32, 2)
    keys_non_contiguous.extend([batchIndex + fileIndex * offset + u for u in range(50000)])
keys_non_contiguous = np.array(keys_non_contiguous)

index = Index(
    ndim=256, # Define the number of dimensions in input vectors
    metric='cos', # Choose 'l2sq', 'haversine' or other metric, default = 'ip'
    dtype='f32', # Quantize to 'f16' or 'i8' if needed, default = 'f32'
    connectivity=16, # How frequent should the connections in the graph be, optional
    expansion_add=128, # Control the recall of indexing, optional
    expansion_search=64 # Control the quality of search, optional
)

# This takes about 20 sec on a 32 vCPU machine
index.add(keys, vectors, log=True, copy=False)
index.clear()

# This takes about 1min15sec on a 32 vCPU machine
index.add(keys_non_contiguous, vectors, log=True, copy=False)
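For reference, the non-contiguous key construction above appears to pack a random 8-bit batch id into bits 32 and up, then add a per-file offset and a running position. Here is a minimal sketch of the same layout written with a bit shift. The helper name `make_key` and the sample batch and file values are hypothetical, chosen only for illustration, and are not part of USearch or the original script.

```python
import numpy as np

OFFSET = 1_000_000  # same per-file offset as in the reproduction script


def make_key(batch: int, file_index: int, position: int) -> int:
    """Hypothetical helper: batch id in bits 32+, file offset and position below."""
    return (batch << 32) + file_index * OFFSET + position


# The string-based construction from the script yields the same value as a shift.
batch = 5
via_string = int('0b' + bin(batch)[2:].zfill(8) + '0' * 32, 2)
via_shift = batch << 32

# Each block of 50,000 keys is contiguous internally,
# but two blocks with different batch ids sit more than 2**32 apart.
block_a = np.array([make_key(5, 3, u) for u in range(50_000)], dtype=np.uint64)
block_b = np.array([make_key(200, 7, u) for u in range(50_000)], dtype=np.uint64)
gap = int(block_b[0]) - int(block_a[-1])
```

With batch ids up to 255, the keys are spread over a range of roughly 2^40, compared with 0 to 599,999 in the contiguous case.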
Expected behavior
Insertion performance should be the same whether the keys are contiguous or non-contiguous.
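To put a number on the gap, the two `index.add` calls can be wrapped in a small wall-clock timer. This is a minimal sketch; the `timed` helper is a hypothetical name, not part of USearch, and the runnable part uses a stand-in workload so that it executes without USearch installed.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload so the sketch runs on its own.
_, elapsed = timed(sum, range(1_000_000))
print(f"elapsed: {elapsed:.3f}s")

# With the objects from the reproduction script, the comparison would be:
#   _, t_contiguous = timed(index.add, keys, vectors, copy=False)
#   index.clear()
#   _, t_sparse = timed(index.add, keys_non_contiguous, vectors, copy=False)
#   print(t_sparse / t_contiguous)  # expected to be close to 1.0
```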
USearch version
Build from source branch main-dev
Operating System
Ubuntu 24.04 LTS
Hardware architecture
x86
Which interface are you using?
Python bindings
Contact Details
No response
Are you open to being tagged as a contributor?
I am open to being mentioned in the project .git history as a contributor
Is there an existing issue for this?
I have searched the existing issues
Code of Conduct
I agree to follow this project's Code of Conduct
mz1979 changed the title from "Bug: Slow index add performances when keys are not contigious" to "Bug: Slow index add performances when keys are not contiguous" on May 20, 2024