[OOM]: Runs out of memory when creating index for large documents #15013
Comments
Hey there, @bibekyess! I'm here to help you with any bugs, questions, or contributions you might have. Let's work together to solve this issue. To manage memory usage more efficiently when creating an index for large documents using the QdrantVectorStore, you can configure Qdrant to store vectors and payloads on disk instead of keeping everything in RAM.

Here is a modified version of your code to achieve this:

from qdrant_client import QdrantClient, models
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os
from pathlib import Path
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from typing import List
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
embed_size = 1024 # Change as per your embed_model
# Local
client_path = os.path.join(Path.home(), ".cache", "aliparse", "qdrant_vector_store_del")
qdrant_client = QdrantClient(path=client_path)
def create_index(input_files: List[str] = None, input_dir: str = None, collection_name: str = "test"):
    qdrant_client.create_collection(
        collection_name=collection_name,
        on_disk_payload=True,  # Ensure payloads are stored on disk
        vectors_config=models.VectorParams(
            size=embed_size,
            distance=models.Distance.COSINE,
            on_disk=True,  # Store vectors on disk
        ),
        optimizers_config=models.OptimizersConfigDiff(memmap_threshold=10000),
        hnsw_config=models.HnswConfigDiff(on_disk=False),  # False to avoid Request Timeout Error
    )
    qdrant_vector_store = QdrantVectorStore(
        client=qdrant_client,
        collection_name=collection_name,
        enable_hybrid=False,  # Disable hybrid search to save memory
    )
    storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)
    if input_files is not None:
        reader = SimpleDirectoryReader(input_files=input_files, raise_on_error=True)
    elif input_dir is not None:
        reader = SimpleDirectoryReader(input_dir=input_dir, raise_on_error=True)
    else:
        raise ValueError(
            "Please either pass 'input_files' (a list of file paths) or "
            "'input_dir' (a directory containing the files) as an argument to create_index"
        )
    qdrant_index = VectorStoreIndex.from_documents([], embed_model=embed_model, storage_context=storage_context)
    for docs in reader.iter_data():
        for d in docs:
            qdrant_index.insert(d)
        if len(docs) > 0:
            try:
                print(f"Successfully inserted index for {docs[-1].metadata['file_name']}")
            except KeyError:  # file_name may be missing from the document metadata
                pass
    return qdrant_index
qdrant_index = create_index(input_dir="/test-directory", collection_name="test")

This setup ensures that vectors and payloads are stored on disk, reducing the memory footprint during the indexing process. Additionally, you can adjust parameters like memmap_threshold to tune how aggressively Qdrant moves data out of RAM.
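If the collection already exists, the optimizer settings can also be adjusted afterwards rather than recreating it; a minimal sketch, assuming the qdrant_client and models objects defined above (the threshold value 1000 is only an illustrative assumption):

# Hypothetical tuning step: lower the memmap threshold on an existing collection so
# that smaller segments are also moved to on-disk (memmap) storage.
qdrant_client.update_collection(
    collection_name="test",
    optimizers_config=models.OptimizersConfigDiff(memmap_threshold=1000),
)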
@dosu It doesn't solve the issue.
To resolve the out-of-memory (OOM) issue when creating an index for large documents using the QdrantVectorStore, you can additionally lower the upload batch size and limit parallelism. Here is an example of how to initialize the QdrantVectorStore with these settings:

from qdrant_client import QdrantClient, models
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os
from pathlib import Path
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from typing import List
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
embed_size = 1024 # Change as per your embed_model
# Local
client_path = os.path.join(Path.home(), ".cache", "aliparse", "qdrant_vector_store_del")
qdrant_client = QdrantClient(path=client_path)
def create_index(input_files: List[str] = None, input_dir: str = None, collection_name: str = "test"):
    qdrant_client.create_collection(
        collection_name=collection_name,
        on_disk_payload=True,  # Ensure payloads are stored on disk
        vectors_config=models.VectorParams(
            size=embed_size,
            distance=models.Distance.COSINE,
            on_disk=True,  # Store vectors on disk
        ),
        optimizers_config=models.OptimizersConfigDiff(memmap_threshold=10000),
        hnsw_config=models.HnswConfigDiff(on_disk=False),  # False to avoid Request Timeout Error
    )
    qdrant_vector_store = QdrantVectorStore(
        client=qdrant_client,
        collection_name=collection_name,
        batch_size=16,  # Adjust batch_size to a smaller value to reduce memory usage
        parallel=1,
        max_retries=3,
        enable_hybrid=False,  # Disable hybrid search to save memory
    )
    storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)
    if input_files is not None:
        reader = SimpleDirectoryReader(input_files=input_files, raise_on_error=True)
    elif input_dir is not None:
        reader = SimpleDirectoryReader(input_dir=input_dir, raise_on_error=True)
    else:
        raise ValueError(
            "Please either pass 'input_files' (a list of file paths) or "
            "'input_dir' (a directory containing the files) as an argument to create_index"
        )
    qdrant_index = VectorStoreIndex.from_documents([], embed_model=embed_model, storage_context=storage_context)
    for docs in reader.iter_data():
        for d in docs:
            qdrant_index.insert(d)
        if len(docs) > 0:
            try:
                print(f"Successfully inserted index for {docs[-1].metadata['file_name']}")
            except KeyError:  # file_name may be missing from the document metadata
                pass
    return qdrant_index
qdrant_index = create_index(input_dir="/test-directory", collection_name="test")

By reducing the batch_size and keeping parallel at 1, fewer points are embedded and uploaded at a time, which lowers peak memory usage during indexing.
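The batch_size above controls how many points QdrantVectorStore uploads per request; the embedding model has its own batch size as well. A minimal sketch, assuming HuggingFaceEmbedding's embed_batch_size parameter (the value 4 is only an illustrative assumption):

# Smaller embed_batch_size means fewer texts are embedded at once, lowering peak
# RAM/VRAM during the embedding step. Tune the value for your hardware.
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3", embed_batch_size=4)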
There's probably some setting for Qdrant. I usually use Qdrant's docker server though. Also keep in mind that bge-m3 is a pretty big model. I'm not sure if you mean RAM or VRAM, but you might want to lower the batch size.
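A minimal sketch of the server-based setup mentioned above, assuming Qdrant is already running locally in Docker on its default port:

# Start the server first, e.g.: docker run -p 6333:6333 qdrant/qdrant
# Pointing QdrantClient at a server keeps vector storage in a separate process
# instead of inside the Python process that loads the embedding model.
from qdrant_client import QdrantClient

qdrant_client = QdrantClient(url="http://localhost:6333")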
@logan-markewich
Bug Description
How can I use LlamaIndex to create an index for large documents? I noticed that when I am doing index.insert, the index variable is stored in RAM, so RAM usage increases as I keep adding new document chunks and eventually causes an out-of-memory error. Is there some way to reload the index and keep only the vectors in memory while the other payload/metadata stays on disk?

Version
0.10.57
Steps to Reproduce
Relevant Logs/Tracebacks
No response
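One possible way to sidestep the growing in-memory index (a sketch, not something confirmed in this thread): push nodes straight into the Qdrant vector store with an IngestionPipeline and only build a lightweight index view over the store afterwards. It assumes the embed_model, qdrant_vector_store, and reader objects from the snippets above.

from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

# Nodes are chunked, embedded, and written to Qdrant batch by batch; nothing
# accumulates in an in-memory index object while documents are ingested.
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(), embed_model],
    vector_store=qdrant_vector_store,
)

for docs in reader.iter_data():
    pipeline.run(documents=docs)

# The index is then just a view over the already-populated vector store.
qdrant_index = VectorStoreIndex.from_vector_store(qdrant_vector_store, embed_model=embed_model)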