[OOM]: Runs out of memory when creating index for large documents #15013

Open
bibekyess opened this issue Jul 29, 2024 · 5 comments
Labels: bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

Comments

@bibekyess

Bug Description

How can LlamaIndex be used to create an index for a large set of documents?
I noticed that when I call index.insert, the index variable is held in RAM, so RAM usage keeps increasing as I add new document chunks and eventually I run out of memory. Is there some way to reload the index so that only the vectors are kept in memory while the other payload/metadata stays on disk?

Version

0.10.57

Steps to Reproduce

# llama-index==0.10.57
# qdrant-client==1.10.1

from qdrant_client import QdrantClient, models
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os
from pathlib import Path
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex, StorageContext
from typing import List

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
embed_size = 1024 # Change as per your embed_model

# Local
client_path = os.path.join(Path.home(), ".cache", "aliparse", "qdrant_vector_store_del")
qdrant_client = QdrantClient(path=client_path)

# Remote
# url = "http://0.0.0.0:6333"
# qdrant_client = QdrantClient(url=url, timeout=20)


def create_index(input_files: List = None, input_dir: str = None, collection_name: str = "test"):

    qdrant_client.create_collection(
        collection_name=collection_name,
        on_disk_payload=True,  # TODO: seems not to be working
        vectors_config=models.VectorParams(
                size=embed_size,
                distance=models.Distance.COSINE,
                on_disk=True
            ),
        optimizers_config=models.OptimizersConfigDiff(memmap_threshold=10000),
        hnsw_config=models.HnswConfigDiff(on_disk=False), # False to avoid Request Timeout Error, https://github.com/qdrant/qdrant-client/issues/632#issuecomment-2126729153
    )

    qdrant_vector_store = QdrantVectorStore(
        client=qdrant_client, 
        collection_name=collection_name, 
        enable_hybrid=False,  # whether to enable hybrid search using dense and sparse vectors
    )

    storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)

    if input_files is not None:
        reader = SimpleDirectoryReader(input_files=input_files, raise_on_error=True)
    elif input_dir is not None:
        # Careful: Qdrant gives weird errors if you pass filename_as_id as True
        reader = SimpleDirectoryReader(input_dir=input_dir, raise_on_error=True)
    else:
        raise ValueError("Please either pass 'input_files' (a list of file paths) or 'input_dir' (directory containing test files) as an argument to load_index")

    qdrant_index = VectorStoreIndex.from_documents([], embed_model=embed_model, storage_context=storage_context)
    for docs in reader.iter_data():
        for d in docs:
            qdrant_index.insert(d)
        if len(docs) > 0:
            try:
                print(f"Successfully inserted index for {docs[-1].metadata['file_name']}")
            except KeyError:  # some documents may not have 'file_name' in their metadata
                pass
    
    return qdrant_index
        
qdrant_index = create_index(input_dir="/test-directory", collection_name="test")

Relevant Logs/Tracebacks

No response

bibekyess added the bug and triage labels on Jul 29, 2024

dosubot bot commented Jul 29, 2024

Hey there, @bibekyess! I'm here to help you with any bugs, questions, or contributions you might have. Let's work together to solve this issue.

To manage memory usage more efficiently when creating an index for large documents with LlamaIndex, you can store only vectors in memory and keep other payloads/metadata on disk. Here is an example of how to achieve this using QdrantVectorStore:

  1. Initialize Qdrant Client and Create Collection: Ensure that the collection is configured to store payloads on disk.

  2. Configure QdrantVectorStore: Set up the vector store to use the Qdrant client.

  3. Create and Use the Index: Use the VectorStoreIndex to insert documents into the index.

Here is a modified version of your code to achieve this:

from qdrant_client import QdrantClient, models
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os
from pathlib import Path
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from typing import List

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
embed_size = 1024  # Change as per your embed_model

# Local
client_path = os.path.join(Path.home(), ".cache", "aliparse", "qdrant_vector_store_del")
qdrant_client = QdrantClient(path=client_path)

def create_index(input_files: List = None, input_dir: str = None, collection_name: str = "test"):
    qdrant_client.create_collection(
        collection_name=collection_name,
        on_disk_payload=True,  # Ensure payloads are stored on disk
        vectors_config=models.VectorParams(
            size=embed_size,
            distance=models.Distance.COSINE,
            on_disk=True  # Store vectors on disk
        ),
        optimizers_config=models.OptimizersConfigDiff(memmap_threshold=10000),
        hnsw_config=models.HnswConfigDiff(on_disk=False),  # False to avoid Request Timeout Error
    )

    qdrant_vector_store = QdrantVectorStore(
        client=qdrant_client,
        collection_name=collection_name,
        enable_hybrid=False,  # Disable hybrid search to save memory
    )

    storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)

    if input_files is not None:
        reader = SimpleDirectoryReader(input_files=input_files, raise_on_error=True)
    elif input_dir is not None:
        reader = SimpleDirectoryReader(input_dir=input_dir, raise_on_error=True)
    else:
        raise ValueError("Please either pass 'input_files' (a list of file paths) or 'input_dir' (directory containing test files) as an argument to load_index")

    qdrant_index = VectorStoreIndex.from_documents([], embed_model=embed_model, storage_context=storage_context)
    for docs in reader.iter_data():
        for d in docs:
            qdrant_index.insert(d)
        if len(docs) > 0:
            try:
                print(f"Successfully inserted index for {docs[-1].metadata['file_name']}")
            except KeyError:  # some documents may not have 'file_name' in their metadata
                pass

    return qdrant_index

qdrant_index = create_index(input_dir="/test-directory", collection_name="test")

This setup ensures that vectors are stored on disk, reducing the memory footprint during the indexing process. Additionally, you can adjust parameters like batch_size and parallel in QdrantVectorStore to further optimize memory usage [1][2].
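
For illustration, here is a minimal sketch of passing those parameters to QdrantVectorStore (the values below are arbitrary examples meant to show the knobs, not tuned recommendations):

qdrant_vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name=collection_name,
    batch_size=32,  # points uploaded per request; smaller values lower peak memory
    parallel=1,     # number of parallel upload workers; keep low to limit memory use
    enable_hybrid=False,
)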

To continue talking to Dosu, mention @dosu.

@bibekyess
Author

@dosu It doesn't solve the issue.


dosubot bot commented Jul 29, 2024

To resolve the out-of-memory (OOM) issue when creating an index for large documents using LlamaIndex and QdrantVectorStore, you can adjust the batch_size parameter to control the number of points uploaded in a single request. This helps manage memory usage more efficiently. Additionally, you can ensure that only vectors are stored in memory while keeping other payloads/metadata on disk.

Here is an example of how to initialize the QdrantVectorStore with a smaller batch_size to reduce memory usage:

from qdrant_client import QdrantClient, models
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os
from pathlib import Path
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from typing import List

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
embed_size = 1024  # Change as per your embed_model

# Local
client_path = os.path.join(Path.home(), ".cache", "aliparse", "qdrant_vector_store_del")
qdrant_client = QdrantClient(path=client_path)

def create_index(input_files: List = None, input_dir: str = None, collection_name: str = "test"):
    qdrant_client.create_collection(
        collection_name=collection_name,
        on_disk_payload=True,  # Ensure payloads are stored on disk
        vectors_config=models.VectorParams(
            size=embed_size,
            distance=models.Distance.COSINE,
            on_disk=True  # Store vectors on disk
        ),
        optimizers_config=models.OptimizersConfigDiff(memmap_threshold=10000),
        hnsw_config=models.HnswConfigDiff(on_disk=False),  # False to avoid Request Timeout Error
    )

    qdrant_vector_store = QdrantVectorStore(
        client=qdrant_client,
        collection_name=collection_name,
        batch_size=16,  # Adjust batch_size to a smaller value to reduce memory usage
        parallel=1,
        max_retries=3,
        enable_hybrid=False,  # Disable hybrid search to save memory
    )

    storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)

    if input_files is not None:
        reader = SimpleDirectoryReader(input_files=input_files, raise_on_error=True)
    elif input_dir is not None:
        reader = SimpleDirectoryReader(input_dir=input_dir, raise_on_error=True)
    else:
        raise ValueError("Please either pass 'input_files' (a list of file paths) or 'input_dir' (directory containing test files) as an argument to load_index")

    qdrant_index = VectorStoreIndex.from_documents([], embed_model=embed_model, storage_context=storage_context)
    for docs in reader.iter_data():
        for d in docs:
            qdrant_index.insert(d)
        if len(docs) > 0:
            try:
                print(f"Successfully inserted index for {docs[-1].metadata['file_name']}")
            except KeyError:  # some documents may not have 'file_name' in their metadata
                pass

    return qdrant_index

qdrant_index = create_index(input_dir="/test-directory", collection_name="test")

By reducing the batch_size, you can control the memory footprint during the upload process, which should help mitigate OOM issues [1].

To continue talking to Dosu, mention @dosu.

@logan-markewich
Collaborator

There's probably some setting for Qdrant. I usually use Qdrant's Docker server, though.

Also keep in mind that bge-m3 is a pretty big model. I'm not sure if you mean RAM or VRAM, but you might want to lower the embedding batch size:
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3", embed_batch_size=2)

@bibekyess
Author

bibekyess commented Jul 30, 2024

@logan-markewich
Thank you for your response.

  1. Yes, the reproduction code I sent already sets the Qdrant config for on-disk storage, but it is strange that memory usage keeps increasing.
  2. I mean RAM. I think embed_batch_size affects VRAM, and I have no issue with VRAM usage: after the index is created, VRAM usage drops back down. But with the above code, RAM usage doesn't decrease even after all the indexes have been created; I have to kill the entire process to free the memory. I have attached a picture: the left panel shows Qdrant Docker statistics and the right one shows RAM usage.
     [screenshot: Qdrant Docker statistics (left) and RAM usage (right)]
