[OOM]: Runs out of memory when creating index for large documents #15013

Open
bibekyess opened this issue Jul 29, 2024 · 5 comments
Labels: bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

Comments

@bibekyess

Bug Description

How can LlamaIndex be used to create an index for a large set of documents?
I noticed that when I call index.insert, the index variable is held in RAM, so RAM usage keeps increasing as I add new document chunks and eventually I run out of memory. Is there some way to reload the index so that only the vectors are kept in memory while the other payload/metadata stays on disk?

Version

0.10.57

Steps to Reproduce

# llama-index==0.10.57
# qdrant-client==1.10.1

from qdrant_client import QdrantClient, models
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os
from pathlib import Path
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex, StorageContext
from typing import List

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
embed_size = 1024 # Change as per your embed_model

# Local
client_path = os.path.join(Path.home(), ".cache", "aliparse", "qdrant_vector_store_del")
qdrant_client = QdrantClient(path=client_path)

# Remote
# url = "http://0.0.0.0:6333"
# qdrant_client = QdrantClient(url=url, timeout=20)


def create_index(input_files: List = None, input_dir: str = None, collection_name: str = "test"):

    qdrant_client.create_collection(
        collection_name=collection_name,
        on_disk_payload=True,  # TODO: seems not to be working
        vectors_config=models.VectorParams(
                size=embed_size,
                distance=models.Distance.COSINE,
                on_disk=True
            ),
        optimizers_config=models.OptimizersConfigDiff(memmap_threshold=10000),
        hnsw_config=models.HnswConfigDiff(on_disk=False), # False to avoid Request Timeout Error, https://github.com/qdrant/qdrant-client/issues/632#issuecomment-2126729153
    )

    qdrant_vector_store = QdrantVectorStore(
        client=qdrant_client, 
        collection_name=collection_name, 
        enable_hybrid=False,  # whether to enable hybrid search using dense and sparse vectors
    )

    storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)

    if input_files is not None:
        reader = SimpleDirectoryReader(input_files=input_files, raise_on_error=True)
    elif input_dir is not None:
        # Careful: Qdrant gives weird errors if you pass filename_as_id as True
        reader = SimpleDirectoryReader(input_dir=input_dir, raise_on_error=True)
    else:
        raise ValueError("Please either pass 'input_files' (a list of file paths) or 'input_dir' (directory containing test files) as an argument to load_index")

    qdrant_index = VectorStoreIndex.from_documents([], embed_model=embed_model, storage_context=storage_context)
    for docs in reader.iter_data():
        for d in docs:
            qdrant_index.insert(d)
        if len(docs) > 0:
            try:
                print(f"Successfully inserted index for {docs[-1].metadata['file_name']}")
            except KeyError:  # some documents may not have 'file_name' in their metadata
                pass
    
    return qdrant_index
        
qdrant_index = create_index(input_dir="/test-directory", collection_name="test")

Relevant Logs/Tracebacks

No response

bibekyess added the bug and triage labels on Jul 29, 2024

dosubot bot commented Jul 29, 2024

Hey there, @bibekyess! I'm here to help you with any bugs, questions, or contributions you might have. Let's work together to solve this issue.

To manage memory usage more efficiently when creating an index for large documents with LlamaIndex, you can store only vectors in memory and keep other payloads/metadata on disk. Here is an example of how to achieve this using QdrantVectorStore:

  1. Initialize Qdrant Client and Create Collection: Ensure that the collection is configured to store payloads on disk.

  2. Configure QdrantVectorStore: Set up the vector store to use the Qdrant client.

  3. Create and Use the Index: Use the VectorStoreIndex to insert documents into the index.

Here is a modified version of your code to achieve this:

from qdrant_client import QdrantClient, models
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os
from pathlib import Path
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from typing import List

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
embed_size = 1024  # Change as per your embed_model

# Local
client_path = os.path.join(Path.home(), ".cache", "aliparse", "qdrant_vector_store_del")
qdrant_client = QdrantClient(path=client_path)

def create_index(input_files: List = None, input_dir: str = None, collection_name: str = "test"):
    qdrant_client.create_collection(
        collection_name=collection_name,
        on_disk_payload=True,  # Ensure payloads are stored on disk
        vectors_config=models.VectorParams(
            size=embed_size,
            distance=models.Distance.COSINE,
            on_disk=True  # Store vectors on disk
        ),
        optimizers_config=models.OptimizersConfigDiff(memmap_threshold=10000),
        hnsw_config=models.HnswConfigDiff(on_disk=False),  # False to avoid Request Timeout Error
    )

    qdrant_vector_store = QdrantVectorStore(
        client=qdrant_client,
        collection_name=collection_name,
        enable_hybrid=False,  # Disable hybrid search to save memory
    )

    storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)

    if input_files is not None:
        reader = SimpleDirectoryReader(input_files=input_files, raise_on_error=True)
    elif input_dir is not None:
        reader = SimpleDirectoryReader(input_dir=input_dir, raise_on_error=True)
    else:
        raise ValueError("Please either pass 'input_files' (a list of file paths) or 'input_dir' (directory containing test files) as an argument to load_index")

    qdrant_index = VectorStoreIndex.from_documents([], embed_model=embed_model, storage_context=storage_context)
    for docs in reader.iter_data():
        for d in docs:
            qdrant_index.insert(d)
        if len(docs) > 0:
            try:
                print(f"Successfully inserted index for {docs[-1].metadata['file_name']}")
            except KeyError:  # some documents may not have 'file_name' in their metadata
                pass

    return qdrant_index

qdrant_index = create_index(input_dir="/test-directory", collection_name="test")

This setup ensures that vectors are stored on disk, reducing the memory footprint during the indexing process. Additionally, you can adjust parameters like batch_size and parallel in QdrantVectorStore to further optimize memory usage [1][2].
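
For illustration, here is a minimal sketch of passing those parameters to QdrantVectorStore (the values below are arbitrary examples meant to show the knobs, not tuned recommendations):

qdrant_vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name=collection_name,
    batch_size=32,  # points uploaded per request; smaller values lower peak memory
    parallel=1,     # number of parallel upload workers; keep low to limit memory use
    enable_hybrid=False,
)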

To continue talking to Dosu, mention @dosu.

@bibekyess
Author

@dosu It doesn't solve the issue.


dosubot bot commented Jul 29, 2024

To resolve the out-of-memory (OOM) issue when creating an index for large documents using LlamaIndex and QdrantVectorStore, you can adjust the batch_size parameter to control the number of points uploaded in a single request. This helps manage memory usage more efficiently. Additionally, you can ensure that only vectors are stored in memory while keeping other payloads/metadata on disk.

Here is an example of how to initialize the QdrantVectorStore with a smaller batch_size to reduce memory usage:

from qdrant_client import QdrantClient, models
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os
from pathlib import Path
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from typing import List

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
embed_size = 1024  # Change as per your embed_model

# Local
client_path = os.path.join(Path.home(), ".cache", "aliparse", "qdrant_vector_store_del")
qdrant_client = QdrantClient(path=client_path)

def create_index(input_files: List = None, input_dir: str = None, collection_name: str = "test"):
    qdrant_client.create_collection(
        collection_name=collection_name,
        on_disk_payload=True,  # Ensure payloads are stored on disk
        vectors_config=models.VectorParams(
            size=embed_size,
            distance=models.Distance.COSINE,
            on_disk=True  # Store vectors on disk
        ),
        optimizers_config=models.OptimizersConfigDiff(memmap_threshold=10000),
        hnsw_config=models.HnswConfigDiff(on_disk=False),  # False to avoid Request Timeout Error
    )

    qdrant_vector_store = QdrantVectorStore(
        client=qdrant_client,
        collection_name=collection_name,
        batch_size=16,  # Adjust batch_size to a smaller value to reduce memory usage
        parallel=1,
        max_retries=3,
        enable_hybrid=False,  # Disable hybrid search to save memory
    )

    storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)

    if input_files is not None:
        reader = SimpleDirectoryReader(input_files=input_files, raise_on_error=True)
    elif input_dir is not None:
        reader = SimpleDirectoryReader(input_dir=input_dir, raise_on_error=True)
    else:
        raise ValueError("Please either pass 'input_files' (a list of file paths) or 'input_dir' (directory containing test files) as an argument to load_index")

    qdrant_index = VectorStoreIndex.from_documents([], embed_model=embed_model, storage_context=storage_context)
    for docs in reader.iter_data():
        for d in docs:
            qdrant_index.insert(d)
        if len(docs) > 0:
            try:
                print(f"Successfully inserted index for {docs[-1].metadata['file_name']}")
            except KeyError:  # some documents may not have 'file_name' in their metadata
                pass

    return qdrant_index

qdrant_index = create_index(input_dir="/test-directory", collection_name="test")

By reducing the batch_size, you can control the memory footprint during the upload process, which should help mitigate OOM issues [1].

To continue talking to Dosu, mention @dosu.

@logan-markewich
Collaborator

There's probably some setting for Qdrant. I usually use Qdrant's Docker server, though.

Also keep in mind that bge-m3 is a pretty big model. I'm not sure if you mean RAM or VRAM, but you might want to lower the embedding batch size:
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3", embed_batch_size=2)

@bibekyess
Author

bibekyess commented Jul 30, 2024

@logan-markewich
Thank you for your response.

  1. Yes, the reproduction code I sent already sets the Qdrant config for on-disk storage, but it is strange that memory usage keeps increasing.
  2. I mean RAM. I think embed_batch_size affects VRAM, and I have no issue with VRAM usage: after the index is created, VRAM usage drops back down. But with the above code, RAM usage doesn't decrease even after all the indexes have been created; I have to kill the entire process to free the memory. I have attached a picture: the left panel shows Qdrant Docker statistics and the right one shows RAM usage.
     [screenshot: Qdrant Docker statistics (left) and RAM usage (right)]
