Skip to content
This repository has been archived by the owner on Mar 16, 2024. It is now read-only.

Finish chroma onboard #148

Merged
merged 4 commits into from
Jul 7, 2023
Merged

Finish chroma onboard #148

merged 4 commits into from
Jul 7, 2023

Conversation

emrgnt-cmplxty
Copy link
Owner

No description provided.

Copy link
Collaborator

@bhavitsharma bhavitsharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, taking a deeper look now

for entry in entries:
documents.append(entry.document)
metadatas.append(entry.metadata)
ids.append(self.entry_to_key(entry))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these IDs generated?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rn for symbols this returns symbol.dotpath. This unpacks the complicated symbol into a simple string like 'MyPath.MyClass.my_method'

]


class JSONSymbolEmbeddingVectorDatabase(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this still be used somewhere?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to have the option of using a simple JSON database for now, but we can remove at some point if we are not using it.

Copy link
Collaborator

@bhavitsharma bhavitsharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyways, can you confirm if the chroma database will be persisted to disk or something by default? I think if chroma is running via a docker-compose image then it probably will (they provide that option). I think by default it uses the in-memory DB, probably worth checking

automata/symbol_embedding/vector_databases.py Show resolved Hide resolved
@emrgnt-cmplxty
Copy link
Owner Author

anyways, can you confirm if the chroma database will be persisted to disk or something by default? I think if chroma is running via a docker-compose image then it probably will (they provide that option). I think by default it uses the in-memory DB, probably worth checking

The constructor allows either in-mem or persistent creation. See this snippet -

class ChromaVectorDatabase(VectorDatabaseProvider, Generic[K, V]):
    """Concrete class to provide a vector database that uses Chroma."""

    def __init__(self, collection_name: str, persist_directory: Optional[str] = None):
        self._setup_chroma_client(persist_directory)
        self._collection = self.client.get_or_create_collection(collection_name)

    def _setup_chroma_client(self, persist_directory: Optional[str] = None):
        """Setup the Chroma client, here we attempt to contain the Chroma dependency."""
        try:
            import chromadb
            from chromadb.config import Settings
        except ImportError as e:
            raise ImportError(
                "Please install Chroma Python client first: " "`pip install chromadb`"
            ) from e
        if persist_directory:
            self.client = chromadb.Client(
                Settings(
                    chroma_db_impl="duckdb+parquet",
                    persist_directory=persist_directory,
                )
            )
        else:
            # A single instance client which terminates at session end
            self.client = chromadb.Client()

@emrgnt-cmplxty emrgnt-cmplxty merged commit d15d51c into main Jul 7, 2023
6 checks passed
documents.append(entry.document)
metadatas.append(entry.metadata)
ids.append(self.entry_to_key(entry))
embeddings.append([int(ele) for ele in entry.vector])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is good that we're passing the embeddings to chroma.
We need to keep in mind not to exceed Openai's rate limits while creating embeddings. (not related to this PR since you're passing embeddings to chroma directly)

@emrgnt-cmplxty emrgnt-cmplxty deleted the feature/finish-chroma-onboard branch July 8, 2023 01:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants