Finish chroma onboard #148

emrgnt-cmplxty · 2023-07-07T17:22:14Z

No description provided.

bhavitsharma

lgtm, taking a deeper look now

bhavitsharma · 2023-07-07T18:07:40Z

automata/symbol_embedding/vector_databases.py

+        for entry in entries:
+            documents.append(entry.document)
+            metadatas.append(entry.metadata)
+            ids.append(self.entry_to_key(entry))


How are these IDs generated?

Right now, for symbols this returns symbol.dotpath. That flattens the complicated symbol into a simple string like 'MyPath.MyClass.my_method'.
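To make the ID scheme concrete, here is a minimal sketch of how the parallel lists in the diff could be built when the key is the dotpath. The `Entry` dataclass, the standalone `entry_to_key`, and `build_batch` are hypothetical stand-ins, not the classes in this PR. The duplicate check is an added assumption: since the ID is the only handle to an entry, two entries sharing a dotpath would collide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Entry:
    """Hypothetical stand-in for a symbol embedding entry."""

    dotpath: str
    document: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def entry_to_key(entry: Entry) -> str:
    # Assumption: the key is simply the symbol's dotpath string.
    return entry.dotpath


def build_batch(
    entries: List[Entry],
) -> Tuple[List[str], List[Dict[str, Any]], List[str]]:
    """Collect documents, metadatas, and ids as index-aligned lists."""
    documents: List[str] = []
    metadatas: List[Dict[str, Any]] = []
    ids: List[str] = []
    seen = set()
    for entry in entries:
        key = entry_to_key(entry)
        if key in seen:
            # Two entries with the same dotpath would share one ID.
            raise ValueError(f"duplicate id: {key}")
        seen.add(key)
        documents.append(entry.document)
        metadatas.append(entry.metadata)
        ids.append(key)
    return documents, metadatas, ids
```

The three returned lists stay index-aligned, which is the shape the loop in the diff produces.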

bhavitsharma · 2023-07-07T18:08:31Z

automata/symbol_embedding/vector_databases.py

+        ]
+
+
+class JSONSymbolEmbeddingVectorDatabase(


Will this still be used somewhere?

I think it's good to have the option of using a simple JSON database for now, but we can remove it at some point if we end up not using it.

bhavitsharma

anyways, can you confirm if the chroma database will be persisted to disk or something by default? I think if chroma is running via a docker-compose image then it probably will (they provide that option). I think by default it uses the in-memory DB, probably worth checking

automata/symbol_embedding/vector_databases.py

emrgnt-cmplxty · 2023-07-07T18:17:35Z

anyways, can you confirm if the chroma database will be persisted to disk or something by default? I think if chroma is running via a docker-compose image then it probably will (they provide that option). I think by default it uses the in-memory DB, probably worth checking

The constructor allows either in-memory or persistent creation. See this snippet:

class ChromaVectorDatabase(VectorDatabaseProvider, Generic[K, V]):
    """Concrete class to provide a vector database that uses Chroma."""

    def __init__(self, collection_name: str, persist_directory: Optional[str] = None):
        self._setup_chroma_client(persist_directory)
        self._collection = self.client.get_or_create_collection(collection_name)

    def _setup_chroma_client(self, persist_directory: Optional[str] = None):
        """Setup the Chroma client, here we attempt to contain the Chroma dependency."""
        try:
            import chromadb
            from chromadb.config import Settings
        except ImportError as e:
            raise ImportError(
                "Please install the Chroma Python client first: `pip install chromadb`"
            ) from e
        if persist_directory:
            self.client = chromadb.Client(
                Settings(
                    chroma_db_impl="duckdb+parquet",
                    persist_directory=persist_directory,
                )
            )
        else:
            # A single instance client which terminates at session end
            self.client = chromadb.Client()
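One detail of the snippet above is that the branch tests `persist_directory` for truthiness, so an empty string behaves like `None` and yields the in-memory client. The sketch below mirrors only that decision without importing chromadb. `chroma_settings_kwargs` is a hypothetical helper introduced for illustration, not part of this PR or of Chroma.

```python
from typing import Any, Dict, Optional


def chroma_settings_kwargs(persist_directory: Optional[str] = None) -> Dict[str, Any]:
    """Return the Settings keyword arguments the snippet would pass.

    Hypothetical helper: an empty dict stands for the plain
    in-memory `chromadb.Client()` branch.
    """
    if persist_directory:
        # Persistent branch: same keywords as in the snippet.
        return {
            "chroma_db_impl": "duckdb+parquet",
            "persist_directory": persist_directory,
        }
    # None and "" both land here, i.e. the in-memory branch.
    return {}
```

With this logic a caller has to pass a non-empty path to get persistence. Nothing is written to disk by default.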

bhavitsharma · 2023-07-07T18:24:05Z

automata/symbol_embedding/vector_databases.py

+            documents.append(entry.document)
+            metadatas.append(entry.metadata)
+            ids.append(self.entry_to_key(entry))
+            embeddings.append([int(ele) for ele in entry.vector])


It's good that we're passing the embeddings to chroma.
We need to keep in mind not to exceed OpenAI's rate limits while creating embeddings. (Not related to this PR, since you're passing the embeddings to chroma directly.)
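One way to stay under a rate limit while creating embeddings is to send texts in batches and retry with exponential backoff. This is a sketch only. `RateLimitError`, `embed_batch`, and every parameter name here are hypothetical placeholders, not the OpenAI client API or anything in this PR.

```python
import time
from typing import Callable, List, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for the provider's rate limit exception."""


def embed_with_backoff(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 16,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    """Embed texts batch by batch, backing off when rate limited."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_batch(batch))
                break
            except RateLimitError:
                if attempt == max_retries:
                    raise  # Give up after the last allowed retry.
                # Wait base_delay, 2x, 4x, ... before retrying the batch.
                sleep(base_delay * 2**attempt)
    return vectors
```

The `sleep` argument is injectable so the retry path can be exercised in tests without actually waiting.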

Finish chroma onboard

8f71b56

bhavitsharma approved these changes Jul 7, 2023

Cleanup chroma impl

a28e0b1

bhavitsharma approved these changes Jul 7, 2023

cleanup type error

78f105e

emrgnt-cmplxty mentioned this pull request Jul 7, 2023

Add support for Chroma #138

Closed

fix comment

23b35df

emrgnt-cmplxty merged commit d15d51c into main Jul 7, 2023
6 checks passed

bhavitsharma approved these changes Jul 7, 2023

emrgnt-cmplxty deleted the feature/finish-chroma-onboard branch July 8, 2023 01:17
