-
Notifications
You must be signed in to change notification settings - Fork 100
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, taking a deeper look now
for entry in entries: | ||
documents.append(entry.document) | ||
metadatas.append(entry.metadata) | ||
ids.append(self.entry_to_key(entry)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these IDs generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rn for symbols this returns symbol.dotpath. This unpacks the complicated symbol into a simple string like 'MyPath.MyClass.my_method'
] | ||
|
||
|
||
class JSONSymbolEmbeddingVectorDatabase( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this still be used somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to have the option of using a simple JSON database for now, but we can remove at some point if we are not using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyways, can you confirm if the chroma database will be persisted to disk or something by default? I think if chroma is running via a docker-compose image then it probably will (they provide that option). I think by default it uses the in-memory DB, probably worth checking
The constructor allows either in-mem or persistent creation. See this snippet - class ChromaVectorDatabase(VectorDatabaseProvider, Generic[K, V]):
"""Concrete class to provide a vector database that uses Chroma."""
def __init__(self, collection_name: str, persist_directory: Optional[str] = None):
self._setup_chroma_client(persist_directory)
self._collection = self.client.get_or_create_collection(collection_name)
def _setup_chroma_client(self, persist_directory: Optional[str] = None):
"""Setup the Chroma client, here we attempt to contain the Chroma dependency."""
try:
import chromadb
from chromadb.config import Settings
except ImportError as e:
raise ImportError(
"Please install Chroma Python client first: " "`pip install chromadb`"
) from e
if persist_directory:
self.client = chromadb.Client(
Settings(
chroma_db_impl="duckdb+parquet",
persist_directory=persist_directory,
)
)
else:
# A single instance client which terminates at session end
self.client = chromadb.Client() |
documents.append(entry.document) | ||
metadatas.append(entry.metadata) | ||
ids.append(self.entry_to_key(entry)) | ||
embeddings.append([int(ele) for ele in entry.vector]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is good that we're passing the embeddings to chroma.
We need to keep in mind not to exceed Openai's rate limits while creating embeddings. (not related to this PR since you're passing embeddings to chroma directly)
No description provided.