
[ENH] Added batch_size as parameter to SentenceTransformerEmbeddingFunction class #2759

Closed
wants to merge 1 commit

Conversation

@cowolff commented Sep 3, 2024


Description of changes

Added a batch_size hyperparameter to the SentenceTransformerEmbeddingFunction class to give more control over memory requirements when deploying large sentence transformers.

Test plan

  • Tests pass locally with pytest for python. Local installation and usage also worked.

Documentation Changes

Added docstring for the new hyperparameter
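To illustrate what a batch_size parameter buys here, below is a minimal sketch of batched encoding. It is not Chroma's actual implementation: the class name is hypothetical and the model call is replaced by a stub `_encode_batch` (a real SentenceTransformerEmbeddingFunction would call `model.encode`, which itself accepts a `batch_size` argument). The point is that documents are encoded in chunks of at most `batch_size`, bounding peak memory use.

```python
from typing import List


class BatchedEmbeddingFunction:
    """Illustrative sketch only: encode documents in memory-bounded batches."""

    def __init__(self, batch_size: int = 32):
        self.batch_size = batch_size

    def _encode_batch(self, batch: List[str]) -> List[List[float]]:
        # Stand-in for model.encode(batch): returns one (fake) vector
        # per input document so the batching logic is testable.
        return [[float(len(doc))] for doc in batch]

    def __call__(self, documents: List[str]) -> List[List[float]]:
        embeddings: List[List[float]] = []
        # Only batch_size documents are ever in flight at once.
        for start in range(0, len(documents), self.batch_size):
            batch = documents[start:start + self.batch_size]
            embeddings.extend(self._encode_batch(batch))
        return embeddings
```

A smaller `batch_size` trades throughput for a lower memory ceiling, which is the deployment knob this PR exposes.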


github-actions bot commented Sep 3, 2024

Reviewer Checklist

Please use this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (readability, modularity, intuitiveness)?

@tazarov (Contributor) left a comment


LGTM with a minor nit.

@cowolff Thanks for this. Reading up on the effects of batch_size (huggingface/transformers#2401), I feel this is a somewhat contentious topic, but making it explicit is better DX (explicit is better than implicit, per the Zen of Python).

```diff
@@ -17,6 +17,7 @@ def __init__(
     model_name: str = "all-MiniLM-L6-v2",
     device: str = "cpu",
     normalize_embeddings: bool = False,
+    batch_size: int = 32,
```
Contributor:

Can we make this `Optional[int] = 32`?
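One way to read the `Optional[int]` suggestion (a sketch of the reviewer's idea, not code from the PR; the helper name is hypothetical): let `None` mean "use the default" while any integer is taken as-is.

```python
from typing import Optional

DEFAULT_BATCH_SIZE = 32  # illustrative default, matching the PR's 32


def resolve_batch_size(batch_size: Optional[int] = DEFAULT_BATCH_SIZE) -> int:
    # None signals "fall back to the library default";
    # an explicit integer is respected unchanged.
    return DEFAULT_BATCH_SIZE if batch_size is None else batch_size
```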

Contributor:

Also, technically, this is already supported via the kwargs (not that we currently pass them to the encode method), but making it explicit is arguably better DX.
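The kwargs route the reviewer mentions could look roughly like the sketch below (class and method names are illustrative, not Chroma's code): options such as `batch_size` are captured at construction and must be forwarded to the encode call; the forwarding step is exactly what the reviewer notes was missing, which is why an explicit parameter is harder to get wrong.

```python
class KwargsEmbeddingFunction:
    """Sketch only: capture arbitrary encode options and forward them."""

    def __init__(self, **encode_kwargs):
        # e.g. {"batch_size": 16} is stored here at construction time...
        self._encode_kwargs = encode_kwargs

    def _encode(self, documents, **kwargs):
        # Stand-in for model.encode(documents, **kwargs); it echoes the
        # options it received so the forwarding is observable in tests.
        return {"n_docs": len(documents), **kwargs}

    def __call__(self, documents):
        # ...and forwarded here. Omitting this line silently drops the
        # user's options, the failure mode the reviewer points out.
        return self._encode(documents, **self._encode_kwargs)
```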

@jeffchuber mentioned this pull request Sep 16, 2024

@jeffchuber (Contributor):
Our underlying impl has changed and so this PR is not landable as is.

That being said - we'd still like to add this functionality and that is now tracked in this issue.

@jeffchuber closed this Sep 16, 2024