Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Missing of trust_remote_code in SentenceTransformerEmbeddingFunction class, which produces an error during loading of other repo models #2685

Closed
Darrshan-Sankar opened this issue Aug 20, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@Darrshan-Sankar
Copy link

Darrshan-Sankar commented Aug 20, 2024

What happened?

I tried to load an embedding model and create an embedding function using the SentenceTransformerEmbeddingFunction, which I imported using the code:

from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

I used the same to load a model, which was from a different repository rather than sentence-transformers one. When I tried to perform the same, I got the following error:

Traceback (most recent call last):
  # _"Some parts that i wish to hide"_
    sentence_transformer_ef = SentenceTransformerEmbeddingFunction(model_name = "dunzhang/stella_en_400M_v5", device = "cuda", normalize_embeddings = True, trust_remote_code=True)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: SentenceTransformerEmbeddingFunction.__init__() got an unexpected keyword argument 'trust_remote_code'

I know the solution and wish to pull and merge.

Versions

chromadb v0.3.29, Python v3.11.4

Relevant log output

No response

##NOTE: I wish to commit to the particular branch that corresponds to this version(chromadb==0.3.29) as i have this version as a dependency being used.

I found that this issue was fixed in the following release: Release: 0.5.0https://github.com/chroma-core/chroma/releases/tag/0.5.0

I wish to perform the same update on release for version 0.3.29(as it is part of my dependency). Is that valid and can be done?

@Darrshan-Sankar Darrshan-Sankar added the bug Something isn't working label Aug 20, 2024
@tazarov
Copy link
Contributor

tazarov commented Aug 20, 2024

@Darrshan-Sankar, thanks for reporting this. Indeed, newer versions of Chroma address many bugs or deficiencies of older versions. 0.3.29 is a quite old version for which the Chroma team will not backport any fixes and improvements. So, let's figure out whether there's a way for you to update your dependencies.

Let me know what hurdles you have to overcome to upgrade to the latest Chroma version. I'd be happy to assist further.

@Darrshan-Sankar
Copy link
Author

@Darrshan-Sankar, thanks for reporting this. Indeed, newer versions of Chroma address many bugs or deficiencies of older versions. 0.3.29 is a quite old version for which the Chroma team will not backport any fixes and improvements. So, let's figure out whether there's a way for you to update your dependencies.

Let me know what hurdles you have to overcome to upgrade to the latest Chroma version. I'd be happy to assist further.

Even if I perform a general install, since few langchain based requirements were coded for older package versions, I am getting the older version of chromadb installed. Is it ok if I open a PR for the branch 0.3.29 after the fix?

@tazarov
Copy link
Contributor

tazarov commented Aug 20, 2024

@Darrshan-Sankar, I see your conundrum. The easiest way to move forward without doing the upgrade would be to just pickup the EF from latest main https://github.com/chroma-core/chroma/blob/main/chromadb/utils/embedding_functions/sentence_transformer_embedding_function.py. And adjust it so that it works with the older EmbeddingFunction protocol:

class EmbeddingFunction(Protocol):
    def __call__(self, texts: Documents) -> Embeddings:
        ...

@Darrshan-Sankar
Copy link
Author

@Darrshan-Sankar, I see your conundrum. The easiest way to move forward without doing the upgrade would be to just pickup the EF from latest main https://github.com/chroma-core/chroma/blob/main/chromadb/utils/embedding_functions/sentence_transformer_embedding_function.py. And adjust it so that it works with the older EmbeddingFunction protocol:

class EmbeddingFunction(Protocol):
    def __call__(self, texts: Documents) -> Embeddings:
        ...

@tazarov Thank you for being in touch and I will try and let all know

@Darrshan-Sankar
Copy link
Author

@tazarov Good day. Its @Darrshan-Sankar here. For this issue, I have forked the particular branch and created a Pull request: https://github.com/chroma-core/chroma/pull/2694. It also requires installation of a dependency xformers. Please check and review, as the implementation worked for me.

I verified the same by installing from my updated repository: pip install git+https://github.com/Darrshan-Sankar/[email protected]

Last but not least, I convey my sincere thanks to the maintainers of the project for making this project open-source. This helps everyone in a great way

@tazarov
Copy link
Contributor

tazarov commented Aug 28, 2024

@Darrshan-Sankar, thanks for PR #2694, but I think the chances of it getting merged + a new release in the 0.3.x range is unlikely. I see two paths forward:

  • Incorporate the changes in your project as they are in the PR and use them
  • Break-out embedding functions in a contrib package that doesn't have to follow Chroma regular releases and gives more flexibility. I think there is an issue regarding that - [Update][Ease of Use] Community / Contrib Package #2291

@Darrshan-Sankar
Copy link
Author

@tazarov Thanks to your continuous support. Being new to contributing, I couldn't understand the complete message in the mentioned issue. Help or directions would be appreciated regarding the contrib package. Thank you

@tazarov
Copy link
Contributor

tazarov commented Aug 30, 2024

hey @Darrshan-Sankar, contributing to open source is awesome!

By linking the issue, I mean that there's this high-level idea of a contrib package that enhances Chroma by providing auxiliary functionality. The shape or form of such a contrib package is not yet formalized, so for now, so this doesn't block you, I would suggest that you include the modified embedding function as part of your project and use it as such until it can be contributed meaningfully.

@Darrshan-Sankar
Copy link
Author

Thank you @tazarov . Please notify is such integrations are done by referencing this issue. Thanks

@Darrshan-Sankar
Copy link
Author

@tazarov Good day. Its @Darrshan-Sankar here. For this issue, I have forked the particular branch and created a Pull request: https://github.com/chroma-core/chroma/pull/2694. It also requires installation of a dependency xformers. Please check and review, as the implementation worked for me.

I verified the same by installing from my updated repository: pip install git+https://github.com/Darrshan-Sankar/[email protected]

Last but not least, I convey my sincere thanks to the maintainers of the project for making this project open-source. This helps everyone in a great way

With this fix, I close this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants