Multimodal Embedding Microservice #555
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Hi @lvliang-intel. Thank you for your feedback about langsmith. We have removed langsmith from all places that you pointed out above. Would you please re-review this?
I recommend changing the name of this folder. The title is too broad to cover the multimodal functionality, especially since the use case for this code is a fairly narrow domain where a text must be paired with an image to obtain the embedding.
Thanks for your feedback. We would like to keep the name of this folder as it is, for the following reasons:
- Currently, it supports not only image-text pairs but also plain text (cf. the methods embed_query and embed_documents, which are inherited from LangChain's Embeddings interface). See the sketch after this list.
- This BridgeTowerEmbedding can be extended to embed images (and other modalities) in the future as well. In our current implementation, we haven't provided such methods because they will not be employed by our proposed application, Multimodal RAG on Videos.
- Although BridgeTowerEmbedding provides implemented methods for embed_documents, embed_query, and embed_image_text_pairs, it can also be considered an interface for MultimodalEmbedding over other modalities in future development.
We would appreciate it if you could take these comments into account and reconsider. Please let us know if any of these points does not make sense and you still insist on changing the folder name.
Thanks a lot
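For illustration only, here is a minimal sketch of the class shape discussed above: a multimodal embedding wrapper that satisfies LangChain's text-only Embeddings contract (embed_documents, embed_query) and adds an embed_image_text_pairs method. The class name and method names follow this discussion, but the import path, exact signatures, and the bodies (dummy vectors instead of real BridgeTower calls) are assumptions, not the implementation in this PR.

```python
from typing import List

from langchain_core.embeddings import Embeddings  # assumed import path


class BridgeTowerEmbeddingSketch(Embeddings):
    """Illustrative shape of a multimodal embedding wrapper; not the PR's actual code."""

    dim: int = 512  # assumed embedding size

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Text-only batch embedding, required by LangChain's Embeddings interface.
        return [self._embed_text(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        # Text-only query embedding, also required by the interface.
        return self._embed_text(text)

    def embed_image_text_pairs(self, texts: List[str], images: List[str]) -> List[List[float]]:
        # Extra multimodal method: each text is paired with one image (assumed signature).
        if len(texts) != len(images):
            raise ValueError("texts and images must align one-to-one")
        return [self._embed_pair(t, img) for t, img in zip(texts, images)]

    def _embed_text(self, text: str) -> List[float]:
        # Placeholder: a real implementation would run BridgeTower's text encoder here.
        return [0.0] * self.dim

    def _embed_pair(self, text: str, image: str) -> List[float]:
        # Placeholder: a real implementation would encode the image-text pair with BridgeTower.
        return [0.0] * self.dim
```

Keeping the text-only methods alongside the pair method is what lets one class serve both plain-text queries and image-text documents, which is the point argued above for retaining the broader folder name.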
As this PR is merged into #575, please close this PR.
This PR #555 is merged into PR #575. Closing this one.
Description
This PR introduces a multimodal embedding microservice that uses the BridgeTower model as the embedding model. This microservice is required for the Multimodal RAG on Videos application.
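As a rough usage illustration, a client might call such an embedding microservice over HTTP as sketched below. The host, port, route, and payload field names are assumptions for illustration and may not match the actual service defined in this PR.

```python
import requests  # assumes the 'requests' package is installed

# Assumed endpoint; the real service's host, port, and route may differ.
ENDPOINT = "http://localhost:6000/v1/embeddings"

# Text-only embedding request (assumed payload schema).
text_resp = requests.post(ENDPOINT, json={"text": "What is shown in the video?"})
print(text_resp.json())

# Image-text pair embedding request (assumed payload schema; image as a base64 string).
pair_resp = requests.post(
    ENDPOINT,
    json={"text": "a person riding a bike", "img_b64_str": "<base64-encoded image>"},
)
print(pair_resp.json())
```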
Issues
RFC: https://github.com/opea-project/docs/pull/49/files
Issue: opea-project/GenAIExamples#358
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
docarray[full]
fastapi
huggingface_hub
langchain
langsmith
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
prometheus-fastapi-instrumentator
transformers
shortuuid
uvicorn
torch
torchvision
pydantic==2.8.2
BridgeTower
Tests
We have provided 2 tests for this microservice: