RFC for MM-RAG #49

Open · wants to merge 2 commits into main

Conversation

tileintel commented:

We are submitting our RFC for Multimodal-RAG based Visual QnA.
@ftian1 @hshen14: Could you please provide feedback?


The proposed architecture involves the creation of two megaservices.
- The first megaservice functions as the core pipeline, comprising four microservices: embedding, retriever, reranking, and LVLM. This megaservice exposes a MMRagBasedVisualQnAGateway, allowing users to query the system via the `/v1/mmrag_visual_qna` endpoint.
- The second megaservice manages user data storage in VectorStore and is composed of a single microservice, embedding. This megaservice provides a MMRagDataIngestionGateway, enabling user access through the `/v1/mmrag_data_ingestion` endpoint.
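
For illustration, here is a minimal client-side sketch of how the two gateways might be called with Python's `requests`. Only the endpoint paths come from the RFC; the host/port and the request/response payloads are assumptions.

```python
import requests

# Hypothetical host/port; only the endpoint paths are taken from the RFC.
BASE_URL = "http://localhost:8888"

# Ingest a document through the MMRagDataIngestionGateway.
# The payload shape (text plus an image reference) is an assumption.
ingest_resp = requests.post(
    f"{BASE_URL}/v1/mmrag_data_ingestion",
    json={"text": "A cat sitting on a couch", "image_url": "http://example.com/cat.jpg"},
)
print(ingest_resp.status_code)

# Query the system through the MMRagBasedVisualQnAGateway.
qna_resp = requests.post(
    f"{BASE_URL}/v1/mmrag_visual_qna",
    json={"query": "What is the cat doing?"},
)
print(qna_resp.json())
```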
Contributor commented:

Can this be an enhanced microservice based on the existing data ingestion?

Author (@tileintel) replied on Jul 23, 2024:

@hshen14 Thanks for your comments. The short answer is yes. We will enhance/reuse the existing data ingestion microservice. The current data ingestion microservice supports only TextDoc. In our proposal, we will extend its interface to accept MultimodalDoc, which can be a TextDoc, an ImageDoc, an ImageTextPairDoc, etc. If the input is of type TextDoc, we will divert execution to the current microservice/functions.
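
For illustration, a minimal sketch of the type-based dispatch described above. The pydantic models below stand in for the proposed MultimodalDoc data classes (their field names are assumptions), and the handler functions are hypothetical placeholders.

```python
from typing import Union

from pydantic import BaseModel


# Stand-ins for the proposed data classes; field names are assumptions.
class TextDoc(BaseModel):
    text: str


class ImageDoc(BaseModel):
    image_url: str


class ImageTextPairDoc(BaseModel):
    text: str
    image_url: str


# MultimodalDoc as a union of the supported document types.
MultimodalDoc = Union[TextDoc, ImageDoc, ImageTextPairDoc]


def ingest_text_doc(doc: TextDoc) -> None:
    print(f"Existing text-only ingestion path: {doc.text}")


def ingest_multimodal_doc(doc: MultimodalDoc) -> None:
    print(f"New multimodal ingestion path: {doc}")


def ingest(doc: MultimodalDoc) -> None:
    # Divert TextDoc inputs to the existing microservice logic; everything
    # else goes through the new multimodal path, as described in the reply.
    if isinstance(doc, TextDoc):
        ingest_text_doc(doc)
    else:
        ingest_multimodal_doc(doc)


ingest(TextDoc(text="hello"))
ingest(ImageTextPairDoc(text="a cat", image_url="http://example.com/cat.jpg"))
```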

The proposed architecture involves the creation of three megaservices.
- The first megaservice functions as the core pipeline, comprising four microservices: embedding, retriever, reranking, and LVLM. This megaservice exposes a MMRagBasedVisualQnAGateway, allowing users to query the system via the `/v1/mmrag_visual_qna` endpoint.
- The second megaservice manages user data storage in VectorStore and is composed of a single microservice, embedding. This megaservice provides a MMRagDataIngestionGateway, enabling user access through the `/v1/mmrag_data_ingestion` endpoint.
- The third megaservice functions as a helper to extract a list of frame-transcript pairs from videos, using audio-to-text models for transcription or an LVLM (e.g., BLIP-2, LLaVA) for captioning. This megaservice is composed of two microservices: transcription and LVLM. It provides a MMRagVideoprepGateway, enabling user access through the `/v1/mmrag_video_prep` endpoint.
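
For illustration, a sketch of how the video-prep gateway might be invoked; only the endpoint path comes from the RFC, while the host/port, payload, and response shape are assumptions.

```python
import requests

# Hypothetical host/port and payload; only the endpoint path is from the RFC.
resp = requests.post(
    "http://localhost:8889/v1/mmrag_video_prep",
    json={"video_url": "http://example.com/demo.mp4", "mode": "both"},
)

# Assumed response shape: one entry per extracted frame, e.g.
# [{"frame_id": 0, "timestamp": 0.0, "transcript": "...", "caption": "..."}, ...]
print(resp.json())
```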
Contributor commented:

You mentioned either audio-to-text models for transcription or an LVLM for captioning, while you also mentioned they need to be composed. If composing is not mandatory, does it make sense to make it a microservice?

Author (@tileintel) replied:

@hshen14 Thanks for your comment. We were proposing transcription and LVLM each being a microservice. The composition is not mandatory. We believe that when we ingest a video, it is better to include both the frames' transcripts and the frames' captions as metadata for inference (LVLM) after retrieval. However, the megaservice will offer options for the user to choose transcript only, caption only, or both. The composition here is optional. Hope this is clear.
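
A minimal server-side sketch of the optional composition described in this reply; the `mode` values and function names are assumptions, and the two helper functions are placeholders for the transcription and LVLM captioning microservices.

```python
from typing import Dict, List


def transcribe_frames(video_path: str) -> List[str]:
    """Placeholder for the transcription (audio-to-text) microservice."""
    return ["transcript for frame 0", "transcript for frame 1"]


def caption_frames(video_path: str) -> List[str]:
    """Placeholder for the LVLM captioning microservice."""
    return ["caption for frame 0", "caption for frame 1"]


def prepare_video(video_path: str, mode: str = "both") -> Dict[str, List[str]]:
    # Compose the two microservices only when the user asks for both outputs;
    # otherwise call just one of them, as described above.
    result: Dict[str, List[str]] = {}
    if mode in ("transcript", "both"):
        result["transcripts"] = transcribe_frames(video_path)
    if mode in ("caption", "both"):
        result["captions"] = caption_frames(video_path)
    return result


print(prepare_video("demo.mp4", mode="both"))
```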

#### 2.1 Embeddings
- Interface `MultimodalEmbeddings` that extends `langchain_core.embeddings.Embeddings` with an abstract method:
```python
embed_multimodal_document(self, doc: MultimodalDoc) -> List[float]
```
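
For concreteness, a hedged sketch of what the proposed interface and a trivial implementation might look like; everything beyond the `embed_multimodal_document` signature quoted above (class names, vector size, the dummy logic) is an assumption, not the actual OPEA implementation.

```python
from abc import abstractmethod
from typing import List

from langchain_core.embeddings import Embeddings


class MultimodalEmbeddings(Embeddings):
    """Sketch of the proposed interface extending langchain_core Embeddings."""

    @abstractmethod
    def embed_multimodal_document(self, doc: "MultimodalDoc") -> List[float]:
        """Return a single embedding vector for a multimodal document."""


class DummyMultimodalEmbeddings(MultimodalEmbeddings):
    """Placeholder implementation used only to illustrate the call shape."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[0.0] * 512 for _ in texts]

    def embed_query(self, text: str) -> List[float]:
        return [0.0] * 512

    def embed_multimodal_document(self, doc: "MultimodalDoc") -> List[float]:
        # "MultimodalDoc" refers to the RFC's data class and is not defined in
        # this sketch. A real implementation would fuse text and image features
        # here (e.g., with a multimodal encoder); this dummy returns zeros.
        return [0.0] * 512
```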
Collaborator commented:

The overall RFC looks good to me; just a minor comment here. I think this interface is an implementation-specific one, right? My point is that you don't need to mention it; you just need to tell the user what the standard input and output are, as you did in the Data Classes section. That's enough.

Collaborator commented:

The RFC file naming convention follows this rule: `yy-mm-dd-[OPEA Project Name]-[index]-title.md`

For example, `24-04-29-GenAIExamples-001-Using_MicroService_to_implement_ChatQnA.md`.
