diff --git a/VideoRAGQnA/README.md b/VideoRAGQnA/README.md new file mode 100644 index 000000000..54e806243 --- /dev/null +++ b/VideoRAGQnA/README.md @@ -0,0 +1,104 @@ +# Video RAG + +## Introduction + +Video RAG is a framework that retrieves videos based on a user prompt. It uses both scene descriptions generated by open-source vision models (e.g., video-llama, video-llava) as text embeddings and video frames as image embeddings to perform vector similarity search. The solution can also retrieve additional similar videos without a new prompt (see the example video below). + +![Example Video](docs/visual-rag-demo.gif) + +## Tools + +- **UI**: gradio **or** streamlit +- **Vector Storage**: Chroma DB **or** Intel's VDMS +- **Image Embeddings**: CLIP +- **Text Embeddings**: all-MiniLM-L12-v2 +- **RAG Retriever**: LangChain Ensemble Retrieval + +## Prerequisites + +There are 10 example videos in `video_ingest/videos`, along with descriptions generated by an open-source vision model. +To run Video RAG on your own videos, make sure they follow the format below. + +## File Structure + +```bash +video_ingest/ +. +├── scene_description +│ ├── op_10_0320241830.mp4.txt +│ ├── op_1_0320241830.mp4.txt +│ ├── op_19_0320241830.mp4.txt +│ ├── op_21_0320241830.mp4.txt +│ ├── op_24_0320241830.mp4.txt +│ ├── op_31_0320241830.mp4.txt +│ ├── op_47_0320241830.mp4.txt +│ ├── op_5_0320241915.mp4.txt +│ ├── op_DSCF2862_Rendered_001.mp4.txt +│ └── op_DSCF2864_Rendered_006.mp4.txt +└── videos + ├── op_10_0320241830.mp4 + ├── op_1_0320241830.mp4 + ├── op_19_0320241830.mp4 + ├── op_21_0320241830.mp4 + ├── op_24_0320241830.mp4 + ├── op_31_0320241830.mp4 + ├── op_47_0320241830.mp4 + ├── op_5_0320241915.mp4 + ├── op_DSCF2862_Rendered_001.mp4 + └── op_DSCF2864_Rendered_006.mp4 +``` + +## Setup and Installation + +Install the pip requirements: + +```bash +cd VideoRAGQnA +pip3 install -r docs/requirements.txt +``` + +The framework supports both Chroma DB and Intel's VDMS; use either of them. + +Run Chroma DB as a Docker container: + +```bash +docker run -d -p 8000:8000 chromadb/chroma +``` + +**or** + +Run VDMS as a Docker container: + +```bash +docker run -d -p 55555:55555 intellabs/vdms:latest +``` + +**Note:** If your file structure differs from the one described above, update the paths in `config.yaml`. + +Update your choice of database and port in `config.yaml`, then set the following environment variables: + +```bash +export VECTORDB_SERVICE_HOST_IP= + +export HUGGINGFACEHUB_API_TOKEN='' +``` + +A HuggingFace Hub API token can be generated [here](https://huggingface.co/login?next=%2Fsettings%2Ftokens). 
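+For reference, here is a minimal sketch of how these settings are consumed (the connection calls mirror `embedding/vector_stores/db.py`; treat it as illustrative, not an additional entry point): `VECTORDB_SERVICE_HOST_IP` is read from the environment, while the port and database choice come from `config.yaml`.
+
+```python
+import os
+
+import yaml
+
+# Same settings the ingestion and UI scripts read
+with open("docs/config.yaml") as f:
+    config = yaml.safe_load(f)
+
+host = os.getenv("VECTORDB_SERVICE_HOST_IP", "0.0.0.0")
+port = int(config["vector_db"]["port"])
+selected_db = config["vector_db"]["choice_of_db"]
+
+if selected_db == "chroma":
+    import chromadb
+
+    client = chromadb.HttpClient(host=host, port=port)  # Chroma container started above
+else:
+    from langchain_community.vectorstores.vdms import VDMS_Client
+
+    client = VDMS_Client(host=host, port=port)  # VDMS container started above
+```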
+ +Generate image embeddings and store them in the selected database; specify the config file location and the video input folder: + +```bash +python3 embedding/generate_store_embeddings.py docs/config.yaml video_ingest/videos/ +``` + +**Web UI Video RAG - Streamlit** + +```bash +streamlit run video-rag-ui.py --server.address 0.0.0.0 --server.port 50055 +``` + +**Web UI Video RAG - Gradio** + +```bash +python3 video-rag-ui.py docs/config.yaml True '0.0.0.0' 50055 +``` diff --git a/VideoRAGQnA/__init__.py b/VideoRAGQnA/__init__.py new file mode 100644 index 000000000..916f3a44b --- /dev/null +++ b/VideoRAGQnA/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/VideoRAGQnA/docs/config.yaml b/VideoRAGQnA/docs/config.yaml new file mode 100755 index 000000000..b3e0844f7 --- /dev/null +++ b/VideoRAGQnA/docs/config.yaml @@ -0,0 +1,26 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +# Path to all videos +videos: video_ingest/videos/ +# Path to video descriptions generated by open-source vision models (e.g., video-llama, video-llava) +description: video_ingest/scene_description/ +# Whether to extract frames from the videos (True if not done already, else False) +generate_frames: True +# Whether to generate image embeddings +embed_frames: True +# Path to store extracted frames +image_output_dir: video_ingest/frames/ +# Path to store metadata files +meta_output_dir: video_ingest/frame_metadata/ +# Number of frames to extract per second; +# e.g., at 24 fps with a value of 2, every 12th frame (the 12th and 24th of each second) is extracted +number_of_frames_per_second: 2 + +vector_db: + choice_of_db: 'vdms' #'chroma' # Supported databases [vdms, chroma] + host: 0.0.0.0 + port: 55555 #8000 + +# LLM path +model_path: meta-llama/Llama-2-7b-chat-hf diff --git a/VideoRAGQnA/docs/requirements.txt b/VideoRAGQnA/docs/requirements.txt new file mode 100644 index 000000000..c61e6abc7 --- /dev/null +++ b/VideoRAGQnA/docs/requirements.txt @@ -0,0 +1,12 @@ +accelerate +chromadb +dateparser +gradio +langchain-experimental +metafunctions +open-clip-torch +opencv-python-headless +sentence-transformers +streamlit +tzlocal +vdms diff --git a/VideoRAGQnA/docs/visual-rag-demo.gif b/VideoRAGQnA/docs/visual-rag-demo.gif new file mode 100644 index 000000000..45bf7a462 Binary files /dev/null and b/VideoRAGQnA/docs/visual-rag-demo.gif differ diff --git a/VideoRAGQnA/embedding/__init__.py b/VideoRAGQnA/embedding/__init__.py new file mode 100644 index 000000000..916f3a44b --- /dev/null +++ b/VideoRAGQnA/embedding/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/VideoRAGQnA/embedding/extract_store_frames.py b/VideoRAGQnA/embedding/extract_store_frames.py new file mode 100644 index 000000000..0df791033 --- /dev/null +++ b/VideoRAGQnA/embedding/extract_store_frames.py @@ -0,0 +1,120 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import datetime +import json +import os +import random + +import cv2 +from tzlocal import get_localzone + + +def process_all_videos(path, image_output_dir, meta_output_dir, N, selected_db): + +    def extract_frames(video_path, image_output_dir, meta_output_dir, N, date_time, local_timezone): + video = video_path.split("/")[-1] + # Create a directory to store frames and metadata + os.makedirs(image_output_dir, exist_ok=True) + os.makedirs(meta_output_dir, exist_ok=True) + + # Open the video file + cap = cv2.VideoCapture(video_path) 
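+        # Sampling: the step used below is derived from the source fps, so with N frames requested per
+        # second roughly every (fps // N)-th frame is written out along with its timestamp metadata.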
+ + if int(cv2.__version__.split(".")[0]) < 3: + fps = cap.get(cv2.cv.CV_CAP_PROP_FPS) + else: + fps = cap.get(cv2.CAP_PROP_FPS) + + total_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT) + + # print (f'fps {fps}') + # print (f'total frames {total_frames}') + + mod = int(fps // N) + if mod == 0: + mod = 1 + + print(f"total frames {total_frames}, N {N}, mod {mod}") + + # Variables to track frame count and desired frames + frame_count = 0 + + # Metadata dictionary to store timestamp and image paths + metadata = {} + + while cap.isOpened(): + ret, frame = cap.read() + + if not ret: + break + + frame_count += 1 + + if frame_count % mod == 0: + timestamp = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000 # Convert milliseconds to seconds + frame_path = os.path.join(image_output_dir, f"{video}_{frame_count}.jpg") + time = date_time.strftime("%H:%M:%S") + date = date_time.strftime("%Y-%m-%d") + hours, minutes, seconds = map(float, time.split(":")) + year, month, day = map(int, date.split("-")) + + cv2.imwrite(frame_path, frame) # Save the frame as an image + + metadata[frame_count] = { + "timestamp": timestamp, + "frame_path": frame_path, + "date": date, + "year": year, + "month": month, + "day": day, + "time": time, + "hours": hours, + "minutes": minutes, + "seconds": seconds, + } + if selected_db == "vdms": + # Localize the current time to the local timezone of the machine + # Tahani might not need this + current_time_local = date_time.replace(tzinfo=datetime.timezone.utc).astimezone(local_timezone) + + # Convert the localized time to ISO 8601 format with timezone offset + iso_date_time = current_time_local.isoformat() + metadata[frame_count]["date_time"] = {"_date": str(iso_date_time)} + + # Save metadata to a JSON file + metadata_file = os.path.join(meta_output_dir, f"{video}_metadata.json") + with open(metadata_file, "w") as f: + json.dump(metadata, f, indent=4) + + # Release the video capture and close all windows + cap.release() + print(f"{frame_count/mod} Frames extracted and metadata saved successfully.") + return fps, total_frames, metadata_file + + videos = [file for file in os.listdir(path) if file.endswith(".mp4")] + + # print (f'Total {len(videos)} videos will be processed') + metadata = {} + + for i, each_video in enumerate(videos): + video_path = os.path.join(path, each_video) + date_time = datetime.datetime.now() + print("date_time : ", date_time) + # Get the local timezone of the machine + local_timezone = get_localzone() + fps, total_frames, metadata_file = extract_frames( + video_path, image_output_dir, meta_output_dir, N, date_time, local_timezone + ) + metadata[each_video] = { + "fps": fps, + "total_frames": total_frames, + "extracted_frame_metadata_file": metadata_file, + "embedding_path": f"embeddings/{each_video}.pt", + "video_path": f"{path}/{each_video}", + } + print(f"✅ {i+1}/{len(videos)}") + + metadata_file = os.path.join(meta_output_dir, "metadata.json") + with open(metadata_file, "w") as f: + json.dump(metadata, f, indent=4) diff --git a/VideoRAGQnA/embedding/generate_store_embeddings.py b/VideoRAGQnA/embedding/generate_store_embeddings.py new file mode 100644 index 000000000..bb419d6bf --- /dev/null +++ b/VideoRAGQnA/embedding/generate_store_embeddings.py @@ -0,0 +1,178 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +# import sys +# import os + +# sys.path.append('/path/to/parent') # Replace with the actual path to the parent folder + +import os + +# from VideoRAGQnA.utils import config_reader as reader +import sys + +# Add the parent directory 
of the current script to the Python path +sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))) +VECTORDB_SERVICE_HOST_IP = os.getenv("VECTORDB_SERVICE_HOST_IP", "0.0.0.0") + + +import argparse +import json +import os + +import chromadb + +# sys.path.append(os.path.abspath('../utils')) +# import config_reader as reader +import yaml +from extract_store_frames import process_all_videos +from langchain_experimental.open_clip import OpenCLIPEmbeddings +from utils import config_reader as reader +from vector_stores import db + +# EMBEDDING MODEL +clip_embd = OpenCLIPEmbeddings(model_name="ViT-g-14", checkpoint="laion2b_s34b_b88k") + + +def read_json(path): + with open(path) as f: + x = json.load(f) + return x + + +def read_file(path): + content = None + with open(path, "r") as file: + content = file.read() + return content + + +def store_into_vectordb(metadata_file_path, selected_db): + GMetadata = read_json(metadata_file_path) + global_counter = 0 + + total_videos = len(GMetadata.keys()) + + for _, (video, data) in enumerate(GMetadata.items()): + + image_name_list = [] + embedding_list = [] + metadata_list = [] + ids = [] + + # process frames + frame_metadata = read_json(data["extracted_frame_metadata_file"]) + for frame_id, frame_details in frame_metadata.items(): + global_counter += 1 + if selected_db == "vdms": + meta_data = { + "timestamp": frame_details["timestamp"], + "frame_path": frame_details["frame_path"], + "video": video, + "embedding_path": data["embedding_path"], + "date_time": frame_details["date_time"], # {"_date":frame_details['date_time']}, + "date": frame_details["date"], + "year": frame_details["year"], + "month": frame_details["month"], + "day": frame_details["day"], + "time": frame_details["time"], + "hours": frame_details["hours"], + "minutes": frame_details["minutes"], + "seconds": frame_details["seconds"], + } + if selected_db == "chroma": + meta_data = { + "timestamp": frame_details["timestamp"], + "frame_path": frame_details["frame_path"], + "video": video, + "embedding_path": data["embedding_path"], + "date": frame_details["date"], + "year": frame_details["year"], + "month": frame_details["month"], + "day": frame_details["day"], + "time": frame_details["time"], + "hours": frame_details["hours"], + "minutes": frame_details["minutes"], + "seconds": frame_details["seconds"], + } + image_path = frame_details["frame_path"] + image_name_list.append(image_path) + + metadata_list.append(meta_data) + ids.append(str(global_counter)) + # print('datetime',meta_data['date_time']) + # generate clip embeddings + embedding_list.extend(clip_embd.embed_image(image_name_list)) + + vs.add_images(uris=image_name_list, metadatas=metadata_list) + + print( + f"✅ {_+1}/{total_videos} video {video}, len {len(image_name_list)}, {len(metadata_list)}, {len(embedding_list)}" + ) + + +def generate_image_embeddings(selected_db): + if generate_frames: + print("Processing all videos, Generated frames will be stored at") + print(f"input video folder = {path}") + print(f"frames output folder = {image_output_dir}") + print(f"metadata files output folder = {meta_output_dir}") + process_all_videos(path, image_output_dir, meta_output_dir, N, selected_db) + + global_metadata_file_path = meta_output_dir + "metadata.json" + print(f"global metadata file available at {global_metadata_file_path}") + store_into_vectordb(global_metadata_file_path, selected_db) + + +def retrieval_testing(): + Q = "man holding red basket" + print(f"Testing Query {Q}") + results = vs.MultiModalRetrieval(Q) + 
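+    # Each retrieved result is a LangChain Document whose metadata carries the source video,
+    # frame path, and timestamp fields stored during ingestion.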
+ ##print (results) + + +if __name__ == "__main__": + # read config yaml + print("Reading config file") + # config = reader.read_config('../docs/config.yaml') + + # Create argument parser + parser = argparse.ArgumentParser(description="Process configuration file for generating and storing embeddings.") + + # Add argument for configuration file + parser.add_argument("config_file", type=str, help="Path to configuration file (e.g., config.yaml)") + + # Add argument for videos folder + parser.add_argument("videos_folder", type=str, help="Path to folder containing videos") + + # Parse command-line arguments + args = parser.parse_args() + + # Read configuration file + config = reader.read_config(args.config_file) + + print("Config file data \n", yaml.dump(config, default_flow_style=False, sort_keys=False)) + + generate_frames = config["generate_frames"] + embed_frames = config["embed_frames"] + path = config["videos"] # args.videos_folder # + image_output_dir = config["image_output_dir"] + meta_output_dir = config["meta_output_dir"] + N = config["number_of_frames_per_second"] + + host = VECTORDB_SERVICE_HOST_IP + port = int(config["vector_db"]["port"]) + selected_db = config["vector_db"]["choice_of_db"] + + # Creating DB + print( + "Creating DB with text and image embedding support, \nIt may take few minutes to download and load all required models if you are running for first time." + ) + print("Connect to {} at {}:{}".format(selected_db, host, port)) + + vs = db.VS(host, port, selected_db) + + generate_image_embeddings(selected_db) + + retrieval_testing() diff --git a/VideoRAGQnA/embedding/vector_stores/db.py b/VideoRAGQnA/embedding/vector_stores/db.py new file mode 100644 index 000000000..19137145c --- /dev/null +++ b/VideoRAGQnA/embedding/vector_stores/db.py @@ -0,0 +1,204 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import datetime +from typing import Iterable, List, Optional + +import chromadb +from dateparser.search import search_dates +from langchain_community.vectorstores import VDMS, Chroma +from langchain_community.vectorstores.vdms import VDMS_Client +from langchain_core.runnables import ConfigurableField +from langchain_experimental.open_clip import OpenCLIPEmbeddings +from tzlocal import get_localzone + + +class VS: + + def __init__(self, host, port, selected_db): + self.host = host + self.port = port + self.selected_db = selected_db + + # initializing important variables + self.client = None + self.image_db = None + self.image_embedder = OpenCLIPEmbeddings(model_name="ViT-g-14", checkpoint="laion2b_s34b_b88k") + self.image_collection = "image-test" + self.text_retriever = None + self.image_retriever = None + + # initialize_db + self.get_db_client() + self.init_db() + + def get_db_client(self): + + if self.selected_db == "chroma": + print("Connecting to Chroma db server . . .") + self.client = chromadb.HttpClient(host=self.host, port=self.port) + + if self.selected_db == "vdms": + print("Connecting to VDMS db server . . 
.") + self.client = VDMS_Client(host=self.host, port=self.port) + + def init_db(self): + print("Loading db instances") + if self.selected_db == "chroma": + self.image_db = Chroma( + client=self.client, + embedding_function=self.image_embedder, + collection_name=self.image_collection, + ) + + if self.selected_db == "vdms": + self.image_db = VDMS( + client=self.client, + embedding=self.image_embedder, + collection_name=self.image_collection, + engine="FaissFlat", + ) + + self.image_retriever = self.image_db.as_retriever(search_type="mmr").configurable_fields( + search_kwargs=ConfigurableField( + id="k_image_docs", + name="Search Kwargs", + description="The search kwargs to use", + ) + ) + + def update_db(self, prompt, n_images): + print("Update DB") + + base_date = datetime.datetime.today() + today_date = base_date.date() + dates_found = search_dates(prompt, settings={"PREFER_DATES_FROM": "past", "RELATIVE_BASE": base_date}) + # if no date is detected dates_found should return None + if dates_found != None: + # Print the identified dates + # print("dates_found:",dates_found) + for date_tuple in dates_found: + date_string, parsed_date = date_tuple + print(f"Found date: {date_string} -> Parsed as: {parsed_date}") + date_out = str(parsed_date.date()) + time_out = str(parsed_date.time()) + hours, minutes, seconds = map(float, time_out.split(":")) + year, month, day_out = map(int, date_out.split("-")) + + # print("today's date", base_date) + rounded_seconds = min(round(parsed_date.second + 0.5), 59) + parsed_date = parsed_date.replace(second=rounded_seconds, microsecond=0) + + # Convert the localized time to ISO format + iso_date_time = parsed_date.isoformat() + iso_date_time = str(iso_date_time) + + if self.selected_db == "vdms": + if date_string == "today": + constraints = {"date": ["==", date_out]} + self.update_image_retriever = self.image_db.as_retriever( + search_type="mmr", search_kwargs={"k": n_images, "filter": constraints} + ) + elif date_out != str(today_date) and time_out == "00:00:00": ## exact day (example last friday) + constraints = {"date": ["==", date_out]} + self.update_image_retriever = self.image_db.as_retriever( + search_type="mmr", search_kwargs={"k": n_images, "filter": constraints} + ) + + elif ( + date_out == str(today_date) and time_out == "00:00:00" + ): ## when search_date interprates words as dates output is todays date + time 00:00:00 + self.update_image_retriever = self.image_db.as_retriever( + search_type="mmr", search_kwargs={"k": n_images} + ) + else: ## Interval of time:last 48 hours, last 2 days,.. 
+ constraints = {"date_time": [">=", {"_date": iso_date_time}]} + self.update_image_retriever = self.image_db.as_retriever( + search_type="mmr", search_kwargs={"k": n_images, "filter": constraints} + ) + if self.selected_db == "chroma": + if date_string == "today": + self.update_image_retriever = self.image_db.as_retriever( + search_type="mmr", search_kwargs={"k": n_images, "filter": {"date": {"$eq": date_out}}} + ) + elif date_out != str(today_date) and time_out == "00:00:00": ## exact day (example last friday) + self.update_image_retriever = self.image_db.as_retriever( + search_type="mmr", search_kwargs={"k": n_images, "filter": {"date": {"$eq": date_out}}} + ) + elif ( + date_out == str(today_date) and time_out == "00:00:00" + ): ## when search_date interprates words as dates output is todays date + time 00:00:00 + self.update_image_retriever = self.image_db.as_retriever( + search_type="mmr", search_kwargs={"k": n_images} + ) + else: ## Interval of time:last 48 hours, last 2 days,.. + constraints = {"date_time": [">=", {"_date": iso_date_time}]} + self.update_image_retriever = self.image_db.as_retriever( + search_type="mmr", + search_kwargs={ + "filter": { + "$or": [ + { + "$and": [ + {"date": {"$eq": date_out}}, + { + "$or": [ + {"hours": {"$gte": hours}}, + { + "$and": [ + {"hours": {"$eq": hours}}, + {"minutes": {"$gte": minutes}}, + ] + }, + ] + }, + ] + }, + { + "$or": [ + {"month": {"$gt": month}}, + {"$and": [{"day": {"$gt": day_out}}, {"month": {"$eq": month}}]}, + ] + }, + ] + }, + "k": n_images, + }, + ) + else: + self.update_image_retriever = self.image_db.as_retriever(search_type="mmr", search_kwargs={"k": n_images}) + + def length(self): + if self.selected_db == "chroma": + images = self.image_db.__len__() + return (texts, images) + + if self.selected_db == "vdms": + pass + + return (None, None) + + def delete_collection(self, collection_name): + self.client.delete_collection(collection_name=collection_name) + + def add_images( + self, + uris: List[str], + metadatas: Optional[List[dict]] = None, + ): + + self.image_db.add_images(uris, metadatas) + + def MultiModalRetrieval( + self, + query: str, + n_images: Optional[int] = 3, + ): + + self.update_db(query, n_images) + image_results = self.update_image_retriever.invoke(query) + + for r in image_results: + print("images:", r.metadata["video"], "\t", r.metadata["date"], "\t", r.metadata["time"], "\n") + + return image_results diff --git a/VideoRAGQnA/images.jpeg b/VideoRAGQnA/images.jpeg new file mode 100644 index 000000000..277cac6c3 Binary files /dev/null and b/VideoRAGQnA/images.jpeg differ diff --git a/VideoRAGQnA/images.png b/VideoRAGQnA/images.png new file mode 100644 index 000000000..a856ce395 Binary files /dev/null and b/VideoRAGQnA/images.png differ diff --git a/VideoRAGQnA/prompt_processing/README.md b/VideoRAGQnA/prompt_processing/README.md new file mode 100644 index 000000000..dcf2c804d --- /dev/null +++ b/VideoRAGQnA/prompt_processing/README.md @@ -0,0 +1 @@ +# Placeholder diff --git a/VideoRAGQnA/ui/README.md b/VideoRAGQnA/ui/README.md new file mode 100644 index 000000000..dcf2c804d --- /dev/null +++ b/VideoRAGQnA/ui/README.md @@ -0,0 +1 @@ +# Placeholder diff --git a/VideoRAGQnA/utils/__init__.py b/VideoRAGQnA/utils/__init__.py new file mode 100644 index 000000000..916f3a44b --- /dev/null +++ b/VideoRAGQnA/utils/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/VideoRAGQnA/utils/config_reader.py b/VideoRAGQnA/utils/config_reader.py new file 
mode 100644 index 000000000..4a5dfad6b --- /dev/null +++ b/VideoRAGQnA/utils/config_reader.py @@ -0,0 +1,11 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import yaml + + +def read_config(path): + with open(path, "r") as f: + config = yaml.safe_load(f) + + return config diff --git a/VideoRAGQnA/utils/prompt_handler.py b/VideoRAGQnA/utils/prompt_handler.py new file mode 100644 index 000000000..3c5d556a8 --- /dev/null +++ b/VideoRAGQnA/utils/prompt_handler.py @@ -0,0 +1,18 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from jinja2 import BaseLoader, Environment + +PROMPT = open("utils/prompt_template.jinja2").read().strip() + + +def get_formatted_prompt(scene, prompt, history): + newline = "\n" + # formatted = f"{newline}User: {history[0]}{newline}Assistant: {history[1]}{newline}" + try: + formatted = "\n[INST]\n".join(history) + except: + formatted = "\n[INST]\n".join(["hello", "hi"]) + env = Environment(loader=BaseLoader()) + template = env.from_string(PROMPT) + return template.render(scene=scene, prompt=prompt, history=formatted) diff --git a/VideoRAGQnA/utils/prompt_template.jinja2 b/VideoRAGQnA/utils/prompt_template.jinja2 new file mode 100644 index 000000000..f5411344c --- /dev/null +++ b/VideoRAGQnA/utils/prompt_template.jinja2 @@ -0,0 +1,24 @@ +<> +You are an Intel assistant who understands visual and textual content. +<> +[INST] +You will be provided with three things: scene description, user's question, and previous chat history for context. You are supposed to understand scene description \ +and provide answers to the user's questions. + +As an assistant, you need to follow these Rules while answering questions, + +Rules: +- Don't answer any questions that are not related to the provided scene description. +- Don't be toxic and don't include harmful information. +- Answer if you can from the provided scene description; otherwise, just say You don't have enough information to answer the question. 
+ +Here is the +Scene Description: {{ scene }} + +Here is the user's previous chat history: +context: {{ history }} + +The user wants to know, +User: {{ prompt }} +[/INST]\n +Assistant: \ No newline at end of file diff --git a/VideoRAGQnA/video-rag-ui.py b/VideoRAGQnA/video-rag-ui.py new file mode 100644 index 000000000..9249d8df2 --- /dev/null +++ b/VideoRAGQnA/video-rag-ui.py @@ -0,0 +1,341 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import os +import threading +import time +from typing import Any, List, Mapping, Optional + +import gradio as gr +import torch +from embedding.vector_stores import db +from langchain.llms.base import LLM +from langchain_core.callbacks.manager import CallbackManagerForLLMRun +from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer, set_seed +from utils import config_reader as reader +from utils import prompt_handler as ph + +# from vector_stores import db +HUGGINGFACEHUB_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN", "") + +set_seed(22) +import argparse + +device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + +VECTORDB_SERVICE_HOST_IP = os.getenv("VECTORDB_SERVICE_HOST_IP", "0.0.0.0") + + +CSS = """ + +.custom_login-btn { + width: 100% !important; + display: block !important; + color: white !important; + background: rgb(0 118 189) !important; + background-fill-secondary: var(--neutral-50); + --border-color-accent: var(--primary-300); + --border-color-primary: var(--neutral-200); + +} +.context_container{ + + border: 1px solid black; + padding: 0px; + +} + +.passwordContainer{ + width: 30% !important; + padding: 20px; + position: absolute; + top: 50%; + left: 50%; + transform: translate(-50%, 50%); + +} + +.video-class { + width: 100%; /* Set the width of the container */ + height: 100%; /* Set the height of the container */ + position: relative; /* Position the video relative to the container */ + overflow: hidden; /* Hide overflow content */ +} + +.video-class video { + width: 100%; /* Set video width to fill container */ + height: 100%; /* Set video height to fill container */ + object-fit: cover; /* Stretch and fill the video to cover the entire container */ +} + + +.custom_submit-btn_2 { + display: block !important; + background: #fed7aa !important; + color: #ea580c !important; + background-fill-secondary: var(--neutral-50); + --border-color-accent: var(--primary-300); + --border-color-primary: var(--neutral-200); + +} + + +.custom_submit-btn{ + background: rgb(240,240,240) !important; + color: rgb(118 118 118) !important; + font-size: 13px; +} +.custom_submit-btn:hover{ + background: rgb(0 118 189) !important; + color: white !important; + font-size: 16px; +} + +.custom_blue-btn { + display: block !important; + background: rgb(0 118 189) !important; + color: white !important; + background-fill-secondary: var(--neutral-50); + --border-color-accent: var(--primary-300); + --border-color-primary: var(--neutral-200); + +} + +.return-btn { + display: block !important; + background: rgb(0 0 0) !important; + color: white !important; + background-fill-secondary: var(--neutral-50); + --border-color-accent: var(--primary-300); + --border-color-primary: var(--neutral-200); +} +video { + + object-fit: cover; /* Stretch and fill the video to cover the entire container */ + + +} + +""" + + +def load_models(): + # print("HF Token: ", HUGGINGFACEHUB_API_TOKEN) + model = AutoModelForCausalLM.from_pretrained( + 
model_path, torch_dtype=torch.float32, device_map=device, trust_remote_code=True, token=HUGGINGFACEHUB_API_TOKEN + ) + + tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, token=HUGGINGFACEHUB_API_TOKEN) + tokenizer.padding_size = "right" + streamer = TextIteratorStreamer(tokenizer, skip_prompt=True) + + return model, tokenizer, streamer + + +class CustomLLM(LLM): + + @torch.inference_mode() + def _call( + self, + prompt: str, + stop: Optional[List[str]] = None, + run_manager: Optional[CallbackManagerForLLMRun] = None, + streamer: Optional[TextIteratorStreamer] = None, # Add streamer as an argument + ) -> str: + + tokens = tokenizer.encode(prompt, return_tensors="pt") + input_ids = tokens.to(device) + with torch.no_grad(): + output = model.generate( + input_ids=input_ids, + max_new_tokens=500, + num_return_sequences=1, + num_beams=1, + min_length=1, + top_p=0.9, + top_k=50, + repetition_penalty=1.2, + length_penalty=1, + temperature=0.1, + streamer=streamer, + # pad_token_id=tokenizer.eos_token_id, + do_sample=True, + ) + + def stream_res(self, prompt): + thread = threading.Thread(target=self._call, args=(prompt, None, None, streamer)) # Pass streamer to _call + thread.start() + + for text in streamer: + yield text + + @property + def _identifying_params(self) -> Mapping[str, Any]: + return model_path # {"name_of_model": model_path} + + @property + def _llm_type(self) -> str: + return "custom" + + +def videoSearch(query): + results = vs.MultiModalRetrieval(query) + result = [i.metadata["video"] for i in results] + # result += [i.metadata['frame_path'] for i in results] + result.sort() + result = list(set(result)) + result = [["video_ingest/videos/" + i, i] for i in result] + return result, [[None, "Hello"]] + + +def bot(chatbot, input_message, Gallery, selection): + print(selection, "====") + try: + context = "video_ingest/scene_description/" + Gallery[selection][1] + ".txt" + with open(context) as f: + context = f.read() + except: + context = "No video is selected, tell the client to select a video." 
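+    # The selected video's scene description, the user's message, and the last chat turn are rendered into
+    # the Llama-2 [INST] template (utils/prompt_template.jinja2), and the reply is streamed token by token
+    # via CustomLLM.stream_res.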
+ formatted_prompt = ph.get_formatted_prompt(context, input_message, chatbot[-1]) + response = chatbot + [[input_message, ""]] + for new_text in llm.stream_res(formatted_prompt): + response[-1][1] += new_text + + yield response, "" + + +def select_fn(data: gr.SelectData): + print(data.index) + print(data.value) + return data.index, [[None, "Hello"]] + + +scheme = ( + "#FFFFFF", + "#FAFAFA", + "#F5F5F5", + "#F0F0F0", + "#E8E8E8", + "#E0E0E0", + "#D6D6D6", + "#C2C2C2", + "#A8A8A8", + "#8D8D8D", + "#767676", +) + + +def spawnUI(share_gradio, server_port, server_name): + with gr.Blocks( + css=CSS, theme=gr.themes.Base(primary_hue=gr.themes.Color(*scheme), secondary_hue=gr.themes.colors.blue) + ) as demo: + + # chatUI = + with gr.Group(): + gr.Label("# Visual Rag", show_label=False) + examples = gr.Dropdown( + [ #'Enter Text', + # 'Find similar videos', + "Man wearing glasses", + "People reading item description", + "Man holding red shopping basket", + "Was there any person wearing a blue shirt seen today?", + "Was there any person wearing a blue shirt seen in the last 6 hours?", + "Was there any person wearing a blue shirt seen last Sunday?", + "Was a person wearing glasses seen in the last 30 minutes?", + "Was a person wearing glasses seen in the last 72 hours?", + ], + label="Video search", + allow_custom_value=True, + ) + with gr.Row(visible=True, elem_classes=["chatbot-widget"]) as chatUI: + with gr.Column(scale=10): + chatbot = gr.Chatbot( + layout="panel", + bubble_full_width=True, + avatar_images=["images.jpeg", "images.png"], + show_label=False, + container=False, + height=600, + value=[ + [ + None, + "Hello, welcome to Visual Rag! To get started, select a question from the drop-down menu above. \n Then, click on your selection and type your question below to chat with the video", + ] + ], + ) + # with gr.Group(): + with gr.Row(): + message = gr.Textbox(placeholder="Type a message...", show_label=False, scale=9) + submit = gr.Button("Submit", elem_classes=["custom_blue-btn"], scale=1) + # with gr.Row() as return_div: + + # back = gr.Button("↩️ Return", elem_classes=["custom_submit-btn"]) + # retry = gr.Button("🔄 Retry", elem_classes=["custom_submit-btn"]) + # clear = gr.Button("🗑️ Clear", elem_classes=["custom_submit-btn"]) + with gr.Column(scale=5): + # Video = gr.Video(show_label = False, container = False) + Gallery = gr.Gallery(label="Retrieved Videos", interactive=False) + selection = gr.Number(0, visible=False) + Gallery.select(select_fn, None, [selection, chatbot], queue=False) + examples.change(fn=videoSearch, inputs=examples, outputs=[Gallery, chatbot]) + # loginbutton.click(validate, [username, password], [LogInPage, chatUI, error_message], queue=False) + message.submit(bot, [chatbot, message, Gallery, selection], [chatbot, message]) + submit.click(bot, [chatbot, message, Gallery, selection], [chatbot, message]) + demo.queue().launch(share=share_gradio, server_port=server_port, server_name=server_name) + + +if __name__ == "__main__": + print("Reading config file") + # config = reader.read_config('../docs/config.yaml') + + # Create argument parser + parser = argparse.ArgumentParser(description="Process configuration file for generating and storing embeddings.") + + # Add argument for configuration file + parser.add_argument("config_file", type=str, help="Path to configuration file (e.g., config.yaml)") + + parser.add_argument( + "share_gradio", + type=bool, + help="whether to create a publicly shareable link for the gradio app. 
Creates an SSH tunnel to make your UI accessible from anywhere", + ) + + parser.add_argument( + "server_name", + type=str, + default=None, + help='to make app accessible on local network, set this to "0.0.0.0". Can be set by environment variable GRADIO_SERVER_NAME.', + ) + + parser.add_argument( + "server_port", + type=int, + default=None, + help="will start gradio app on this port (if available). Can be set by environment variable GRADIO_SERVER_PORT. ", + ) + + # Parse command-line arguments + args = parser.parse_args() + + # Read configuration file + config = reader.read_config(args.config_file) + share_gradio = args.share_gradio + server_port = args.server_port + server_name = args.server_name + + model_path = config["model_path"] + video_dir = config["videos"] + print(video_dir) + video_dir = video_dir.replace("../", "") + + model, tokenizer, streamer = load_models() + llm = CustomLLM() + + host = VECTORDB_SERVICE_HOST_IP + port = int(config["vector_db"]["port"]) + selected_db = config["vector_db"]["choice_of_db"] + + vs = db.VS(host, port, selected_db) + spawnUI(share_gradio, server_port, server_name) diff --git a/VideoRAGQnA/video_ingest/scene_description/op_10_0320241830.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_10_0320241830.mp4.txt new file mode 100644 index 000000000..11d8d4627 --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_10_0320241830.mp4.txt @@ -0,0 +1 @@ + The video shows a man standing behind a hardware store aisle, looking at the items displayed on the shelves. He appears to be browsing through the products available for sale. As he walks down the aisle, another person can be seen walking towards him. The two men seem to be engaged in conversation as they pass each other by. The man behind the aisle is wearing a blue shirt, while the other man is wearing a plaid shirt. They both appear to be dressed casually, which suggests that they might be shopping for home improvement supplies. The man behind the aisle is holding a cell phone in his hand, possibly checking prices or searching for product reviews online. The other man does not seem to have anything in his hands, but he might be carrying a shopping basket or a list of items to purchase. Overall, the scene depicts a typical shopping experience in a hardware store, where customers can find various products to meet their needs. The colors and properties of the items in the store include different types of tools, paint, and building materials, all of which are essential for home improvement projects. diff --git a/VideoRAGQnA/video_ingest/scene_description/op_19_0320241830.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_19_0320241830.mp4.txt new file mode 100644 index 000000000..24e6e1498 --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_19_0320241830.mp4.txt @@ -0,0 +1 @@ + The video features two men in the hardware store aisle. One man is standing near the center of the aisle, looking at the items on display. He is wearing a blue plaid shirt and seems to be focused on finding something specific. The other man is walking past him, but he doesn't appear to be interested in anything in particular. As he walks by, he glances at some of the items on the shelves, but doesn't stop to examine them closely. The aisle is filled with various items, including tools, paint supplies, and cleaning products. There are several bottles of different sizes and shapes, which are likely used for painting or cleaning purposes. 
Additionally, there are multiple boxes containing tools and equipment, such as wrenches, hammers, and saws. These items are arranged neatly on the shelves, making it easy for customers to locate and select what they need. Overall, the scene captures a typical shopping experience in a hardware store, where customers can find everything they need for their home improvement projects. diff --git a/VideoRAGQnA/video_ingest/scene_description/op_1_0320241830.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_1_0320241830.mp4.txt new file mode 100644 index 000000000..9d3f8b5fb --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_1_0320241830.mp4.txt @@ -0,0 +1,2 @@ + The main focus of the video is a man who is standing in an aisle at a hardware store, looking at items on the shelves. He is wearing a blue plaid shirt and grey pants, and he appears to be focused on finding a specific item or comparing different products. As he looks at the items, he occasionally reaches out to pick up or examine them more closely. The man seems to be taking his time to find the right item for his needs, and he appears to be engaged in his task. +There are also several other people visible in the background, but they do not seem to be actively engaging with the items in the store. They appear to be browsing or walking around the store, possibly looking for other items or just passing through. The store itself has a variety of items available, including tools, building materials, and other household goods. The items are displayed on shelves and racks throughout the store, making it easy for customers to find what they need. The overall atmosphere of the store seems to be calm and orderly, with customers browsing and shopping at their own pace. diff --git a/VideoRAGQnA/video_ingest/scene_description/op_21_0320241830.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_21_0320241830.mp4.txt new file mode 100644 index 000000000..7efd74484 --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_21_0320241830.mp4.txt @@ -0,0 +1,3 @@ + The video features two men shopping in the hardware store. One man is wearing a black shirt and is pushing a red cart filled with various items. He is holding a magazine in his hand and appears to be reading it while walking down the aisle. The other man is standing next to him, also looking at the items on display. Both men seem to be focused on finding something specific in the store. +As they walk down the aisle, they come across several items such as a pair of scissors, a hammer, and a set of wrenches. The scissors are silver and black in color, while the hammer has a wooden handle and a metal head. The wrenches are made of metal and come in different sizes. The men appear to be interested in these items and may consider purchasing them for their projects. +The overall atmosphere of the store seems to be calm and organized, with the men taking their time to explore the products available. The colors of the items they interact with are predominantly silver, black, and brown, which gives the store a classic and traditional feel. diff --git a/VideoRAGQnA/video_ingest/scene_description/op_24_0320241830.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_24_0320241830.mp4.txt new file mode 100644 index 000000000..35b1b2b47 --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_24_0320241830.mp4.txt @@ -0,0 +1 @@ + The first person in the video is a man who walks down the aisle of the hardware store, looking at various items on display. 
He appears to be casually dressed, wearing a t-shirt and jeans. As he stops to look at something on the shelf, his attention is focused on the item, indicating that he might be interested in buying it. The second person in the video is a woman who stands near the end of the aisle, possibly waiting for someone or just browsing the store. She is also casually dressed, wearing a tank top and shorts. Her presence adds to the overall atmosphere of the store, making it feel more lively and social. Overall, both individuals appear to be engaged in their respective activities within the store, either browsing or shopping for specific items. The colors and textures of the items they interact with add to the visual appeal of the store, creating a visually rich environment that invites further exploration and discovery. diff --git a/VideoRAGQnA/video_ingest/scene_description/op_31_0320241830.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_31_0320241830.mp4.txt new file mode 100644 index 000000000..636899f5c --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_31_0320241830.mp4.txt @@ -0,0 +1 @@ + The video shows a man standing behind a counter in a hardware store aisle. He is wearing a black shirt and appears to be selling items or providing assistance to customers. Another person can be seen walking by him, but he doesn't seem to pay much attention to them. The store seems to have various items for sale, including tools and other hardware products. The man behind the counter seems to be knowledgeable about the products and is likely an employee of the store. Overall, the scene depicts a typical day at a hardware store where customers come to purchase items or seek assistance from the staff. diff --git a/VideoRAGQnA/video_ingest/scene_description/op_47_0320241830.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_47_0320241830.mp4.txt new file mode 100644 index 000000000..f848c8e52 --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_47_0320241830.mp4.txt @@ -0,0 +1,2 @@ + The video captures three men shopping in a hardware store aisle. One man is wearing a black shirt and jeans, holding a red basket. He appears to be examining an item on the shelf, possibly deciding whether to purchase it or not. Another man is wearing a blue shirt and khakis, holding a blue basket. He seems to be looking at a different item on the shelf, possibly comparing its features to the one he's already picked up. The third man is standing behind them, wearing a white shirt and gray pants. He appears to be observing the other two men's choices or waiting for his turn to make a selection. +As they shop, the men interact with various objects in the store. One man picks up a green object off the shelf, inspecting it closely before placing it back down. Another man holds a yellow object in his hand, likely considering whether to buy it or not. Throughout the scene, there are multiple objects on the shelves, including tools, paint supplies, and other hardware items. The colors of these objects vary, adding visual interest to the scene. Overall, the video showcases a typical day at a hardware store, where customers are browsing through various products and making purchases. 
diff --git a/VideoRAGQnA/video_ingest/scene_description/op_5_0320241915.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_5_0320241915.mp4.txt new file mode 100644 index 000000000..a1c689ea3 --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_5_0320241915.mp4.txt @@ -0,0 +1,3 @@ + The video features two men shopping in the hardware store. One man is walking down the aisle, while the other man is standing near him. Both men appear to be focused on finding specific items in the store. They are both wearing casual clothing, which suggests that they might be browsing through the store for personal use or for a project they are working on. +As they walk down the aisle, they pass by several items, including paintbrushes, scissors, and other tools. The paintbrushes are colorful and come in different sizes, while the scissors have sharp edges and are designed for cutting materials. The men seem to be examining these items closely, likely considering whether they need them for their project or not. +The overall atmosphere of the store appears to be calm and organized, with the items neatly arranged on shelves and displays. The colors of the items are predominantly neutral, with some accents of bright colors, such as red and blue, adding visual interest to the store's interior. The men's interactions with the objects in the store suggest that they are engaged in a thoughtful and deliberate process of selecting the right tools and supplies for their needs. diff --git a/VideoRAGQnA/video_ingest/scene_description/op_DSCF2862_Rendered_001.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_DSCF2862_Rendered_001.mp4.txt new file mode 100644 index 000000000..e7a22d0eb --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_DSCF2862_Rendered_001.mp4.txt @@ -0,0 +1,3 @@ + The video shows a diverse group of people shopping in the hardware store aisle. One man is prominently visible, holding up an item while another man looks at it closely. This suggests that they might be discussing the product's features or comparing it to similar items. Other people in the store are also browsing through the shelves, likely searching for specific items or exploring new products. +In terms of clothing, most of the individuals appear to be dressed casually, with some wearing jeans and t-shirts, while others may have more formal attire such as button-down shirts or slacks. The store itself has a variety of items displayed on the shelves, ranging from tools and equipment to cleaning supplies and home decorations. The items are arranged neatly, making it easy for customers to find what they need. +The color palette of the store is predominantly neutral, with white walls and wooden shelves providing a clean and organized look. The items themselves come in various colors, reflecting the diverse range of products available in the store. Some items may have bright colors, while others may have more subdued tones, depending on their purpose and intended use. Overall, the scene captures a typical day at a hardware store, where customers are engaged in their shopping experience and interacting with the products available. 
diff --git a/VideoRAGQnA/video_ingest/scene_description/op_DSCF2864_Rendered_006.mp4.txt b/VideoRAGQnA/video_ingest/scene_description/op_DSCF2864_Rendered_006.mp4.txt new file mode 100644 index 000000000..6d428d1c4 --- /dev/null +++ b/VideoRAGQnA/video_ingest/scene_description/op_DSCF2864_Rendered_006.mp4.txt @@ -0,0 +1 @@ + The video shows a man walking down an aisle in a hardware store. He is wearing a blue shirt and khaki pants. As he walks, he looks at various items on the shelves, including tools and cleaning supplies. He stops to pick up a pair of scissors, examines them closely, and then puts them back on the shelf. Throughout the video, there are several other people visible in the background, but the main focus is on the man in the blue shirt. The store has a variety of products displayed on the shelves, including different types of tools, cleaning supplies, and other household items. The color scheme of the store is predominantly white, which gives it a bright and clean appearance. The overall atmosphere of the store appears to be organized and easy to navigate, allowing customers to quickly find what they're looking for. diff --git a/VideoRAGQnA/video_ingest/videos/op_10_0320241830.mp4 b/VideoRAGQnA/video_ingest/videos/op_10_0320241830.mp4 new file mode 100644 index 000000000..62ff6ce55 Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_10_0320241830.mp4 differ diff --git a/VideoRAGQnA/video_ingest/videos/op_19_0320241830.mp4 b/VideoRAGQnA/video_ingest/videos/op_19_0320241830.mp4 new file mode 100644 index 000000000..e5ce24dcd Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_19_0320241830.mp4 differ diff --git a/VideoRAGQnA/video_ingest/videos/op_1_0320241830.mp4 b/VideoRAGQnA/video_ingest/videos/op_1_0320241830.mp4 new file mode 100644 index 000000000..29c5dffcd Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_1_0320241830.mp4 differ diff --git a/VideoRAGQnA/video_ingest/videos/op_21_0320241830.mp4 b/VideoRAGQnA/video_ingest/videos/op_21_0320241830.mp4 new file mode 100644 index 000000000..4b67bd4d3 Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_21_0320241830.mp4 differ diff --git a/VideoRAGQnA/video_ingest/videos/op_24_0320241830.mp4 b/VideoRAGQnA/video_ingest/videos/op_24_0320241830.mp4 new file mode 100644 index 000000000..69cbd7f61 Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_24_0320241830.mp4 differ diff --git a/VideoRAGQnA/video_ingest/videos/op_31_0320241830.mp4 b/VideoRAGQnA/video_ingest/videos/op_31_0320241830.mp4 new file mode 100644 index 000000000..2ee0c1ec9 Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_31_0320241830.mp4 differ diff --git a/VideoRAGQnA/video_ingest/videos/op_47_0320241830.mp4 b/VideoRAGQnA/video_ingest/videos/op_47_0320241830.mp4 new file mode 100644 index 000000000..0ec140b0f Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_47_0320241830.mp4 differ diff --git a/VideoRAGQnA/video_ingest/videos/op_5_0320241915.mp4 b/VideoRAGQnA/video_ingest/videos/op_5_0320241915.mp4 new file mode 100644 index 000000000..2466dabb8 Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_5_0320241915.mp4 differ diff --git a/VideoRAGQnA/video_ingest/videos/op_DSCF2862_Rendered_001.mp4 b/VideoRAGQnA/video_ingest/videos/op_DSCF2862_Rendered_001.mp4 new file mode 100644 index 000000000..f5882aa25 Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_DSCF2862_Rendered_001.mp4 differ diff --git 
a/VideoRAGQnA/video_ingest/videos/op_DSCF2864_Rendered_006.mp4 b/VideoRAGQnA/video_ingest/videos/op_DSCF2864_Rendered_006.mp4 new file mode 100644 index 000000000..5614e4360 Binary files /dev/null and b/VideoRAGQnA/video_ingest/videos/op_DSCF2864_Rendered_006.mp4 differ
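For reference, once ingestion has populated the store, the retrieval path added in this PR can be exercised directly (a minimal sketch reusing the `db.VS` class above; it assumes the vector-DB container from the README is running and `docs/config.yaml` is unchanged):

```python
import os

from embedding.vector_stores import db
from utils import config_reader as reader

config = reader.read_config("docs/config.yaml")

host = os.getenv("VECTORDB_SERVICE_HOST_IP", "0.0.0.0")
port = int(config["vector_db"]["port"])
selected_db = config["vector_db"]["choice_of_db"]

# Connects to the chosen store and loads the CLIP image embedder
vs = db.VS(host, port, selected_db)

# Returns the frames (and their source videos) that best match the prompt
results = vs.MultiModalRetrieval("man holding red basket", n_images=3)
for doc in results:
    print(doc.metadata["video"], doc.metadata["frame_path"])
```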