Add Session History Feature for the Video RAG Chat Interface #253

Status: Open. Wants to merge 25 commits into base: main.

Commits (25):
- 3ea8cb0 Added VideoRAG + time-based search use case (ttrigui, May 29, 2024)
- 5e8bfe0 Added placeholders for prompt processing and UI (ttrigui, May 29, 2024)
- a245751 Cleaned up Readme, requirements and VectorDB; Added env variables for… (avbodas, May 29, 2024)
- 0e7b4aa Merge pull request #1 from avbodas/abdev (ttrigui, May 29, 2024)
- 6ef4527 update README file (ttrigui, May 29, 2024)
- 73ca4b4 add get_history function to retrive messages from session state (bashirmoham, Jun 3, 2024)
- 041199e Add history parameter to the function (bashirmoham, Jun 3, 2024)
- 6dff959 fix instruction for the assistant (bashirmoham, Jun 3, 2024)
- 1148bc5 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jun 4, 2024)
- 45976c4 Enable autoescaping for Jinja2 to prevent vulnerabilities (bashirmoham, Jun 4, 2024)
- 3ce37aa Merge branch 'session-history' of https://github.com/ttrigui/GenAIExa… (bashirmoham, Jun 4, 2024)
- fa509ba [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jun 4, 2024)
- 6db5f0c Update prompt_handler.py (bashirmoham, Jun 5, 2024)
- efbfe5f Merge branch 'session-history' of https://github.com/ttrigui/GenAIExa… (bashirmoham, Jun 5, 2024)
- c726b91 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jun 5, 2024)
- d634b42 enable chat interface (bashirmoham, Jun 12, 2024)
- c9a89ec [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jun 12, 2024)
- 98aa223 Update README.md (bashirmoham, Jun 12, 2024)
- ec4ce04 Merge branch 'session-history' of https://github.com/ttrigui/GenAIExa… (bashirmoham, Jun 12, 2024)
- e8dee0d [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jun 12, 2024)
- 74c3260 Update video-rag-ui.py (bashirmoham, Jun 12, 2024)
- a9b12cf [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jun 12, 2024)
- 392967f Update config.yaml (bashirmoham, Jun 12, 2024)
- 22a5d3f Update config.yaml (bashirmoham, Jun 12, 2024)
- 51b594a Merge branch 'main' into session-history (lvliang-intel, Jun 24, 2024)
104 changes: 104 additions & 0 deletions VideoRAGQnA/README.md
Reviewer comment (Collaborator):
All microservice-related code should be placed in the GenAIComps repo. Only the Docker Compose files, Kubernetes manifests, and UI code need to be stored in the GenAIExamples repo. Please reorganize your code accordingly. Thanks.

@@ -0,0 +1,104 @@
# Video RAG

## Introduction

Video RAG is a framework that retrieves videos based on a user-provided prompt. It uses both video scene descriptions generated by open-source vision models (e.g., Video-LLaMA, Video-LLaVA) as text embeddings and video frames as image embeddings to perform vector similarity search. The solution also supports retrieving additional similar videos without a new prompt (see the example video below).

![Example Video](docs/visual-rag-demo.gif)

## Tools

- **UI**: gradio **or** streamlit
- **Vector Storage**: Chroma DB **or** Intel's VDMS
- **Image Embeddings**: CLIP
- **Text Embeddings**: all-MiniLM-L12-v2
- **RAG Retriever**: Langchain Ensemble Retrieval

## Prerequisites

There are 10 example videos in `video_ingest/videos`, along with descriptions generated by an open-source vision model.
To run Video RAG on your own videos, make sure they match the format below.

## File Structure

```bash
video_ingest
├── scene_description
│ ├── op_10_0320241830.mp4.txt
│ ├── op_1_0320241830.mp4.txt
│ ├── op_19_0320241830.mp4.txt
│ ├── op_21_0320241830.mp4.txt
│ ├── op_24_0320241830.mp4.txt
│ ├── op_31_0320241830.mp4.txt
│ ├── op_47_0320241830.mp4.txt
│ ├── op_5_0320241915.mp4.txt
│ ├── op_DSCF2862_Rendered_001.mp4.txt
│ └── op_DSCF2864_Rendered_006.mp4.txt
└── videos
├── op_10_0320241830.mp4
├── op_1_0320241830.mp4
├── op_19_0320241830.mp4
├── op_21_0320241830.mp4
├── op_24_0320241830.mp4
├── op_31_0320241830.mp4
├── op_47_0320241830.mp4
├── op_5_0320241915.mp4
├── op_DSCF2862_Rendered_001.mp4
└── op_DSCF2864_Rendered_006.mp4
```
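Under the layout above, each clip `X.mp4` in `videos/` pairs with a description file `X.mp4.txt` in `scene_description/`. A minimal sketch of a check for that pairing (a hypothetical helper, not part of this PR) could look like:

```python
def match_descriptions(video_files, description_files):
    """Map each video filename to its expected description file.

    Assumes the naming convention shown above: the description for
    "X.mp4" is named "X.mp4.txt".
    """
    descriptions = set(description_files)
    pairs = {}
    missing = []
    for video in video_files:
        expected = video + ".txt"
        if expected in descriptions:
            pairs[video] = expected
        else:
            missing.append(video)
    return pairs, missing
```

Running this over the two directory listings before ingestion would surface any video that lacks a scene description.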

## Setup and Installation

Install pip requirements

```bash
cd VideoRAGQnA
pip3 install -r docs/requirements.txt
```

The current framework supports both Chroma DB and Intel's VDMS; use either of them.

Run Chroma DB as a Docker container:

```bash
docker run -d -p 8000:8000 chromadb/chroma
```

**or**

Run VDMS as a Docker container:

```bash
docker run -d -p 55555:55555 intellabs/vdms:latest
```

**Note:** If your file structure differs from the one described above, update the paths in `config.yaml`.

Update your choice of database and port in `config.yaml`, then export the required environment variables:

```bash
export VECTORDB_SERVICE_HOST_IP=<ip of host where vector db is running>

export HUGGINGFACEHUB_API_TOKEN='<your HF token>'
```

A HuggingFace Hub API token can be generated [here](https://huggingface.co/login?next=%2Fsettings%2Ftokens).
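The two exported variables are read from the environment at runtime. A sketch of that lookup (the default host here is an assumption for illustration, not the project's actual fallback):

```python
import os


def load_env_config(environ=os.environ):
    # Read the two variables exported in the setup steps above.
    # The "0.0.0.0" default is an assumed fallback for illustration.
    host = environ.get("VECTORDB_SERVICE_HOST_IP", "0.0.0.0")
    token = environ.get("HUGGINGFACEHUB_API_TOKEN")
    if not token:
        raise RuntimeError("HUGGINGFACEHUB_API_TOKEN is not set")
    return host, token
```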

Generate image embeddings and store them in the selected database, specifying the config file location and the video input location:

```bash
python3 embedding/generate_store_embeddings.py docs/config.yaml video_ingest/videos/
```

**Web UI Video RAG - Streamlit**

```bash
streamlit run video-rag-ui.py --server.address 0.0.0.0 --server.port 50055
```

**Web UI Video RAG - Gradio**

```bash
python3 video-rag-ui.py docs/config.yaml True '0.0.0.0' 50055
```
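The Gradio command above passes four positional arguments. A sketch of how such arguments might be parsed (the meaning of the boolean flag is an assumption; the actual handling lives in `video-rag-ui.py`):

```python
def parse_ui_args(argv):
    """Parse the positional arguments shown in the Gradio command above:
    config path, a string boolean flag (assumed to toggle some UI behavior),
    bind address, and port."""
    config_path, flag, host, port = argv
    return config_path, flag == "True", host, int(port)
```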
2 changes: 2 additions & 0 deletions VideoRAGQnA/__init__.py
@@ -0,0 +1,2 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
26 changes: 26 additions & 0 deletions VideoRAGQnA/docs/config.yaml
@@ -0,0 +1,26 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Path to all videos
videos: video_ingest/videos/
# Path to video description generated by open-source vision models (ex. video-llama, video-llava, etc.)
description: video_ingest/scene_description/
# Do you want to extract frames of videos (True if not done already, else False)
generate_frames: True
# Do you want to generate image embeddings?
embed_frames: True
# Path to store extracted frames
image_output_dir: video_ingest/frames/
# Path to store metadata files
meta_output_dir: video_ingest/frame_metadata/
# Number of frames to extract per second;
# e.g., at 24 fps with a value of 2, every 12th frame (the 12th, 24th, ...) is extracted
number_of_frames_per_second: 2

vector_db:
  choice_of_db: 'vdms'  # Supported databases: [vdms, chroma]
host: 0.0.0.0
  port: 55555  # 8000 for chroma

# LLM path
model_path: meta-llama/Llama-2-7b-chat-hf
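The `number_of_frames_per_second` setting drives the sampling interval used during frame extraction. A small sketch of that arithmetic, mirroring the `mod = int(fps // N)` logic in `embedding/extract_store_frames.py`:

```python
def sampled_frame_indices(fps, frames_per_second, total_frames):
    # Every (fps // frames_per_second)-th frame is kept; the interval is
    # clamped to 1 so at least every frame is eligible when fps < setting.
    mod = max(int(fps // frames_per_second), 1)
    return [i for i in range(1, int(total_frames) + 1) if i % mod == 0]
```

For a 24 fps video with the default value of 2, this keeps frames 12, 24, 36, and so on.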
12 changes: 12 additions & 0 deletions VideoRAGQnA/docs/requirements.txt
@@ -0,0 +1,12 @@
accelerate
chromadb
dateparser
gradio
langchain-experimental
metafunctions
open-clip-torch
opencv-python-headless
sentence-transformers
streamlit
tzlocal
vdms
Binary file added VideoRAGQnA/docs/visual-rag-demo.gif
2 changes: 2 additions & 0 deletions VideoRAGQnA/embedding/__init__.py
@@ -0,0 +1,2 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
120 changes: 120 additions & 0 deletions VideoRAGQnA/embedding/extract_store_frames.py
@@ -0,0 +1,120 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import datetime
import json
import os
import random

import cv2
from tzlocal import get_localzone


def process_all_videos(path, image_output_dir, meta_output_dir, N, selected_db):

def extract_frames(video_path, image_output_dir, meta_output_dir, N, date_time, local_timezone):
video = video_path.split("/")[-1]
# Create a directory to store frames and metadata
os.makedirs(image_output_dir, exist_ok=True)
os.makedirs(meta_output_dir, exist_ok=True)

# Open the video file
cap = cv2.VideoCapture(video_path)

if int(cv2.__version__.split(".")[0]) < 3:
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
else:
fps = cap.get(cv2.CAP_PROP_FPS)

total_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)

# print (f'fps {fps}')
# print (f'total frames {total_frames}')

mod = int(fps // N)
if mod == 0:
mod = 1

print(f"total frames {total_frames}, N {N}, mod {mod}")

# Variables to track frame count and desired frames
frame_count = 0

# Metadata dictionary to store timestamp and image paths
metadata = {}

while cap.isOpened():
ret, frame = cap.read()

if not ret:
break

frame_count += 1

if frame_count % mod == 0:
timestamp = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000 # Convert milliseconds to seconds
frame_path = os.path.join(image_output_dir, f"{video}_{frame_count}.jpg")
time = date_time.strftime("%H:%M:%S")
date = date_time.strftime("%Y-%m-%d")
hours, minutes, seconds = map(float, time.split(":"))
year, month, day = map(int, date.split("-"))

cv2.imwrite(frame_path, frame) # Save the frame as an image

metadata[frame_count] = {
"timestamp": timestamp,
"frame_path": frame_path,
"date": date,
"year": year,
"month": month,
"day": day,
"time": time,
"hours": hours,
"minutes": minutes,
"seconds": seconds,
}
if selected_db == "vdms":
# Localize the current time to the local timezone of the machine
current_time_local = date_time.replace(tzinfo=datetime.timezone.utc).astimezone(local_timezone)

# Convert the localized time to ISO 8601 format with timezone offset
iso_date_time = current_time_local.isoformat()
metadata[frame_count]["date_time"] = {"_date": str(iso_date_time)}

# Save metadata to a JSON file
metadata_file = os.path.join(meta_output_dir, f"{video}_metadata.json")
with open(metadata_file, "w") as f:
json.dump(metadata, f, indent=4)

# Release the video capture and close all windows
cap.release()
        print(f"{frame_count // mod} frames extracted and metadata saved successfully.")
return fps, total_frames, metadata_file

videos = [file for file in os.listdir(path) if file.endswith(".mp4")]

# print (f'Total {len(videos)} videos will be processed')
metadata = {}

for i, each_video in enumerate(videos):
video_path = os.path.join(path, each_video)
date_time = datetime.datetime.now()
print("date_time : ", date_time)
# Get the local timezone of the machine
local_timezone = get_localzone()
fps, total_frames, metadata_file = extract_frames(
video_path, image_output_dir, meta_output_dir, N, date_time, local_timezone
)
metadata[each_video] = {
"fps": fps,
"total_frames": total_frames,
"extracted_frame_metadata_file": metadata_file,
"embedding_path": f"embeddings/{each_video}.pt",
"video_path": f"{path}/{each_video}",
}
print(f"✅ {i+1}/{len(videos)}")

metadata_file = os.path.join(meta_output_dir, "metadata.json")
with open(metadata_file, "w") as f:
json.dump(metadata, f, indent=4)
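The per-video summary that `process_all_videos` writes to `metadata.json` can be consumed downstream like this (the file name and keys match the code above; the sample values are invented for illustration):

```python
import json

# Sample of the metadata.json structure written by process_all_videos;
# values here are made up, not taken from the actual example videos.
example = json.dumps({
    "op_1_0320241830.mp4": {
        "fps": 24.0,
        "total_frames": 240.0,
        "extracted_frame_metadata_file": "video_ingest/frame_metadata/op_1_0320241830.mp4_metadata.json",
        "embedding_path": "embeddings/op_1_0320241830.mp4.pt",
        "video_path": "video_ingest/videos/op_1_0320241830.mp4",
    }
})

meta = json.loads(example)
# Derive each clip's duration in seconds from the stored fps and frame count.
durations = {v: m["total_frames"] / m["fps"] for v, m in meta.items()}
```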