Code refactoring, video.json and modified docs
MatteoFasulo committed Aug 23, 2023
1 parent 8618800 commit 4bbb600
Showing 3 changed files with 88 additions and 29 deletions.
40 changes: 35 additions & 5 deletions README.md
@@ -4,11 +4,17 @@ Discover Whisper-TikTok, an innovative AI-powered tool that leverages the prowes

 ## Demo Video 🎬

-https://github.com/MatteoFasulo/Whisper-TikTok/assets/74818541/68e25504-c305-4144-bd39-c9acc218c3a4
+<https://github.com/MatteoFasulo/Whisper-TikTok/assets/74818541/68e25504-c305-4144-bd39-c9acc218c3a4>

 ## Operating Principle

-Employing Whisper-TikTok is a breeze: simply complete the JSON-formatted dictionary located at the outset of the `main.py` file.
+Employing Whisper-TikTok is a breeze: simply modify the [video.json](code/video.json). The JSON file contains the following fields:
+
+- `series`: The name of the series.
+- `part`: The part number of the video.
+- `text`: The text to be spoken in the video.
+- `outro`: The outro text to be spoken in the video.
+- `path`: The path for saving subtitles files.

 Summarizing the program's functionality:

@@ -19,7 +25,7 @@ Summarizing the program's functionality:
 The program conducts the **sequence of actions** outlined below:

 1. Retrieve **environment variables** from the optional .env file.
-2. Validate the presence of **PyTorch** with **CUDA** installation.
+2. Validate the presence of **PyTorch** with **CUDA** installation. If the requisite dependencies are **absent**, the **program will use the CPU instead of the GPU**.
 3. Download a random video from platforms like YouTube, e.g., a Minecraft parkour gameplay clip.
 4. Load the OpenAI Whisper model into memory.
 5. Extract the video text from the provided JSON file and initiate a **Text-to-Speech** request to the Microsoft Edge Cloud TTS API, preserving the response as an .mp3 audio file.
@@ -38,13 +44,37 @@ Whisper-TikTok has undergone rigorous testing on Windows 10 systems equipped wit
 pip install -r requirements.txt
 ```

-Keep in mind that, due to the utilization of the OpenAI Whisper model for speech recognition, a GPU is indispensable for program execution. PyTorch will evaluate the availability of the CUDA driver and harness the GPU's capabilities whenever feasible.
+It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:
+
+```bash
+# on Ubuntu or Debian
+
+sudo apt update && sudo apt install ffmpeg
+
+# on Arch Linux
+
+sudo pacman -S ffmpeg
+
+# on MacOS using Homebrew (<https://brew.sh/>)
+
+brew install ffmpeg
+
+# on Windows using Chocolatey (<https://chocolatey.org/>)
+
+choco install ffmpeg
+
+# on Windows using Scoop (<https://scoop.sh/>)
+
+scoop install ffmpeg
+```
+
+Keep in mind that, due to the utilization of the OpenAI Whisper model for speech recognition, a GPU is recommended for optimal performance. However, the program will still function without a GPU, albeit at a slower pace.

 ## Usage Guidelines 📝

 To embark on your Whisper-TikTok journey, initiate the following command within your terminal:

-```python
+```bash
 python main.py
 ```
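
The README changes above name two runtime prerequisites: `ffmpeg` on the PATH and an optional CUDA-enabled PyTorch. A minimal preflight sketch of how both could be checked before running `main.py` follows. The `preflight` helper is hypothetical and not part of this commit; it assumes only the standard-library `shutil.which` and PyTorch's `torch.cuda.is_available()`, which the commit itself already calls.

```python
import shutil


def preflight() -> dict:
    """Report whether ffmpeg is on PATH and which device PyTorch would pick."""
    # shutil.which returns the executable's full path, or None when it is missing.
    ffmpeg_path = shutil.which("ffmpeg")

    try:
        # Imported lazily so the check still works when PyTorch is not installed.
        import torch
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        device = "cpu"

    return {"ffmpeg": ffmpeg_path, "device": device}


if __name__ == "__main__":
    report = preflight()
    if report["ffmpeg"] is None:
        print("ffmpeg not found on PATH; install it with your package manager")
    print(f"Device available to PyTorch: {report['device']}")
```

Returning a dictionary instead of raising keeps the check advisory, which matches the commit's move from a hard `assert` to a warning.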

61 changes: 37 additions & 24 deletions code/main.py
@@ -2,6 +2,7 @@
 import os
 import random
 import re
+import json
 import sys
 import subprocess
 import asyncio
@@ -10,8 +11,6 @@
 from typing import Tuple
 import datetime

-from utils import *
-
 # PyTorch
 import torch

@@ -29,11 +28,15 @@
 # FFMPEG (Python)
 import ffmpeg

+# utils.py
+from utils import *
+
 HOME = os.getcwd()

 # Logging
 if not os.path.isdir('log'):
     os.mkdir('log')
+
 with KeepDir() as keep_dir:
     keep_dir.chdir("log")
     log_filename = f'{datetime.date.today()}.log'
@@ -45,38 +48,39 @@
         ]
     )
 logger = logging.getLogger(__name__)
-#######################
-#        STATIC       #
-#######################
-jsonData = {"series": "Crazy facts that you did not know",
-            "part": 4,
-            "outro": "Follow us for more",
-            "random": False,
-            "path": "F:\\PremiereTrash",  # Path where .mp3 tts file and .srt file will be saved
-            "texts": ["Did you know that there are more possible iterations of a game of chess than there are atoms in the observable universe? The number of possible legal moves in a game of chess is around 10^120!", "Hi my name is Matteo and this is a test", "Did you know that there is a species of jellyfish called Turritopsis dohrnii?"]}


+###########################
+#        VIDEO.JSON       #
+###########################

+jsonData = json.load(open('video.json', encoding='utf-8'))


 #######################
 #         CODE        #
 #######################


 async def main() -> bool:
     # Clear terminal
     console.clear()

     logging.debug('Creating video')
     with console.status("[bold cyan]Creating video...") as status:
-        load_dotenv(find_dotenv())
+        load_dotenv(find_dotenv())  # Optional
         console.log(
             f"| [green]OK[/green] | Finish loading environment variables")
         logging.info('Finish loading environment variables')

-        assert (torch.cuda.is_available())
-        console.log(f"| [green]OK[/green] | PyTorch GPU version found")
-        logging.info('PyTorch GPU version found')
-
-        series = jsonData['series']
-        part = jsonData['part']
-        outro = jsonData['outro']
-        path = jsonData['path']
+        # Check if GPU is available for PyTorch (CUDA).
+        if torch.cuda.is_available():
+            console.log(f"| [green]OK[/green] | PyTorch GPU version found")
+            logging.info('PyTorch GPU version found')
+        else:
+            console.log(
+                f"| [yellow][WARNING][/yellow] | PyTorch GPU not found, using CPU instead")
+            logging.warning('PyTorch GPU not found')

         download_video(url='https://www.youtube.com/watch?v=intRX7BRA90')

@@ -85,34 +89,43 @@ async def main() -> bool:
         logging.info('OpenAI-Whisper model loaded')

         # Text 2 Speech (Edge TTS API)
-        for text in jsonData['texts']:
+        for video_id, video in enumerate(jsonData):
+            series = video['series']
+            part = video['part']
+            outro = video['outro']
+            path = video['path']
+            text = video['text']

             req_text, filename = create_full_text(
                 path, series, part, text, outro)

             console.log(f"| [green]OK[/green] | Text converted successfully")
             logging.info('Text converted successfully')

             await tts(req_text, outfile=filename)

             console.log(
                 f"| [green]OK[/green] | Text2Speech mp3 file generated successfully!")
             logging.info('Text2Speech mp3 file generated successfully!')

             # Whisper Model to create SRT file from Speech recording
             srt_filename = srt_create(
                 model, path, series, part, text, filename)

             console.log(
                 f"| [green]OK[/green] | Transcription srt and ass file saved successfully!")
             logging.info('Transcription srt and ass file saved successfully!')

             # Background video with srt and duration
             background_mp4 = random_background()
             file_info = get_info(background_mp4)
             final_video = prepare_background(
                 background_mp4, filename_mp3=filename, filename_srt=srt_filename, duration=int(file_info.get('duration')))

             console.log(
                 f"| [green]OK[/green] | MP4 video saved successfully!\nPath: {final_video}")
             logging.info(f'MP4 video saved successfully!\nPath: {final_video}')

-            # Increment part so it can fetch the next text in JSON
-            part += 1

         console.log(f'[bold][red]Done![/red][/bold]')
         return True
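
The commit reads `video.json` with `json.load(open(...))`, which leaves closing the file handle to the garbage collector. A minimal sketch of an alternative that closes the file explicitly and checks the top-level shape could look like this. The `load_videos` helper and its error message are hypothetical, not code from the repository.

```python
import json
from pathlib import Path


def load_videos(json_path: str = "video.json") -> list:
    """Read the list of video entries and close the file handle afterwards."""
    # The with-block closes the file even if json.load raises.
    with Path(json_path).open(encoding="utf-8") as fh:
        data = json.load(fh)

    # main.py iterates over the result, so a JSON array is expected at the top level.
    if not isinstance(data, list):
        raise ValueError(f"{json_path} must contain a JSON array of video objects")
    return data


if __name__ == "__main__":
    for video_id, video in enumerate(load_videos()):
        print(video_id, video["series"], video["part"])
```

Failing early on a non-array file gives a clearer error than the `TypeError` that would otherwise surface inside the loop in `main()`.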

16 changes: 16 additions & 0 deletions code/video.json
@@ -0,0 +1,16 @@
+[
+    {
+        "series": "Crazy facts that you did not know",
+        "part": "4",
+        "outro": "Follow us for more",
+        "path": "F:\\PremiereTrash",
+        "text": "Did you know that there are more possible iterations of a game of chess than there are atoms in the observable universe? The number of possible legal moves in a game of chess is around 10^120!"
+    },
+    {
+        "series": "Crazy facts that you did not know",
+        "part": "5",
+        "outro": "Follow us for more",
+        "path": "F:\\PremiereTrash",
+        "text": "Hi, this is a test and it will be a video!"
+    }
+]
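
Each entry in `video.json` carries the five fields listed in the README, and `part` is now stored as a string (`"4"`) where the old inline dictionary used an integer. A minimal sketch of per-entry validation follows, assuming downstream code might want `part` as a number. The commit does not show what `create_full_text` expects, so the conversion is an assumption, and `normalize_entry` and `REQUIRED_FIELDS` are hypothetical names.

```python
REQUIRED_FIELDS = ("series", "part", "text", "outro", "path")


def normalize_entry(entry: dict) -> dict:
    """Check that an entry has every README-documented field and coerce `part` to int."""
    missing = [key for key in REQUIRED_FIELDS if key not in entry]
    if missing:
        raise KeyError(f"video entry is missing fields: {', '.join(missing)}")

    cleaned = dict(entry)  # shallow copy so the caller's dict is left untouched
    # Assumption: a numeric part is wanted; int() raises ValueError on non-numeric text.
    cleaned["part"] = int(entry["part"])
    return cleaned


if __name__ == "__main__":
    sample = {
        "series": "Crazy facts that you did not know",
        "part": "4",
        "outro": "Follow us for more",
        "path": "F:\\PremiereTrash",
        "text": "Hi, this is a test and it will be a video!",
    }
    print(normalize_entry(sample)["part"] + 1)
```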
