Code refactoring, video.json and modified docs

MatteoFasulo · Aug 23, 2023 · 4bbb600
1 parent 8618800
commit 4bbb600
Showing 3 changed files with 88 additions and 29 deletions.
diff --git a/README.md b/README.md
@@ -4,11 +4,17 @@ Discover Whisper-TikTok, an innovative AI-powered tool that leverages the prowes
 
 ## Demo Video 🎬
 
-https://github.com/MatteoFasulo/Whisper-TikTok/assets/74818541/68e25504-c305-4144-bd39-c9acc218c3a4
+<https://github.com/MatteoFasulo/Whisper-TikTok/assets/74818541/68e25504-c305-4144-bd39-c9acc218c3a4>
 
 ## Operating Principle
 
-Employing Whisper-TikTok is a breeze: simply complete the JSON-formatted dictionary located at the outset of the `main.py` file.
+Employing Whisper-TikTok is a breeze: simply edit [video.json](code/video.json). The file holds a list of videos, and each entry contains the following fields:
+
+- `series`: The name of the series.
+- `part`: The part number of the video.
+- `text`: The text to be spoken in the video.
+- `outro`: The outro text to be spoken in the video.
+- `path`: The directory where the generated .mp3 and .srt files are saved.
 
 Summarizing the program's functionality:
 
@@ -19,7 +25,7 @@ Summarizing the program's functionality:
 The program conducts the **sequence of actions** outlined below:
 
 1. Retrieve **environment variables** from the optional .env file.
-2. Validate the presence of **PyTorch** with **CUDA** installation.
+2. Check whether **PyTorch** with **CUDA** support is installed. If it is **absent**, the **program uses the CPU instead of the GPU**.
 3. Download a random video from platforms like YouTube, e.g., a Minecraft parkour gameplay clip.
 4. Load the OpenAI Whisper model into memory.
 5. Extract the video text from the provided JSON file and initiate a **Text-to-Speech** request to the Microsoft Edge Cloud TTS API, preserving the response as an .mp3 audio file.
@@ -38,13 +44,37 @@ Whisper-TikTok has undergone rigorous testing on Windows 10 systems equipped wit
 pip install -r requirements.txt
 ```
 
-Keep in mind that, due to the utilization of the OpenAI Whisper model for speech recognition, a GPU is indispensable for program execution. PyTorch will evaluate the availability of the CUDA driver and harness the GPU's capabilities whenever feasible.
+It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:
+
+```bash
+# on Ubuntu or Debian
+
+sudo apt update && sudo apt install ffmpeg
+
+# on Arch Linux
+
+sudo pacman -S ffmpeg
+
+# on macOS using Homebrew (<https://brew.sh/>)
+
+brew install ffmpeg
+
+# on Windows using Chocolatey (<https://chocolatey.org/>)
+
+choco install ffmpeg
+
+# on Windows using Scoop (<https://scoop.sh/>)
+
+scoop install ffmpeg
+```
+
+Because the program uses the OpenAI Whisper model for speech recognition, a GPU is recommended for best performance. The program still runs without a GPU, only more slowly.
 
 ## Usage Guidelines 📝
 
 To embark on your Whisper-TikTok journey, initiate the following command within your terminal:
 
-```python
+```bash
 python main.py
 ```
 

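Note on the ffmpeg requirement added to the README: a script could check up front that the `ffmpeg` executable is on the `PATH` before starting any work. The following is a minimal sketch using only the Python standard library. The helper name `ffmpeg_available` is hypothetical and not part of this commit.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found; install it with one of the package managers listed in the README")
```

Calling such a check before downloading the background video would let the program fail early with a clear message rather than partway through rendering.
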
diff --git a/code/main.py b/code/main.py
@@ -2,6 +2,7 @@
 import os
 import random
 import re
+import json
 import sys
 import subprocess
 import asyncio
@@ -10,8 +11,6 @@
 from typing import Tuple
 import datetime
 
-from utils import *
-
 # PyTorch
 import torch
 
@@ -29,11 +28,15 @@
 # FFMPEG (Python)
 import ffmpeg
 
+# utils.py
+from utils import *
+
 HOME = os.getcwd()
 
 # Logging
 if not os.path.isdir('log'):
     os.mkdir('log')
+
 with KeepDir() as keep_dir:
     keep_dir.chdir("log")
     log_filename = f'{datetime.date.today()}.log'
@@ -45,38 +48,39 @@
         ]
     )
     logger = logging.getLogger(__name__)
-#######################
-#        STATIC       #
-#######################
-jsonData = {"series": "Crazy facts that you did not know",
-            "part": 4,
-            "outro": "Follow us for more",
-            "random": False,
-            "path": "F:\\PremiereTrash",  # Path where .mp3 tts file and .srt file will be saved
-            "texts": ["Did you know that there are more possible iterations of a game of chess than there are atoms in the observable universe? The number of possible legal moves in a game of chess is around 10^120!", "Hi my name is Matteo and this is a test", "Did you know that there is a species of jellyfish called Turritopsis dohrnii?"]}
+
+
+###########################
+#        VIDEO.JSON       #
+###########################
+
+with open('video.json', encoding='utf-8') as video_file:
+    jsonData = json.load(video_file)
 
 #######################
 #         CODE        #
 #######################
 
 
 async def main() -> bool:
+    # Clear terminal
     console.clear()
+
     logging.debug('Creating video')
     with console.status("[bold cyan]Creating video...") as status:
-        load_dotenv(find_dotenv())
+        load_dotenv(find_dotenv())  # Optional
         console.log(
             f"| [green]OK[/green] | Finish loading environment variables")
         logging.info('Finish loading environment variables')
 
-        assert (torch.cuda.is_available())
-        console.log(f"| [green]OK[/green] | PyTorch GPU version found")
-        logging.info('PyTorch GPU version found')
-
-        series = jsonData['series']
-        part = jsonData['part']
-        outro = jsonData['outro']
-        path = jsonData['path']
+        # Check if GPU is available for PyTorch (CUDA).
+        if torch.cuda.is_available():
+            console.log(f"| [green]OK[/green] | PyTorch GPU version found")
+            logging.info('PyTorch GPU version found')
+        else:
+            console.log(
+                f"| [yellow][WARNING][/yellow] | PyTorch GPU not found, using CPU instead")
+            logging.warning('PyTorch GPU not found')
 
         download_video(url='https://www.youtube.com/watch?v=intRX7BRA90')
 
@@ -85,34 +89,43 @@ async def main() -> bool:
         logging.info('OpenAI-Whisper model loaded')
 
         # Text 2 Speech (Edge TTS API)
-        for text in jsonData['texts']:
+        for video_id, video in enumerate(jsonData):
+            series = video['series']
+            part = video['part']
+            outro = video['outro']
+            path = video['path']
+            text = video['text']
+
             req_text, filename = create_full_text(
                 path, series, part, text, outro)
+
             console.log(f"| [green]OK[/green] | Text converted successfully")
             logging.info('Text converted successfully')
+
             await tts(req_text, outfile=filename)
+
             console.log(
                 f"| [green]OK[/green] | Text2Speech mp3 file generated successfully!")
             logging.info('Text2Speech mp3 file generated successfully!')
 
             # Whisper Model to create SRT file from Speech recording
             srt_filename = srt_create(
                 model, path, series, part, text, filename)
+
             console.log(
                 f"| [green]OK[/green] | Transcription srt and ass file saved successfully!")
             logging.info('Transcription srt and ass file saved successfully!')
 
+            # Background video with srt and duration
             background_mp4 = random_background()
             file_info = get_info(background_mp4)
             final_video = prepare_background(
                 background_mp4, filename_mp3=filename, filename_srt=srt_filename, duration=int(file_info.get('duration')))
+
             console.log(
                 f"| [green]OK[/green] | MP4 video saved successfully!\nPath: {final_video}")
             logging.info(f'MP4 video saved successfully!\nPath: {final_video}')
 
-            # Increment part so it can fetch the next text in JSON
-            part += 1
-
     console.log(f'[bold][red]Done![/red][/bold]')
     return True
 

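Note on the CUDA check in `main()`: the commit replaces the hard `assert` with a warning, so the script continues on the CPU when no GPU is found. One way to make that choice explicit is to compute a device string once and reuse it, as sketched below. The helper `select_device` is hypothetical, and the guarded import is only there so the snippet also runs on machines without PyTorch. Whether the rest of the code accepts a device argument is an assumption that this diff does not confirm.

```python
def select_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch runs without PyTorch
    except ImportError:
        return "cpu"
    # torch.cuda.is_available() is the same check used in main().
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {select_device()}")
```
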
diff --git a/code/video.json b/code/video.json
@@ -0,0 +1,16 @@
+[
+    {
+        "series": "Crazy facts that you did not know",
+        "part": "4",
+        "outro": "Follow us for more",
+        "path": "F:\\PremiereTrash",
+        "text": "Did you know that there are more possible iterations of a game of chess than there are atoms in the observable universe? The number of possible legal moves in a game of chess is around 10^120!"
+    },
+    {
+        "series": "Crazy facts that you did not know",
+        "part": "5",
+        "outro": "Follow us for more",
+        "path": "F:\\PremiereTrash",
+        "text": "Hi, this is a test and it will be a video!"
+    }
+]
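
Note on the new `video.json` format: since the file is now edited by hand, a missing key would only surface as a `KeyError` inside the loop in `main()`. The sketch below loads the file and checks every entry for the five fields listed in the README. The function name `load_videos` and the error messages are hypothetical and not part of this commit.

```python
import json
from pathlib import Path

# Field names taken from the README section of this commit.
REQUIRED_FIELDS = ("series", "part", "text", "outro", "path")


def load_videos(json_path: str) -> list:
    """Load the list of video entries and check each one has the expected fields."""
    with Path(json_path).open(encoding="utf-8") as handle:
        videos = json.load(handle)

    if not isinstance(videos, list):
        raise ValueError("video.json must contain a JSON array of entries")

    for index, video in enumerate(videos):
        if not isinstance(video, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        missing = [field for field in REQUIRED_FIELDS if field not in video]
        if missing:
            raise ValueError(f"entry {index} is missing fields: {missing}")
    return videos
```

A call such as `load_videos("video.json")` would return the same list that `main()` iterates over, with the difference that malformed entries are reported before any download or TTS request starts.
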