Our church puts the pastor's messages on YouTube each week. I wanted to make those uploads a little more marketable with AI!
This is a script that, in conjunction with a Vosk model, will accept a video file and provide the following:
- A transcript of the video
- A clickbait title based on the themes in the transcript
- A list of suggested hashtags to apply to your upload
- A 4-8 sentence summary suitable for a YouTube description field
- A list of 'quotable moments' in the video
- 25-second clips of the video containing those quotable moments, suitable for a TikTok/Reels-style upload
- A list of discussion questions about the content
- Vosk provides the ability to transcribe from an mp4.
- Ollama provides an LLM to process the transcript and prompt it to create/identify content.
- Chroma provides an in-memory vector DB for embeddings so that Ollama can process the transcript.
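To make the Chroma piece concrete, here is a minimal sketch of in-memory Chroma usage, assuming the `chromadb` Python package; the collection name and sample chunks are illustrative, not the Philbott's actual code:

```python
import chromadb

# Chroma's default client is in-memory: nothing is persisted to disk.
client = chromadb.Client()
collection = client.create_collection(name="transcript")  # name is illustrative

# Store transcript chunks; Chroma embeds them with its default embedding function.
collection.add(
    documents=["In the beginning was the Word...", "Love your neighbor as yourself."],
    ids=["chunk-0", "chunk-1"],
)

# Retrieve the chunk most relevant to a theme before prompting the LLM.
results = collection.query(query_texts=["love"], n_results=1)
print(results["documents"])
```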
- Download a model from Vosk. The code currently uses `vosk-model-en-us-0.42-gigaspeech`, but you can change that by changing the `MODEL_FOLDER` variable in `transcripter.py`. Transcripter expects the Vosk model folder to be in the same folder as `main.py`.
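  For reference, loading a Vosk model in Python looks roughly like this. This is a hedged sketch, not the actual contents of `transcripter.py`; the `audio.wav` filename is illustrative:

  ```python
  import json
  import wave

  from vosk import KaldiRecognizer, Model

  MODEL_FOLDER = "vosk-model-en-us-0.42-gigaspeech"  # folder sitting next to main.py
  model = Model(MODEL_FOLDER)

  # Vosk wants 16 kHz mono PCM; 'audio.wav' is an illustrative filename.
  with wave.open("audio.wav", "rb") as wf:
      recognizer = KaldiRecognizer(model, wf.getframerate())
      while True:
          data = wf.readframes(4000)
          if not data:
              break
          recognizer.AcceptWaveform(data)
      print(json.loads(recognizer.FinalResult())["text"])
  ```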
- Set up your venv. In your terminal, run:

  ```
  python -m venv .venv
  source .venv/bin/activate  # on Windows: source .venv/Scripts/activate
  ```

  You should now have a little decorator before your command prompt that looks like `(.venv)`. Note: every time you run the application, make sure this appears, or things won't work right!
- Install dependencies. In your terminal, run:

  ```
  pip install -r requirements.txt
  ```
- Install ffmpeg, which is used for extracting the audio from the video file as well as cutting the short-form clips. Follow the standard installation instructions for your platform. If you can run `ffmpeg -version` from your command line without getting an error message, you're ready to go on to the next step.
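  To illustrate what that dependency does here, below is a hedged sketch of the two kinds of ffmpeg calls involved, audio extraction and clip cutting, via Python's `subprocess`. The filenames and timestamps are made up, and this is not necessarily how the Philbott invokes ffmpeg:

  ```python
  import subprocess

  # Extract a 16 kHz mono WAV, the format Vosk transcribes best.
  subprocess.run(
      ["ffmpeg", "-y", "-i", "sermon.mp4", "-ar", "16000", "-ac", "1", "sermon.wav"],
      check=True,
  )

  # Cut a 25-second clip starting at 12:30 for a short-form upload.
  subprocess.run(
      ["ffmpeg", "-y", "-ss", "00:12:30", "-t", "25", "-i", "sermon.mp4",
       "-c", "copy", "clip-1.mp4"],
      check=True,
  )
  ```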
- Install Ollama.
- From a command prompt, run the following:

  ```
  ollama pull llama3.1
  ```

  (You can of course swap out llama3.1 for any other LLM that you like, as long as Ollama serves it at `localhost:11434`.) This will cause Ollama to run a background process on your machine with the Llama 3.1 model listening for inputs from the program. Ollama must be running for the Philbott to work! You can verify this by pointing your web browser to http://localhost:11434. If you get a message that Ollama is running, you're good to go to the next step.
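  If you'd rather sanity-check the endpoint from Python than a browser, a generic Ollama API call like the one below should work. This assumes the `requests` package and is not the Philbott's own code:

  ```python
  import requests

  # Ollama's local REST API listens on port 11434 by default.
  response = requests.post(
      "http://localhost:11434/api/generate",
      json={
          "model": "llama3.1",
          "prompt": "Reply with the single word: ready",
          "stream": False,  # return one JSON object instead of a token stream
      },
      timeout=120,
  )
  print(response.json()["response"])
  ```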
`config.yaml` is how you define what prompts are used by the system to create your outputs. The opening of a config ALWAYS defines the list of 'prompts' as follows:

```yaml
prompts:
```

Beneath this `prompts` label is a repeating list of individual configuration values. YAML starts each entry with a hyphen and then lists the name/value pairs, one on each line.
The first entry in every prompt is its `type`. We currently support the following types:

| Type | Description |
|---|---|
| SimpleText | This is a prompt that will return text from the LLM. It's a Q&A, summary, or similar conversation where the response is always text that we use for something. |
| VideoClipArray | This is a prompt that returns a pipe-delimited (`\|`) list of inputs to be passed to our video-clip finder. This can be used for cases like "find a quotable moment" or "locate the point where a key point was made". You can match up the count of elements. Your prompt will be modified to request the pipe-delimited output format before being sent to the LLM. |
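Putting that together, a config might look something like the following. Only `type` is documented above; the `name` and `prompt` keys are illustrative guesses at the schema, so check the sample config shipped with the project for the exact field names:

```yaml
prompts:
  - type: SimpleText
    name: summary            # illustrative key, not confirmed
    prompt: "Write a 4-8 sentence summary suitable for a YouTube description."
  - type: VideoClipArray
    name: quotable-moments   # illustrative key, not confirmed
    prompt: "Identify the most quotable moments in this sermon."
```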
From your command prompt, run:

```
python main.py \
  --file=path/to/your/video-file.mp4 \
  --config=path/to/your/config.yaml \
  --outputfolder=/path/to/destination/folder
```

NOTE: the `outputfolder` parameter is optional. If you don't provide it, the Philbott will output its results to an "outputs" folder that it creates in the project folder. If the folder already exists it will be used; if it does not exist, it will be created. You can make your path absolute or relative by adding or omitting a leading '/' character: `/home/Ben/Documents` is an absolute path, while `home/Ben/Documents` is relative to the current working directory.
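For illustration only, here is a hypothetical sketch of how a CLI like this could parse those three flags with `argparse`. The flag names match the ones above, but the defaults and structure are assumptions rather than `main.py`'s actual code:

```python
import argparse
from pathlib import Path

# Hypothetical argument parsing; not main.py's actual implementation.
parser = argparse.ArgumentParser(description="Philbott video processor")
parser.add_argument("--file", required=True, help="path to the input video")
parser.add_argument("--config", required=True, help="path to the prompt config YAML")
parser.add_argument("--outputfolder", default="outputs",
                    help="destination folder; created if it does not exist")
args = parser.parse_args()

# A relative path resolves against the current working directory.
Path(args.outputfolder).mkdir(parents=True, exist_ok=True)
```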
The Philbott can take a while to process, depending on the size of your file! A 30-minute video on a MacBook M2 ran for about 2 minutes.
When it's complete, you'll have a series of new .txt files in your output folder as well as some 25-second-long mp4s.