# nf-whisper

Automatic Speech Recognition (ASR) Nextflow pipeline using OpenAI Whisper

`nf-whisper` is a simple Nextflow pipeline that leverages OpenAI's Whisper pre-trained models to generate transcriptions and translations from YouTube videos and audio files. Key features include:
- Automatic transcription and translation of audio content
- YouTube video downloading and audio extraction
- Support for various Whisper pre-trained models
- Flexible input options: YouTube URLs or local audio files
- Optional timestamp generation for transcriptions
This pipeline streamlines the process of converting speech to text, making it easier for researchers, content creators, and developers to work with audio data.
- Install Nextflow: If you don't have Nextflow installed, visit nextflow.io for installation instructions (a typical install one-liner is shown right after this list).
- Install Docker: This pipeline uses Docker to manage dependencies. Install Docker from docker.com.
- Build the Docker image (or use Wave): From the root directory of this repository, run:

  ```bash
  docker build . -t whisper
  ```

  Alternatively, you can use Wave to build the container image remotely and on-the-fly: just run the Nextflow commands below with `-with-wave` instead of `-with-docker whisper` (see the sketch after this list).
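If you need Nextflow, the standard installer from nextflow.io can be fetched with a one-liner (this assumes a recent Java runtime is already available, as described in the Nextflow documentation):

```bash
# Download the Nextflow launcher into the current directory
curl -s https://get.nextflow.io | bash

# Make it executable and move it onto your PATH
chmod +x nextflow
sudo mv nextflow /usr/local/bin/
```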
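With Wave, the first quickstart command below would look something like this (a sketch; it assumes your Nextflow version includes Wave support, so the container is built remotely from this repository's Dockerfile instead of with a local `docker build`):

```bash
nextflow run main.nf \
    --youtube_url https://www.youtube.com/watch\?v\=UVzLd304keA \
    --model small.en \
    -with-wave
```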
- Install Nextflow and Docker (if not already installed).
- Run the pipeline by providing a YouTube URL with the `--youtube_url` parameter:

  ```bash
  nextflow run main.nf --youtube_url https://www.youtube.com/watch\?v\=UVzLd304keA --model small.en -with-docker whisper
  ```

- For local audio files, use the `--file` parameter:

  ```bash
  nextflow run main.nf --file audio_sample.wav --model small.en -with-docker whisper
  ```
Now that you have the basic setup working, let's explore more advanced features.
- Generate transcriptions with timestamps using the `--timestamp` parameter:

  ```bash
  nextflow run main.nf --youtube_url https://www.youtube.com/watch\?v\=UVzLd304keA --model small.en --timestamp -with-docker whisper
  ```

- Use a different model with the `--model` parameter:

  ```bash
  nextflow run main.nf --youtube_url https://www.youtube.com/watch\?v\=UVzLd304keA --model tiny -with-docker whisper
  ```

- Provide a local model file, also via the `--model` parameter:

  ```bash
  nextflow run main.nf --youtube_url https://www.youtube.com/watch\?v\=UVzLd304keA --model /path/to/model.pt -with-docker whisper
  ```

- Check out the help with:

  ```bash
  nextflow run main.nf --help
  ```
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. The table below shows the available models and their approximate memory requirements and relative speed.
| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny   | 39 M       | `tiny.en`          | `tiny`             | ~1 GB         | ~32x           |
| base   | 74 M       | `base.en`          | `base`             | ~1 GB         | ~16x           |
| small  | 244 M      | `small.en`         | `small`            | ~2 GB         | ~6x            |
| medium | 769 M      | `medium.en`        | `medium`           | ~5 GB         | ~2x            |
| large  | 1550 M     | N/A                | `large`            | ~10 GB        | 1x             |
For English-only applications, the `.en` models tend to perform better, especially for the `tiny.en` and `base.en` models. The performance difference becomes less significant for the `small.en` and `medium.en` models.
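The VRAM figures above only matter when Whisper runs on a GPU. As a minimal, hypothetical sketch (not part of this repository; it assumes an NVIDIA GPU, the NVIDIA Container Toolkit, and the `whisper` image built earlier), the host GPU could be exposed to the container via `nextflow.config`:

```groovy
// nextflow.config -- hypothetical sketch, not shipped with this pipeline.
// Enable Docker for all processes and pass the host GPU through so the
// medium/large models get the VRAM listed in the table above.
docker {
    enabled    = true
    runOptions = '--gpus all'  // requires the NVIDIA Container Toolkit
}

process {
    container = 'whisper'      // the image built with `docker build . -t whisper`
}
```

With such a configuration in place, the `-with-docker whisper` flag could be dropped from the commands above, since Docker and the container image are already set in the config.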
The section above was adapted from the README of Matthias Zepper's amazing work on dockerizing Whisper with support for GPUs! This Nextflow pipeline was heavily influenced by Matthias' work, the official OpenAI Whisper GitHub repository, and some other blog posts I read, mostly this and this.