Whisper-Run is a pip-installable CLI tool for transcribing audio files with Whisper models, with speaker diarization support. It lets you select a model for processing and saves the results in JSON format.
It builds on faster-whisper, a reimplementation of OpenAI's Whisper on top of the CTranslate2 library, and on pyannote's speaker-diarization-3.1 pipeline. Check their documentation if needed.
To install Whisper-Run, run the following command:
pip install whisper-run
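If you prefer to keep its dependencies isolated, a standard virtual environment works as well (the directory name .venv below is just a convention):

python -m venv .venv
source .venv/bin/activate
pip install whisper-run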
You can call Whisper-Run from the command line using the following syntax:
whisper-run --file_path=<file_path>
To process an audio file using the CPU and a specific file path:
whisper-run --device=cpu --file_path=your_file_path
When you run the command, you'll be prompted to select a model for audio processing:
[?] Select a model for audio processing:
> distil-large-v3
distil-large-v2
large-v3
large-v2
large
medium
small
base
tiny
Whisper-Run accepts the following options:

--device: Specify the device to use for processing (e.g., cpu or cuda).
--file_path: Specify the path to the audio file you want to process.
--hf_auth_token: Optional. Pass the Hugging Face auth token, or set the HF_AUTH_TOKEN environment variable instead.
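For example, to process a file on a GPU with the diarization token supplied through the environment (interview.wav is a placeholder path):

export HF_AUTH_TOKEN=your_hf_auth_token
whisper-run --device=cuda --file_path=interview.wav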
You can also use Whisper-Run programmatically in your Python scripts. Here is a basic example:
from whisper_run import AudioProcessor

def main():
    processor = AudioProcessor(
        file_path="your_file_path",
        device="cpu",
        model_name="large-v3",
    )
    processor.process()

if __name__ == "__main__":
    main()
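As a further sketch, the same two calls can drive a simple batch run over a directory. The directory path and the .wav glob pattern are placeholders; only the AudioProcessor constructor and process() method shown above are assumed:

from pathlib import Path

from whisper_run import AudioProcessor

def process_directory(directory: str) -> None:
    # Apply the same settings to every .wav file in the directory.
    for audio_file in sorted(Path(directory).glob("*.wav")):
        processor = AudioProcessor(
            file_path=str(audio_file),
            device="cpu",
            model_name="large-v3",
        )
        processor.process()

if __name__ == "__main__":
    process_directory("your_audio_directory")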
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
This project is licensed under the Apache 2.0 License.