Generates text from audio containing speech
# apt install libsndfile-dev ffmpeg
- ASR using Jasper (from NemoToolkit)
- ASR using Wav2Vec2 (from fairseq)
To install the package and its dependencies, run:
python setup.py install
or, with pip:
pip install ".[all]"
The installation should work on Python 3.6 or newer. It is untested on Python 2.7.
Jasper
from plume.models.jasper_nemo.asr import JasperASR
asr_model = JasperASR("/path/to/model_config_yaml", "/path/to/encoder_checkpoint", "/path/to/decoder_checkpoint")  # Loads the models
TEXT = asr_model.transcribe(wav_data) # Returns the text spoken in the wav
Wav2Vec2
from plume.models.wav2vec2.asr import Wav2Vec2ASR
asr_model = Wav2Vec2ASR("/path/to/ctc_checkpoint", "/path/to/w2v_checkpoint", "/path/to/target_dictionary")  # Loads the models
TEXT = asr_model.transcribe(wav_data) # Returns the text spoken in the wav
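The snippets above use wav_data without showing where it comes from. The following is a minimal sketch of one way to prepare it, using only the Python standard library. It assumes that transcribe() accepts raw 16-bit mono PCM bytes sampled at 16 kHz; this README does not state that, so check the plume source for the expected input format before relying on it. The helper names make_test_wav and read_pcm are hypothetical and are not part of plume, and the generated sine tone is only a stand-in for real speech.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds=0.5, rate=16000, freq=440.0):
    """Build an in-memory mono 16-bit WAV holding a sine tone (stand-in for speech)."""
    n_samples = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(frames)
    return buf.getvalue()


def read_pcm(wav_bytes):
    """Return (raw PCM frames, sample rate, channel count, sample width) from WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        pcm = wf.readframes(wf.getnframes())
        return pcm, wf.getframerate(), wf.getnchannels(), wf.getsampwidth()


wav_bytes = make_test_wav()
wav_data, rate, channels, width = read_pcm(wav_bytes)
print(rate, channels, width, len(wav_data))

# ASSUMPTION: transcribe() takes raw PCM bytes in this format (not confirmed here).
# With a loaded model from one of the snippets above:
# TEXT = asr_model.transcribe(wav_data)
```

For a recording on disk, the same read_pcm helper can be fed the file's bytes, for example open("/path/to/audio.wav", "rb").read(). Checking the returned rate, channels and width before transcribing makes format mismatches easy to spot.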
$ plume
Jasper: https://ngc.nvidia.com/catalog/models/nvidia:multidataset_jasper10x5dr/files?version=3
Wav2Vec2: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md