Speech to Phoneme: Deep Learning Automatic Phone Recognition

Authors: Kip McCharen, Pavan Kumar Bondalapati, Siddarth Surapaneni

SYS 6016: Deep Learning

University of Virginia

School of Data Science

May 13, 2021

Overview

Significant work has been done on automatic speech recognition (ASR), including fairly successful implementations such as Siri and Alexa. However, ASR differs from automatic phone recognition (APR), which aims to consistently identify not words but phones: the distinct, irreducible sounds from which words are formed. In recent years, phone recognition has proven useful in specialized tasks such as transcribing poorly documented languages (e.g., Inuktitut), tracking children’s exposure to word diversity, and automating the detection of certain speech and voice disorders. In this paper, we describe our approach to minimizing the phone error rate (PER) using several deep learning models.
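PER is the standard edit-distance metric applied to phone sequences: the number of substitutions (S), deletions (D), and insertions (I) needed to turn the predicted phones into the reference phones, divided by the number of reference phones (N), i.e. PER = 100 × (S + D + I) / N. The following is a minimal pure-Python sketch of that computation, assuming phones are given as space-separated strings. The function names are hypothetical; SpeechBrain computes this metric with its own error-rate utilities during training.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the first i-1 reference phones and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference phone r
                curr[j - 1] + 1,         # insert hypothesis phone h
                prev[j - 1] + (r != h),  # substitute (free when phones match)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Corpus-level PER in percent: total edits divided by total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


# Toy example (not TIMIT data): one deletion ("hh") and one insertion ("ih").
ref = "sh iy hh ae d".split()
hyp = "sh iy ae d ih".split()
print(edit_distance(ref, hyp))         # 2
print(phone_error_rate([ref], [hyp]))  # 40.0
```

Summing edits and reference lengths over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.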

TIMIT ASR with seq2seq Models

This repository contains the scripts to train a seq2seq RNN-based system on TIMIT, a speech dataset available from the University of Pennsylvania's Linguistic Data Consortium (LDC).

Usage

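Run this command to train the model, replacing /path/to/TIMIT with the location of your TIMIT data:

python train_with_wav2vec2.py hparams/train_with_wav2vec2.yaml --data_folder /path/to/TIMIT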
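In SpeechBrain recipes, the first argument is the hyperparameters YAML file. As we understand the library's argument handling, any additional --name value flags that SpeechBrain does not itself recognize are passed on as YAML overrides, replacing the same-named keys in that file. This is how flags such as --data_folder and --output_folder in the Colab command take effect. The sketch below only illustrates that mapping; overrides_to_yaml is a hypothetical helper, not part of SpeechBrain.

```python
from typing import List


def overrides_to_yaml(args: List[str]) -> str:
    """Hypothetical helper: turn ['--key', 'value', ...] into 'key: value' YAML lines.

    Illustrates the override idea only; it is not SpeechBrain's implementation.
    """
    if len(args) % 2 != 0:
        raise ValueError("expected --key value pairs")
    lines = []
    for flag, value in zip(args[0::2], args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a --flag, got {flag!r}")
        lines.append(f"{flag[2:]}: {value}")  # '--data_folder' -> 'data_folder'
    return "\n".join(lines)


print(overrides_to_yaml(["--data_folder", "/content/data/",
                         "--output_folder", "/content/data/trainwav2vec/"]))
# data_folder: /content/data/
# output_folder: /content/data/trainwav2vec/
```

Because overrides are applied when the YAML is loaded, anything derived from data_folder or output_folder inside the file picks up the new values without editing the YAML itself.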

Results

Release	Hyperparameters file	Val. PER (%)	Test PER (%)	Model link	GPUs
21-04-08	train_with_wav2vec2.yaml	7.11	8.04	https://drive.google.com/drive/folders/1-IbO7hldwrRh4rwz9xAYzKeeMe57YIiq?usp=sharing	1x V100 32GB

Bash Commands to Run in Google Colab

# Install dependencies
!pip install speechbrain
!pip install transformers
# Clone the project repository, then move to the parent directory
!git clone https://github.com/kipmccharen/sys6016_DL_project
%cd ..
# Download, extract, and delete the zipped SpeechBrain wav2vec 2.0 seq2seq checkpoint
!gdown --id '1EIfBmwiT0RF3-U81-Qu5K4J27N31BdB5' ## --output /content/speechbrain_s2s_wav2vec_ckpt.zip
!unzip speechbrain_s2s_wav2vec_ckpt.zip
!rm speechbrain_s2s_wav2vec_ckpt.zip
# Download the phone label encoder into the training output's save/ folder
%cd /content/data/trainwav2vec/save/
!gdown --id '1oZunuiwhMLfwtMeKAYJwr4DMjvE1LUIN' --output label_encoder.txt
# Return to /content and launch training with the wav2vec 2.0 hyperparameters
%cd /content/
!python sys6016_DL_project/train_with_wav2vec2.py sys6016_DL_project/hparams/train_with_wav2vec2.yaml --data_folder /content/data/ --output_folder /content/data/trainwav2vec/ --new_json /content/sys6016_DL_project/data/new_train.json
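The label_encoder.txt fetched in the Colab commands stores the mapping between phone labels and the integer indices the model predicts. The following is a minimal sketch that reads that mapping and decodes predictions back into phones. It assumes the file uses SpeechBrain's plain-text encoder format: one 'label' => index line per phone, then a line of = signs followed by encoder settings. SpeechBrain loads this file itself during training. load_label_encoder, decode, SEPARATOR, and the sample contents are hypothetical and only illustrate the file's structure.

```python
import ast
from typing import Dict, List

SEPARATOR = "================"  # assumed start of the encoder's metadata section


def load_label_encoder(text: str) -> Dict[str, int]:
    """Parse lines like "'aa' => 1" into {'aa': 1}, stopping at SEPARATOR."""
    lab2ind = {}
    for line in text.splitlines():
        if line.startswith(SEPARATOR):
            break  # remaining lines hold encoder settings, not phone labels
        if not line.strip():
            continue
        label, index = line.rsplit(" => ", 1)
        lab2ind[ast.literal_eval(label)] = int(index)  # label is a quoted repr
    return lab2ind


def decode(indices: List[int], lab2ind: Dict[str, int]) -> List[str]:
    """Map predicted integer indices back to phone labels."""
    ind2lab = {ind: lab for lab, ind in lab2ind.items()}
    return [ind2lab[i] for i in indices]


# Toy contents for illustration, not the project's actual file.
sample = "'<bos>' => 0\n'aa' => 1\n'ae' => 2\n================\n'starting_index' => 0\n"
encoder = load_label_encoder(sample)
print(encoder)                  # {'<bos>': 0, 'aa': 1, 'ae': 2}
print(decode([2, 1], encoder))  # ['ae', 'aa']
```

The training run and any later evaluation must use the same file: if the label-to-index mapping changes, a checkpoint's output indices no longer correspond to the intended phones.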
