
Training Speaker Recognition


Training

In training, models are built and fed data so that they learn their parameters. In our project, the models were built using GMMs and HMMs.

Model Building

We use an HMM with Gaussian mixture emissions to model speaker-related information. One model is built for every speaker in the dataset. This can be done using the GMMHMM class of the hmmlearn package.

Parameters

  1. Number of States: We chose 3. The number is advised to be between 3 and 10 in this paper. Each state models certain speaker parameters.
  2. Number of Mixtures: We chose 128. Between 32 and 512 is recommended in this paper.
  3. Covariance Type: Diagonal, the type most used in speaker recognition.
  4. Initial Parameters: We used the defaults. It is sometimes better to pre-initialise pi and a to obtain a left-to-right model, which models speech signals better. The score method can be used to check which initialisation gives better results (see the sketch after the implementation below).

Implementation

Initial Parameters

import numpy as np
from hmmlearn.hmm import GMMHMM

N = 3  # Number of states of the HMM.
Mixtures = 128  # Number of Gaussian mixtures per state.

Using the default (randomly initialised) parameters:

model = GMMHMM(n_components=N, n_mix=Mixtures, covariance_type='diag')

To force a left-to-right model:

startprob = np.ones(N) * (10**(-30))  # (Near-)zero probability of starting
startprob[0] = 1.0 - (N-1)*(10**(-30))  # anywhere but the first state.
transmat = np.zeros([N, N])  # Initial transmat for the left-to-right model.
for i in range(N):
    for j in range(N):
        transmat[i, j] = 1/(N-i)  # Row i: uniform over states i..N-1.
transmat = np.triu(transmat, k=0)  # Zero out the backward transitions.
transmat[transmat == 0] = (10**(-30))  # Tiny values instead of exact zeros avoid divide-by-zero warnings.
# init_params="mcw": fit() initialises means, covars and weights, but leaves startprob/transmat alone.
model = GMMHMM(n_components=N, n_mix=Mixtures, covariance_type='diag', init_params="mcw")
model.startprob_ = startprob
model.transmat_ = transmat
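
As suggested under Parameters, the score method can be used to compare the two initialisations. A minimal sketch, assuming feature_vectors holds the training features (as in the Fitting section below) and held_out is a hypothetical array of features kept aside for comparison:

ergodic = GMMHMM(n_components=N, n_mix=Mixtures, covariance_type='diag')  # Default initialisation.
ergodic.fit(feature_vectors)
model.fit(feature_vectors)  # The left-to-right model built above.

print("default      :", ergodic.score(held_out))  # Log-likelihood of unseen data;
print("left-to-right:", model.score(held_out))  # the higher-scoring model wins.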

Fitting

An HMM is characterised by 3 parameters:

  • Start probability (pi)
  • Transition probability matrix (a)
  • Emission distribution (b)

These parameters are learnt using the Baum-Welch (forward-backward) algorithm. The fit method does the job for us; it takes the training feature vectors as its argument. It is also possible to input multiple observation sequences, in which case we also have to pass the lengths array that was created earlier (a sketch of preparing both follows the fit calls below).

Implementation

Concatenating all input features

model.fit(feature_vectors)

Giving every input file separately (Multiple Observation Sequences)

model.fit(feature_vectors, lengths)
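
For reference, a minimal sketch of how feature_vectors and lengths line up, assuming per_file_features is a hypothetical list of (n_frames, n_features) numpy arrays, one per training file:

import numpy as np

feature_vectors = np.vstack(per_file_features)  # All frames, stacked along the time axis.
lengths = [feats.shape[0] for feats in per_file_features]  # Number of frames contributed by each file.
model.fit(feature_vectors, lengths)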

Storing

Once the model parameters are learned, we have to store them. We also need a name string for every speaker, so we created a separate class called GMMModel whose objects hold the model and a name. We then use pickle.dump to store the object, so that it can be loaded when classification needs to be done.

Implementation

User Defined Class

class GMMModel(object):  # Wrapper class for an HMM with Gaussian mixture emissions.

    def __init__(self, model, name):
        self.model = model  # Object of type GMMHMM from hmmlearn.
        self.name = name  # Name of the speaker.

Creating a Binary File

import pickle

speaker_number = 12  # Index of the speaker being trained.
model_filename = "gmodel" + str(speaker_number)
f = open(model_filename, "wb")

Pickling

sample = GMMModel(model, "MREM")  # TODO: Change the name for each speaker.
pickle.dump(sample, f)
f.close()
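
A matching sketch for loading the pickled object back in the classifier, using the file written above; test_features is a hypothetical array of features from an unknown utterance:

with open("gmodel12", "rb") as f:
    loaded = pickle.load(f)
print(loaded.name)  # "MREM"
log_likelihood = loaded.model.score(test_features)  # Used to pick the best-scoring speaker.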

Issues

  1. Deciding the number of states and number of mixtures: This is less trivial than it might seem. The number of states should be between 3 and 10, and the number of mixtures is better off being a power of 2 somewhere between 32 and 512. Different values were tried and the resulting models were critically evaluated to arrive at the chosen values.
  2. Finding a library / implementation: The HMM code was deprecated and removed from the scikit-learn API and now lives in the separate hmmlearn package, which produces a lot of deprecation and divide-by-zero warnings. There is a possibility of the model not working as expected because of such problems. MATLAB has a good HMM toolkit which could be used instead.
  3. Using GMM instead of K-Means: K-Means quantises all the feature vectors into discrete observations, whereas a GMM performs a soft quantisation, so it might be able to model information that would otherwise be lost. This link explains the situation better; a small sketch of the difference follows this list.
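
To make the third point concrete, here is a small sketch contrasting the hard assignments of K-Means with the soft posteriors of a GMM; X is a stand-in for a real feature matrix:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.random.randn(500, 13)  # Stand-in for 500 frames of 13-dimensional features.

hard = KMeans(n_clusters=8).fit(X).predict(X)  # One integer code per frame.
soft = GaussianMixture(n_components=8).fit(X).predict_proba(X)  # A (500, 8) matrix of per-frame posteriors.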

HMM with Discrete Emissions

An HMM with discrete emissions was initially planned for the project. It can be implemented using the MultinomialHMM class of hmmlearn. In this case we also have to quantise the input feature vectors into a codebook, which can be done using the scikit-learn KMeans class. The steps are the same as above, except for one extra step between feature extraction and model building: the feature vectors are quantised into K discrete codes using K-Means, with 128 or 256 codes recommended. KMeans.fit() is used to find the centroids, and in the classifier program the predict() method is then used to assign every feature vector a discrete code. A sketch of this pipeline follows.
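
A sketch of that pipeline, reusing feature_vectors and lengths from the Fitting section. Note that MultinomialHMM modelled discrete symbols in the hmmlearn versions of this era; newer releases call this model CategoricalHMM:

from sklearn.cluster import KMeans
from hmmlearn.hmm import MultinomialHMM

K = 128  # Codebook size; 128 or 256 codes are recommended.
kmeans = KMeans(n_clusters=K).fit(feature_vectors)  # Find the K centroids.
codes = kmeans.predict(feature_vectors).reshape(-1, 1)  # One discrete code per frame, as a 2-D column.

dhmm = MultinomialHMM(n_components=N)  # Same number of states as before.
dhmm.fit(codes, lengths)  # Same lengths array as before.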

Video Tutorials

  1. HMM
  2. GMM