
Feature Extraction for Speech Spoofing


Feature Extraction

Feature extraction is an especially important step for Voice Conversion (VC), because the speech signal has to be reconstructed from the extracted features. If the features do not represent the voice well, information is lost; this was also one of the main problems we faced. For our project we use MCEP features. MCEP stands for Mel Cepstral Coefficients (not to be confused with MFCC).

For the reconstructed speech to be intelligible, it is not enough to simply take these features and combine them back. If only that much is done, the voice sounds robotic (because pitch information is lost) or noisy (due to digital filtering artifacts). So it is necessary to use tools like STRAIGHT, which make reconstruction easier without compromising on quality.
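
For instance, the pitch can be tracked per frame and carried alongside the MCEPs so that an excitation can be rebuilt at synthesis time. Below is a minimal sketch using pysptk's SWIPE' pitch tracker; the file name, the 16 kHz sampling rate and the 256-sample hop are assumptions for illustration, not the project's actual settings.

import numpy as np
import librosa
import pysptk

# Load an utterance; the file name and the 16 kHz rate are placeholders.
speech, sr = librosa.load("source.wav", sr=16000)

# One F0 value (in Hz) per 256-sample hop; unvoiced frames come out as 0.
f0 = pysptk.swipe(speech.astype(np.float64), fs=sr, hopsize=256,
                  min=60, max=240, otype="f0")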

Parameters

  1. Frame Length: (sample_rate × 25 ms) rounded to the closest power of 2 (see the sketch after this list). We chose 1024.
  2. Frame Step: (sample_rate × 5 ms) rounded to the closest power of 2. We chose 256.
  3. Windowing: Tested with various windows. Blackman produced the best results and is also the default; Hamming and Hann windows are decent choices as well.
  4. Order: The number of coefficients in the feature vector. The default value is 25, and a value between 20 and 50 is recommended. We chose 25 as it produced relatively better reconstruction results for both female and male voices.
  5. Alpha: For a sampling rate of 16 kHz, an alpha of 0.42 is suggested.
  6. Gamma: Zero by default. The value can be changed to get a decent reconstruction.
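
As a rough illustration of items 1 and 2, the window and hop sizes can be derived from the sampling rate as below. This is only a sketch: the 16 kHz rate and the 25 ms / 5 ms durations are assumptions, and with those inputs the rounding gives 512 and 64, whereas the values actually used in the Implementation section are 1024 and 256.

import numpy as np

def nearest_pow2(n):
    # Closest power of two to n (items 1 and 2 above).
    return int(2 ** round(np.log2(n)))

sample_rate = 16000                               # assumed input sampling rate
frame_length = nearest_pow2(sample_rate * 0.025)  # ~25 ms window -> 512 here
frame_step = nearest_pow2(sample_rate * 0.005)    # ~5 ms hop -> 64 here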

Implementation

In the training phase, do this for both the source and the target speech. We use librosa.util.frame to frame the signal, pysptk.blackman to window it, and pysptk.mcep for feature extraction.

import numpy as np
import librosa
import pysptk

frameLength = 1024
overlap = 0.25
hop_length = int(frameLength * overlap)  # 256-sample hop; librosa expects an integer
order = 25
alpha = 0.42
gamma = -0.35  # not used by pysptk.mcep; only relevant for the MGCEP variant under Issues

# `speech` is the waveform loaded earlier (e.g. with librosa.load).
# Frame the signal (rows = frames), apply a Blackman window, then extract
# one mel-cepstrum vector per frame.
sourceframes = librosa.util.frame(speech, frame_length=frameLength, hop_length=hop_length).astype(np.float64).T
sourceframes *= pysptk.blackman(frameLength)
sourcemcepvectors = np.apply_along_axis(pysptk.mcep, 1, sourceframes, order, alpha)
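
To sanity-check that these features are enough to get speech back, the mel-cepstra can be run through pysptk's MLSA synthesis filter driven by a pitch-derived excitation. This is only a rough sketch based on the pysptk synthesis utilities; it reuses speech, hop_length, order and alpha from the block above, assumes 16 kHz audio, and is not a substitute for the STRAIGHT-based resynthesis mentioned earlier.

from pysptk.synthesis import MLSADF, Synthesizer

# Per-frame pitch period in samples (otype="pitch"), used to build the excitation.
pitch = pysptk.swipe(speech.astype(np.float64), fs=16000, hopsize=hop_length,
                     min=60, max=240, otype="pitch")
excitation = pysptk.excite(pitch, hop_length)

# Convert mel-cepstra to MLSA filter coefficients and run the synthesis filter.
b = pysptk.mc2b(sourcemcepvectors, alpha)
synthesizer = Synthesizer(MLSADF(order=order, alpha=alpha), hop_length)
reconstructed = synthesizer.synthesis(excitation, b)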

Issues

MCEP vs MGCEP: Another important decision was whether to use MCEP or MGCEP features. Results were more or less the same in both cases, so we stuck with MCEP features.
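
For reference, the MGCEP alternative differs only in the extraction call, which is where the gamma parameter from the implementation above actually comes into play. A minimal sketch, assuming the same sourceframes, order, alpha and gamma as in the Implementation section:

# Mel-generalized cepstral analysis: like pysptk.mcep, but with an extra gamma.
sourcemgcepvectors = np.apply_along_axis(
    pysptk.mgcep, 1, sourceframes, order, alpha, gamma)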