Ex. 12.5 #155

szcf-weiya · 2018-08-06T11:11:19Z

szcf-weiya · 2020-05-27T11:34:09Z

(a)

ESL-CN/code/Ex.12.5/main.jl

Lines 8 to 32 in 3a6336b

    
    function plot_all_curves()
        # https://github.com/JuliaGraphics/Colors.jl/blob/master/src/names_data.jl
        # colors = ["blue", "red", "orange", "violet", "torquoise"]
        ycolors = fill("blue", (1, size(y, 1))) # use row vector!!
        for i = 1:size(y, 1)
            if y[i] == "aa"
                continue
            elseif y[i] == "ao"
                ycolors[i] = "red"
            elseif y[i] == "dcl"
                ycolors[i] = "orange"
            elseif y[i] == "iy"
                ycolors[i] = "violet"
            elseif y[i] == "sh"
                ycolors[i] = "turquoise"
            end
        end
        p0 = plot(X', linecolors = ycolors, legend = false, title = "mix")
        p1 = plot(X[y .== "aa", :]', lc = "blue", legend = false, title ="aa")
        p2 = plot(X[y .== "ao", :]', lc = "red", legend = false, title = "ao")
        p3 = plot(X[y .== "dcl", :]', lc = "orange", legend = false, title = "dcl")
        p4 = plot(X[y .== "iy", :]', lc = "violet", legend = false, title = "iy")
        p5 = plot(X[y .== "sh", :]', lc = "turquoise", legend = false, title = "sh")
        return savefig(plot(p0, p1, p2, p3, p4, p5), "all_curves.png")
    end

szcf-weiya · 2020-05-27T11:41:23Z

(b)

szcf-weiya · 2020-05-27T11:53:13Z

(c)

I think the reason for putting all of a speaker's phonemes into either the training set or the test set is to keep the test set independent of the training set. Otherwise, the test accuracy tends to be optimistically high, because the model has already seen information about the speaker through that speaker's other phonemes in the training set.
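To illustrate the idea, here is a minimal sketch of a speaker-wise split. The speaker IDs and the choice of test speakers are hypothetical toy values, not taken from the phoneme data; the point is only that the mask is built per speaker, so no speaker ends up on both sides.

```julia
# Hypothetical toy data: one speaker ID per observation (not the real phoneme file).
speakers = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]

# Pick whole speakers for the test set, then derive the observation mask from them.
test_speakers = Set(["s3", "s4"])   # assumed choice, for illustration only
idx_test  = [s in test_speakers for s in speakers]
idx_train = .!idx_test

# Sanity check: no speaker contributes observations to both sets.
train_set = Set(speakers[idx_train])
test_set  = Set(speakers[idx_test])
@assert isempty(intersect(train_set, test_set))
```

Splitting at random over observations instead would typically place the same speaker in both sets, which is the leakage described above.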

ESL-CN/code/Ex.12.5/main.jl

Lines 131 to 156 in 3a6336b

    
    function evaluate()
        # divide into train and test set
        idx_train = Bool[ifelse(occursin("train", x), 1, 0) for x in data[:,end]]
        Xtrain, ytrain = X[idx_train,:], y[idx_train]
        Xtest, ytest = X[.~idx_train, :], y[.~idx_train]
        accs = zeros(3, 4)
        bestacc = 0.0
        bestB = nothing
        bestθ = nothing
        for (i, J) in enumerate([5, 10, 15])
            B = construct_B(256, J)
            for (j, K) in enumerate([1, 3, 5, 7])
                println("J = $J, K = $K")
                Cs, θs = kmeans(Xtrain, ytrain, B, K)
                ypred = classify(θs, Xtest, B)
                accs[i, j] = sum(ytest .== ypred) / length(ytest)
                println(freqtable(ytest, ypred))
                if bestacc < accs[i, j]
                    bestacc = accs[i, j]
                    bestB = B
                    bestθ = θs
                end
            end
        end
        return accs, bestB, bestθ
    end

The test accuracies are as follows (rows: J = 5, 10, 15; columns: K = 1, 3, 5, 7):

julia> accs
3×4 Array{Float64,2}:
 0.811805  0.835757  0.851155  0.842601
 0.867408  0.875962  0.88195   0.890505
 0.867408  0.886228  0.885372  0.893926
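As a small sketch of how to read the best setting off this matrix, the snippet below re-enters the printed values by hand and maps the position of the largest entry back to the J and K grids used in `evaluate()`. The names `Js` and `Ks` are introduced here for illustration only.

```julia
# Grids as used in evaluate(): rows index J, columns index K.
Js = [5, 10, 15]
Ks = [1, 3, 5, 7]

# Accuracy matrix copied from the REPL output above.
accs = [0.811805 0.835757 0.851155 0.842601;
        0.867408 0.875962 0.88195  0.890505;
        0.867408 0.886228 0.885372 0.893926]

best = argmax(accs)      # CartesianIndex of the largest entry
i, j = Tuple(best)
println("best: J = $(Js[i]), K = $(Ks[j]), accuracy = $(accs[best])")
```

For the values above, this selects J = 15 and K = 7.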

and here is a comparison between two of the contingency tables (rows: true labels `ytest`; columns: predictions `ypred`),

J = 5, K = 1
5×5 Named Array{Int64,2}
Dim1 ╲ Dim2 │  aa   ao  dcl   iy   sh
────────────┼────────────────────────
aa          │ 112   64    0    0    0
ao          │  68  186    5    4    0
dcl         │   0    4  185    6    0
iy          │  11   39    4  244   13
sh          │   0    0    0    2  222

J = 15, K = 7
5×5 Named Array{Int64,2}
Dim1 ╲ Dim2 │  aa   ao  dcl   iy   sh
────────────┼────────────────────────
aa          │ 130   46    0    0    0
ao          │  64  198    1    0    0
dcl         │   0    0  192    3    0
iy          │   1    0    8  301    1
sh          │   0    0    0    0  224
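To make the comparison more concrete, the following sketch recomputes the overall accuracy and the per-class recall from the J = 15, K = 7 table, with the counts re-entered by hand. It assumes, as in `freqtable(ytest, ypred)`, that rows are true labels and columns are predictions.

```julia
using LinearAlgebra  # for diag and tr

labels = ["aa", "ao", "dcl", "iy", "sh"]

# Counts copied from the J = 15, K = 7 table (rows: truth, columns: prediction).
C = [130  46   0   0   0;
      64 198   1   0   0;
       0   0 192   3   0;
       1   0   8 301   1;
       0   0   0   0 224]

accuracy = tr(C) / sum(C)                    # correct predictions / all test cases
recall = diag(C) ./ vec(sum(C, dims = 2))    # per-class fraction of true cases recovered

println("accuracy = ", round(accuracy, digits = 6))
for (l, r) in zip(labels, recall)
    println(rpad(l, 4), round(r, digits = 3))
end
```

The accuracy agrees with the last entry of `accs`, and the recall makes it visible that most remaining errors come from confusing `aa` and `ao` with each other (recall of roughly 0.74 and 0.75), while `sh` is classified perfectly.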

I also plotted the (smooth) prototypes, which can be treated as features extracted from the original data.

szcf-weiya added the exercise label Aug 6, 2018

szcf-weiya added this to the Solutions 12 milestone Aug 6, 2018

szcf-weiya closed this as completed in 3a6336b May 27, 2020

szcf-weiya added the solved label May 27, 2020
