Ex. 12.5 #155

Closed
szcf-weiya opened this issue Aug 6, 2018 · 3 comments

Comments

@szcf-weiya
Owner

[image: 12-5]

@szcf-weiya szcf-weiya added this to the Solutions 12 milestone Aug 6, 2018
@szcf-weiya
Owner Author

szcf-weiya commented May 27, 2020

(a)

function plot_all_curves()
    # https://github.com/JuliaGraphics/Colors.jl/blob/master/src/names_data.jl
    # colors = ["blue", "red", "orange", "violet", "turquoise"]
    # one color per curve; class "aa" keeps the default "blue"
    ycolors = fill("blue", (1, size(y, 1))) # use row vector!!
    for i = 1:size(y, 1)
        if y[i] == "aa"
            continue
        elseif y[i] == "ao"
            ycolors[i] = "red"
        elseif y[i] == "dcl"
            ycolors[i] = "orange"
        elseif y[i] == "iy"
            ycolors[i] = "violet"
        elseif y[i] == "sh"
            ycolors[i] = "turquoise"
        end
    end
    # X holds one curve per row, so transpose to plot each curve as a series
    p0 = plot(X', linecolors = ycolors, legend = false, title = "mix")
    p1 = plot(X[y .== "aa", :]', lc = "blue", legend = false, title = "aa")
    p2 = plot(X[y .== "ao", :]', lc = "red", legend = false, title = "ao")
    p3 = plot(X[y .== "dcl", :]', lc = "orange", legend = false, title = "dcl")
    p4 = plot(X[y .== "iy", :]', lc = "violet", legend = false, title = "iy")
    p5 = plot(X[y .== "sh", :]', lc = "turquoise", legend = false, title = "sh")
    return savefig(plot(p0, p1, p2, p3, p4, p5), "all_curves.png")
end

[image: all_curves]
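The `if`/`elseif` chain in part (a) can also be written as a lookup table. This is only a minimal sketch: `class_colors`, `row_colors`, and `y_demo` are hypothetical names, and `y_demo` is a small stand-in vector rather than the real labels. Plots.jl treats a 1×n row of attribute values as one value per series, which is why the result is reshaped with `permutedims`.

```julia
# Hypothetical lookup table from phoneme label to line color.
class_colors = Dict("aa" => "blue", "ao" => "red", "dcl" => "orange",
                    "iy" => "violet", "sh" => "turquoise")

# Build a 1×n row of colors, one entry per curve (series).
row_colors(labels) = permutedims([class_colors[l] for l in labels])

y_demo = ["aa", "sh", "dcl", "ao", "iy", "aa"]  # stand-in labels, not the real data
ycolors_demo = row_colors(y_demo)
println(size(ycolors_demo))
```

The resulting row could be passed as `linecolors = ycolors_demo` in the same way as `ycolors` above.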

@szcf-weiya
Owner Author

(b)

[image]

@szcf-weiya
Owner Author

szcf-weiya commented May 27, 2020

(c)

I think the reason for putting all phonemes of a speaker into either the training set or the test set is to keep the test set independent of the training set. Otherwise, the reported accuracy tends to be optimistically high, because the classifier would already have seen part of each test speaker's phonemes in the training set before being evaluated on the test set.
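A quick way to check this independence is to verify that no speaker appears on both sides of the split. The sketch below is illustrative only: it assumes each record carries a tag of the form `split.dialect.speaker.sentence` (for example `train.dr1.mcpm0.sa1`), and the tags, `speaker_of`, and `is_train` are hypothetical stand-ins rather than the code in `main.jl`.

```julia
# Assumed tag format: "<train|test>.<dialect>.<speaker>.<sentence>" (an assumption).
speaker_of(tag) = split(tag, '.')[3]
is_train(tag) = startswith(tag, "train")

# Made-up tags for illustration, not the real data.
tags = ["train.dr1.mcpm0.sa1", "train.dr1.mcpm0.sa2",
        "test.dr2.fabc0.sa1", "train.dr3.mxyz0.sa1",
        "test.dr2.fabc0.sa2"]

train_speakers = Set(speaker_of(t) for t in tags if is_train(t))
test_speakers = Set(speaker_of(t) for t in tags if !is_train(t))

# Speakers that occur in both sets would leak information into the test set.
leak = intersect(train_speakers, test_speakers)
println(isempty(leak) ? "no speaker overlap" : "overlapping speakers: $leak")
```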

ESL-CN/code/Ex.12.5/main.jl

Lines 131 to 156 in 3a6336b

function evaluate()
    # divide into train and test set
    idx_train = Bool[ifelse(occursin("train", x), 1, 0) for x in data[:,end]]
    Xtrain, ytrain = X[idx_train,:], y[idx_train]
    Xtest, ytest = X[.~idx_train, :], y[.~idx_train]
    accs = zeros(3, 4)
    bestacc = 0.0
    bestB = nothing
    bestθ = nothing
    for (i, J) in enumerate([5, 10, 15])
        B = construct_B(256, J)
        for (j, K) in enumerate([1, 3, 5, 7])
            println("J = $J, K = $K")
            Cs, θs = kmeans(Xtrain, ytrain, B, K)
            ypred = classify(θs, Xtest, B)
            accs[i, j] = sum(ytest .== ypred) / length(ytest)
            println(freqtable(ytest, ypred))
            if bestacc < accs[i, j]
                bestacc = accs[i, j]
                bestB = B
                bestθ = θs
            end
        end
    end
    return accs, bestB, bestθ
end

The test accuracies are as follows (rows: J = 5, 10, 15; columns: K = 1, 3, 5, 7):

julia> accs
3×4 Array{Float64,2}:
 0.811805  0.835757  0.851155  0.842601
 0.867408  0.875962  0.88195   0.890505
 0.867408  0.886228  0.885372  0.893926

A comparison of the contingency tables for the worst and best settings (rows: true class, columns: predicted class):

J = 5, K = 1
5×5 Named Array{Int64,2}
Dim1 ╲ Dim2 │  aa   ao  dcl   iy   sh
────────────┼────────────────────────
aa          │ 112   64    0    0    0
ao          │  68  186    5    4    0
dcl         │   0    4  185    6    0
iy          │  11   39    4  244   13
sh          │   0    0    0    2  222

J = 15, K = 7
5×5 Named Array{Int64,2}
Dim1 ╲ Dim2 │  aa   ao  dcl   iy   sh
────────────┼────────────────────────
aa          │ 130   46    0    0    0
ao          │  64  198    1    0    0
dcl         │   0    0  192    3    0
iy          │   1    0    8  301    1
sh          │   0    0    0    0  224
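The numbers above can be summarized in a few lines. The sketch below copies the reported accuracy matrix and the J = 15, K = 7 table by hand (it does not rerun `evaluate()`), locates the best setting, and computes the per-class recall; the variable names are hypothetical.

```julia
using LinearAlgebra  # diag, tr

# Accuracy matrix copied from the output above (rows: J, columns: K).
accs = [0.811805 0.835757 0.851155 0.842601;
        0.867408 0.875962 0.88195  0.890505;
        0.867408 0.886228 0.885372 0.893926]
Js, Ks = [5, 10, 15], [1, 3, 5, 7]
best, idx = findmax(accs)  # idx is a CartesianIndex (row, column)
println("best accuracy $best at J = $(Js[idx[1]]), K = $(Ks[idx[2]])")

# Contingency table for J = 15, K = 7 (rows: true class, columns: predicted).
C = [130  46   0   0   0;
      64 198   1   0   0;
       0   0 192   3   0;
       1   0   8 301   1;
       0   0   0   0 224]
classes = ["aa", "ao", "dcl", "iy", "sh"]

accuracy = tr(C) / sum(C)                  # correct predictions / all test cases
recall = diag(C) ./ vec(sum(C, dims = 2))  # per true class
for (c, r) in zip(classes, recall)
    println(c, ": ", round(r, digits = 3))
end
```

In both tables most of the remaining errors are confusions between aa and ao, which is why these two classes have the lowest recall.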

I also plotted the (smoothed) prototypes, which can be treated as features extracted from the original data:
[image: all_B_curves]
