Assignments for the Coursera Natural Language Processing course by Michael Collins, Columbia University
----
H1: Hidden Markov Models
----
Instructions: refer to h1/h1.pdf
hmm.py
Hmm_ex, extending Hmm, calculates and stores:
* e(x|y)
* q(y_i|y_i-1, y_i-2)
* count(x)
* rare_word
* all tags
* all words
SimpleTagger does simple tagging as instructed in Part 1
ViterbiTagger does Viterbi tagging as instructed in Part 2 (see the sketch below)
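A minimal sketch of trigram Viterbi decoding, assuming plain dictionaries q[(y_i-2, y_i-1, y_i)] and e[(x, y)] holding the probabilities stored by Hmm_ex; the actual interfaces in hmm.py may differ:

    def viterbi(sentence, tags, q, e):
        """Return the most likely tag sequence for a list of words.

        q[(u, v, s)] -- probability of tag s given the two previous tags (u, v)
        e[(x, s)]    -- probability of word x given tag s
        Rare-word replacement is assumed to have been applied already.
        """
        n = len(sentence)
        # pi[(k, u, v)]: best probability of a tag sequence ending in (u, v) at position k
        pi = {(0, '*', '*'): 1.0}
        bp = {}

        def K(k):
            # Positions before the sentence use the start symbol '*'
            return ['*'] if k <= 0 else tags

        for k in range(1, n + 1):
            x = sentence[k - 1]
            for u in K(k - 1):
                for v in K(k):
                    best, best_w = 0.0, None
                    for w in K(k - 2):
                        p = (pi.get((k - 1, w, u), 0.0)
                             * q.get((w, u, v), 0.0)
                             * e.get((x, v), 0.0))
                        if p > best:
                            best, best_w = p, w
                    pi[(k, u, v)] = best
                    bp[(k, u, v)] = best_w

        # Choose the best final tag pair, including the transition to STOP
        best, u_n, v_n = 0.0, None, None
        for u in K(n - 1):
            for v in K(n):
                p = pi.get((n, u, v), 0.0) * q.get((u, v, 'STOP'), 0.0)
                if p > best:
                    best, u_n, v_n = p, u, v

        # Follow back-pointers to recover the full tag sequence
        y = [None] * (n + 1)
        y[n], y[n - 1] = v_n, u_n
        for k in range(n - 2, 0, -1):
            y[k] = bp[(k + 2, y[k + 1], y[k + 2])]
        return y[1:]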
p1.py
Part 1
p2.py
Part 2
p3.py
Part 3
Falls short of the requirement: the reported F1-score is 35.009, while the goal is 39.519.
util.py
Helper methods, including:
* handling rare words (applying different rules; see the sketch below)
* test data iterator
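A minimal sketch of rare-word replacement in the spirit of util.py; the pseudo-word class names below are illustrative assumptions, not necessarily the strings used in this repository:

    import re

    def classify_rare(word):
        # Map an infrequent word to a coarse pseudo-word class (assumed names)
        if re.search(r'\d', word):
            return '_NUMERIC_'
        if word.isupper():
            return '_ALL_CAPS_'
        if word and word[-1].isupper():
            return '_LAST_CAP_'
        return '_RARE_'

    def replace_rare(words, counts, threshold=5):
        # Replace words seen fewer than `threshold` times with their class
        return [w if counts.get(w, 0) >= threshold else classify_rare(w)
                for w in words]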
----
H2: Probabilistic Context-Free Grammar (PCFG)
----
Instructions: refer to h2/h2.pdf
pcfg.py
PCFG, extending Count, calculates and stores:
* q(X->Y1Y2)
* q(X->w)
CKYTagger implements the CKY algorithm (see the sketch below)
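A minimal sketch of probabilistic CKY decoding, assuming a grammar in Chomsky normal form with rule probabilities in q_binary[(X, Y, Z)] and q_unary[(X, w)]; the actual data structures in pcfg.py may differ:

    def cky(sentence, nonterminals, q_binary, q_unary, root='S'):
        n = len(sentence)
        pi = {}   # pi[(i, j, X)]: best probability of X spanning words i..j
        bp = {}   # back-pointers for recovering the best parse

        # Initialization: single-word spans via unary rules X -> w
        for i in range(n):
            for X in nonterminals:
                pi[(i, i, X)] = q_unary.get((X, sentence[i]), 0.0)

        # Fill longer spans bottom-up
        for length in range(1, n):
            for i in range(n - length):
                j = i + length
                for X in nonterminals:
                    best, best_bp = 0.0, None
                    for (Xr, Y, Z), prob in q_binary.items():
                        if Xr != X:
                            continue
                        for s in range(i, j):
                            p = prob * pi.get((i, s, Y), 0.0) * pi.get((s + 1, j, Z), 0.0)
                            if p > best:
                                best, best_bp = p, (Y, Z, s)
                    pi[(i, j, X)] = best
                    bp[(i, j, X)] = best_bp

        def build(i, j, X):
            # Reconstruct the best parse as nested lists
            if i == j:
                return [X, sentence[i]]
            Y, Z, s = bp[(i, j, X)]
            return [X, build(i, s, Y), build(s + 1, j, Z)]

        if pi.get((0, n - 1, root), 0.0) > 0.0:
            return build(0, n - 1, root)
        return None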
p1.py
Part 1
p2.py
Part 2
The expected development total F1-scores are 0.79 for Part 2 and 0.83 for Part 3.
p3.py
Part 3
----
H3: IBM Model 1 & 2
----
Instructions: refer to h3/h3.pdf
ibmmodel.py
Count calculates and stores:
* t(f|e)
IBMModel1 implements the EM and alignment algorithms (see the sketch below)
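A minimal sketch of IBM Model 1 EM training and greedy alignment, assuming the corpus is a list of (english_words, foreign_words) sentence pairs with a NULL token for unaligned words; the interfaces in ibmmodel.py may differ:

    from collections import defaultdict

    def train_model1(corpus, iterations=5):
        # Initialize t(f|e) uniformly over foreign words co-occurring with e;
        # None stands for the NULL English word
        candidates = defaultdict(set)
        for e_sent, f_sent in corpus:
            for e in [None] + e_sent:
                candidates[e].update(f_sent)
        t = {(f, e): 1.0 / len(fs) for e, fs in candidates.items() for f in fs}

        for _ in range(iterations):
            count = defaultdict(float)   # expected counts c(f, e)
            total = defaultdict(float)   # expected counts c(e)
            # E-step: collect expected alignment counts under the current t
            for e_sent, f_sent in corpus:
                e_sent = [None] + e_sent
                for f in f_sent:
                    norm = sum(t[(f, e)] for e in e_sent)
                    for e in e_sent:
                        delta = t[(f, e)] / norm
                        count[(f, e)] += delta
                        total[e] += delta
            # M-step: re-estimate t(f|e)
            t = {(f, e): count[(f, e)] / total[e] for (f, e) in count}
        return t

    def align(e_sent, f_sent, t):
        # Align each foreign word to its most probable English word (index 0 = NULL)
        e_sent = [None] + e_sent
        return [max(range(len(e_sent)), key=lambda i: t.get((f, e_sent[i]), 0.0))
                for f in f_sent]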
p1.py
Part 1
The expected development F-scores are 0.420 and 0.449, and a basic intersection alignment should give 0.485 for the last part (see the sketch below).
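A minimal sketch of the intersection heuristic, assuming each directional model produces a set of (position, position) alignment links; the exact conventions used in this repository may differ:

    def intersect_alignments(ef_links, fe_links):
        """Keep only links proposed by both directional models.

        ef_links -- set of (english_pos, foreign_pos) pairs from the e->f model
        fe_links -- set of (foreign_pos, english_pos) pairs from the f->e model
        """
        # Flip the f->e links into (english_pos, foreign_pos) orientation, then intersect
        return ef_links & {(i, j) for (j, i) in fe_links}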
----