Assignments for the Coursera Natural Language Processing course by Michael Collins, Columbia University
----
H1: Hidden Markov Models
----
Instructions: refer to h1/h1.pdf
hmm.py
Hmm_ex, extending Hmm, calculates and stores:
* e(x|y)
* q(y_i|y_i-1, y_i-2)
* count(x)
* rare_word
* all tags
* all words
SimpleTagger does simple tagging as instructed in Part 1
ViterbiTagger does Viterbi tagging as instructed in Part 2 (see the sketch below)
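A minimal sketch of trigram Viterbi decoding, assuming plain dictionaries q[(y_i-2, y_i-1, y_i)] and e[(x, y)] holding the probabilities stored by Hmm_ex; the actual interfaces in hmm.py may differ:

    def viterbi(sentence, tags, q, e):
        """Return the most likely tag sequence for a list of words.

        q[(u, v, s)] -- probability of tag s given the two previous tags (u, v)
        e[(x, s)]    -- probability of word x given tag s
        Rare-word replacement is assumed to have been applied already.
        """
        n = len(sentence)
        # pi[(k, u, v)]: best probability of a tag sequence ending in (u, v) at position k
        pi = {(0, '*', '*'): 1.0}
        bp = {}

        def K(k):
            # Positions before the sentence use the start symbol '*'
            return ['*'] if k <= 0 else tags

        for k in range(1, n + 1):
            x = sentence[k - 1]
            for u in K(k - 1):
                for v in K(k):
                    best, best_w = 0.0, None
                    for w in K(k - 2):
                        p = (pi.get((k - 1, w, u), 0.0)
                             * q.get((w, u, v), 0.0)
                             * e.get((x, v), 0.0))
                        if p > best:
                            best, best_w = p, w
                    pi[(k, u, v)] = best
                    bp[(k, u, v)] = best_w

        # Choose the best final tag pair, including the transition to STOP
        best, u_n, v_n = 0.0, None, None
        for u in K(n - 1):
            for v in K(n):
                p = pi.get((n, u, v), 0.0) * q.get((u, v, 'STOP'), 0.0)
                if p > best:
                    best, u_n, v_n = p, u, v

        # Follow back-pointers to recover the full tag sequence
        y = [None] * (n + 1)
        y[n], y[n - 1] = v_n, u_n
        for k in range(n - 2, 0, -1):
            y[k] = bp[(k + 2, y[k + 1], y[k + 2])]
        return y[1:]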
p1.py
Part 1
p2.py
Part 2
p3.py
Part 3
Falls short of the requirement: the reported F1-score is 35.009, while the goal is 39.519.
util.py
Helper methods, including:
* handling rare words (applying different rules; see the sketch below)
* test data iterator
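A minimal sketch of rare-word replacement in the spirit of util.py; the pseudo-word class names below are illustrative assumptions, not necessarily the strings used in this repository:

    import re

    def classify_rare(word):
        # Map an infrequent word to a coarse pseudo-word class (assumed names)
        if re.search(r'\d', word):
            return '_NUMERIC_'
        if word.isupper():
            return '_ALL_CAPS_'
        if word and word[-1].isupper():
            return '_LAST_CAP_'
        return '_RARE_'

    def replace_rare(words, counts, threshold=5):
        # Replace words seen fewer than `threshold` times with their class
        return [w if counts.get(w, 0) >= threshold else classify_rare(w)
                for w in words]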
----
H2: Probabilistic Context-Free Grammar (PCFG)
----
Instructions: refer to h2/h2.pdf
pcfg.py
PCFG, extending Count, calculates and stores:
* q(X->Y1Y2)
* q(X->w)
CKYTagger implements the CKY algorithm (see the sketch below)
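A minimal sketch of probabilistic CKY decoding, assuming a grammar in Chomsky normal form with rule probabilities in q_binary[(X, Y, Z)] and q_unary[(X, w)]; the actual data structures in pcfg.py may differ:

    def cky(sentence, nonterminals, q_binary, q_unary, root='S'):
        n = len(sentence)
        pi = {}   # pi[(i, j, X)]: best probability of X spanning words i..j
        bp = {}   # back-pointers for recovering the best parse

        # Initialization: single-word spans via unary rules X -> w
        for i in range(n):
            for X in nonterminals:
                pi[(i, i, X)] = q_unary.get((X, sentence[i]), 0.0)

        # Fill longer spans bottom-up
        for length in range(1, n):
            for i in range(n - length):
                j = i + length
                for X in nonterminals:
                    best, best_bp = 0.0, None
                    for (Xr, Y, Z), prob in q_binary.items():
                        if Xr != X:
                            continue
                        for s in range(i, j):
                            p = prob * pi.get((i, s, Y), 0.0) * pi.get((s + 1, j, Z), 0.0)
                            if p > best:
                                best, best_bp = p, (Y, Z, s)
                    pi[(i, j, X)] = best
                    bp[(i, j, X)] = best_bp

        def build(i, j, X):
            # Reconstruct the best parse as nested lists
            if i == j:
                return [X, sentence[i]]
            Y, Z, s = bp[(i, j, X)]
            return [X, build(i, s, Y), build(s + 1, j, Z)]

        if pi.get((0, n - 1, root), 0.0) > 0.0:
            return build(0, n - 1, root)
        return None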
p1.py
Part 1
p2.py
Part 2
The expected development total F1-scores are 0.79 for Part 2 and 0.83 for Part 3.
p3.py
Part 3
----
H3: IBM Model 1 & 2
----
Instructions: refer to h3/h3.pdf
ibmmodel.py
Count calculates and stores:
* t(f|e)
IBMModel1 implements the EM and alignment algorithms (see the sketch below)
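A minimal sketch of IBM Model 1 EM training and greedy alignment, assuming the corpus is a list of (english_words, foreign_words) sentence pairs with a NULL token for unaligned words; the interfaces in ibmmodel.py may differ:

    from collections import defaultdict

    def train_model1(corpus, iterations=5):
        # Initialize t(f|e) uniformly over foreign words co-occurring with e;
        # None stands for the NULL English word
        candidates = defaultdict(set)
        for e_sent, f_sent in corpus:
            for e in [None] + e_sent:
                candidates[e].update(f_sent)
        t = {(f, e): 1.0 / len(fs) for e, fs in candidates.items() for f in fs}

        for _ in range(iterations):
            count = defaultdict(float)   # expected counts c(f, e)
            total = defaultdict(float)   # expected counts c(e)
            # E-step: collect expected alignment counts under the current t
            for e_sent, f_sent in corpus:
                e_sent = [None] + e_sent
                for f in f_sent:
                    norm = sum(t[(f, e)] for e in e_sent)
                    for e in e_sent:
                        delta = t[(f, e)] / norm
                        count[(f, e)] += delta
                        total[e] += delta
            # M-step: re-estimate t(f|e)
            t = {(f, e): count[(f, e)] / total[e] for (f, e) in count}
        return t

    def align(e_sent, f_sent, t):
        # Align each foreign word to its most probable English word (index 0 = NULL)
        e_sent = [None] + e_sent
        return [max(range(len(e_sent)), key=lambda i: t.get((f, e_sent[i]), 0.0))
                for f in f_sent]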
p1.py
Part 1
The expected development F-scores are 0.420 and 0.449, and a basic intersection alignment should give 0.485 for the last part (see the sketch below).
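A minimal sketch of the intersection heuristic, assuming each directional model produces a set of (position, position) alignment links; the exact conventions used in this repository may differ:

    def intersect_alignments(ef_links, fe_links):
        """Keep only links proposed by both directional models.

        ef_links -- set of (english_pos, foreign_pos) pairs from the e->f model
        fe_links -- set of (foreign_pos, english_pos) pairs from the f->e model
        """
        # Flip the f->e links into (english_pos, foreign_pos) orientation, then intersect
        return ef_links & {(i, j) for (j, i) in fe_links}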
----