
Tagger program that predicts the Part-of-Speech tags for untagged text by inference using a Hidden Markov Model


YibinLong/Hidden-Markov-Model-Part-of-Speech-Tagger


Hidden Markov Model Part of Speech Tagger

This part-of-speech (POS) tagger predicts the part-of-speech tags for untagged text by inference through a Hidden Markov Model.

The part of speech of a word or punctuation mark tells us its function in a sentence. For example, a noun names a person, place, or thing, and a conjunction such as "and" or "but" joins words, phrases, or clauses together.

The tagger reads training text files in which every word is already labeled with its correct part of speech. From these it builds the three probability tables of the Hidden Markov Model: initial (how likely each tag is to start a sequence), transition (how likely one tag is to follow another), and observation (how likely a tag is to produce a given word). It then uses the Viterbi algorithm to find the most likely sequence of tags for the words in an untagged text file. Given sufficiently large training and test sets (40,000+ characters), the tagger can achieve an accuracy of over 80%.
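The training and tagging steps above can be sketched in a few lines of Python. This is a minimal sketch, not the code in `tagger.py`: it assumes the training text has already been parsed into sentences of `(word, tag)` pairs (the README does not describe the file format), and the function names `train_hmm` and `viterbi` and the `1e-6` floor probability used for unseen words and transitions are hypothetical choices for illustration. Probabilities are combined in log space to avoid underflow on long inputs.

```python
import math
from collections import Counter, defaultdict


def normalise(counts):
    """Turn a Counter of raw counts into a dict of probabilities."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def train_hmm(tagged_sentences):
    """Estimate initial, transition and observation tables by counting.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs
    (an assumed input format, not necessarily the repository's).
    """
    init_counts = Counter()                # tag -> times it starts a sentence
    trans_counts = defaultdict(Counter)    # prev tag -> Counter of next tags
    emit_counts = defaultdict(Counter)     # tag -> Counter of words it emits
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                init_counts[tag] += 1
            else:
                trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
    init = normalise(init_counts)
    trans = {p: normalise(row) for p, row in trans_counts.items()}
    emit = {t: normalise(row) for t, row in emit_counts.items()}
    return init, trans, emit


def viterbi(words, tags, init, trans, emit, floor=1e-6):
    """Return the most likely tag sequence for `words`.

    `floor` is a hypothetical stand-in probability for unseen events.
    """
    if not words:
        return []

    def logp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t]: best log probability of any tag path ending in tag t at word i
    best = [{t: logp(init, t) + logp(emit.get(t, {}), words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p] + logp(trans.get(p, {}), t)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + logp(emit.get(t, {}), words[i])
            back[i][t] = prev_tag
    # Backtrack from the best final tag to recover the full path.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

For example, after training on `[[("the", "DT"), ("dog", "NN"), ("runs", "VBZ")], [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]`, calling `viterbi(["the", "cat", "runs"], sorted(emit), init, trans, emit)` returns `["DT", "NN", "VBZ"]`. A real tagger would also need a more careful strategy for unknown words than a fixed floor.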

Usage

  1. Clone the repository and navigate to the directory
  2. Use the following command to run the program, specifying the necessary input files and the output file:

python tagger.py --trainingfiles <training files> --testfile <test file> --outputfile <output file>

  • Replace <training files> with one or more training text files separated by spaces.
  • Replace <test file> with the untagged text file you want the program to tag.
  • Replace <output file> with the name of the output text file where the program will write the predicted POS tags.
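The three flags above map naturally onto Python's standard `argparse` module. The sketch below is hypothetical: it shows one way a script could accept this command line, with `nargs="+"` collecting one or more space-separated training files, and it is not necessarily how `tagger.py` parses its arguments.

```python
import argparse


def parse_args(argv=None):
    """Parse the tagger's command-line flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="HMM part-of-speech tagger")
    # nargs="+" gathers one or more space-separated training file names
    parser.add_argument("--trainingfiles", nargs="+", required=True,
                        help="one or more POS-tagged training files")
    parser.add_argument("--testfile", required=True,
                        help="untagged text file to tag")
    parser.add_argument("--outputfile", required=True,
                        help="file to write the predicted POS tags to")
    return parser.parse_args(argv)
```

With this parser, `parse_args(["--trainingfiles", "train1.txt", "train2.txt", "--testfile", "test3.txt", "--outputfile", "output.txt"])` yields `trainingfiles == ["train1.txt", "train2.txt"]`; omitting any of the three flags makes `argparse` print a usage error and exit.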

Example

Let's say we have two training files named train1.txt and train2.txt (both labeled with POS tags), and we want to test our program on test3.txt (not labeled with POS tags), generating an output file named output.txt. We would run the following command:

python tagger.py --trainingfiles train1.txt train2.txt --testfile test3.txt --outputfile output.txt
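The README does not say how the 80% accuracy figure was measured. Assuming a correctly tagged copy of test3.txt is available and both it and output.txt have already been parsed into aligned lists of tags (parsing is not shown), token-level accuracy could be computed with a small helper such as the hypothetical `accuracy` function below.

```python
def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold tag sequences differ in length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

For example, `accuracy(["DT", "NN", "VBZ"], ["DT", "NN", "NN"])` returns 2/3, since two of the three tags match.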

