-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Language Model Interface #210
Conversation
Now we have working MLE or Ngrambase model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small changes in code-style through-out the code. It will be great if you could go through the following suggestions, and fix all instances of those in this PR.
Examples - docstring, indentation, spaces before and after +,-,/,*,= etc. (except for space about '=' in args of function calling), overly specified types.
The code style looks good in general, especially the usage of type hierarchies for Langmodel. Once you are done with the entire PR and the following suggestions. I will do a more thorough review of this.
Once you are finished with the PR, try to incorporate above suggestions. Let me know once you are done. |
Thanks, @Ayushk4 |
Hi, Tejas. Thanks for incorporating the changes. |
Travis seems to fail on 1.3: ERROR: Ambiguous dependency on |
Co-authored-by: Ayush Kaushal <[email protected]>
updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall.
@aviks Can you have a look and suggestion changes? Maybe then we can have this merged? |
@aviks |
Taking a look now |
Implementation of Base Ngram Model. | ||
|
||
""" | ||
function MLE(word, unk_cutoff=1, unk_label="<unk>") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
word
needs a type tag here. Since other parameters of this function have default value, it can revert to a one argument form, that is then ambigous with the struct definiton. Is word
meant to be a vector? The docstring suggests so, and the function definition should be amended
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes word is Vector here
I will make a small PR with all word tags
In addition to initialization arguments from BaseNgramModel also requires | ||
a number by which to increase the counts, gamma. | ||
""" | ||
function Lidstone(word, gamma = 1.0, unk_cutoff=1, unk_label="<unk>") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same...
gamma::Float64 | ||
end | ||
|
||
function Laplace(word, unk_cutoff=1, unk_label="<unk>") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same ...
language model inspired by NLTK.lm package
Roadmap
vocabulary - Implemented contain vocabulary structure
Counter - contains language model counter
Structure and methods for counting ngrams and will count any ngram sequence you give it
models - contains Interpolated Language model
KneserNeyInterpolated ✔️
Laplace ✔️
Lidstone ✔️
MLE ✔️
WittenBellInterpolated ✔️
preprocessing - contains all function needed for preprocessing
smoothing - Smoothing algorithms for language modeling.
utility
documentation and test
it will also include