This repository is designed to be included as a submodule in other repositories.
A Python script that extracts the types in a text file and gives them integer ids.
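A minimal sketch of the idea, not the script itself. It assumes that a "type" is a distinct whitespace-separated token and that ids are assigned from 0 in order of first appearance; the function name `build_vocab` is hypothetical.

```python
def build_vocab(lines):
    """Map each distinct token (type) to an integer id.

    Assumption: tokens are whitespace-separated and ids start at 0,
    assigned in order of first appearance.
    """
    vocab = {}
    for line in lines:
        for token in line.split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


vocab = build_vocab(["the cat sat", "the dog sat"])
for token, idx in vocab.items():
    print(idx, token)
```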
A Python script that replaces each type in the input file with a unique integer id in the target file. A second output file contains the id:type mappings.
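The encoding step could look roughly like the sketch below. It works on in-memory lists rather than files, and the function name `encode_corpus`, the whitespace tokenization, and the id numbering from 0 are all assumptions made for illustration.

```python
def encode_corpus(lines):
    """Replace every token with its integer id.

    Returns the encoded lines and an id -> type mapping.
    Assumption: whitespace tokenization, ids assigned from 0
    in order of first appearance.
    """
    type_to_id = {}
    encoded = []
    for line in lines:
        ids = []
        for token in line.split():
            if token not in type_to_id:
                type_to_id[token] = len(type_to_id)
            ids.append(str(type_to_id[token]))
        encoded.append(" ".join(ids))
    id_to_type = {idx: tok for tok, idx in type_to_id.items()}
    return encoded, id_to_type


encoded, id_to_type = encode_corpus(["a b a", "b c"])
print(encoded)
print(id_to_type)
```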
The inverse of encode-corpus.py: maps integer ids back to their types.
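As an illustration only, decoding can be sketched as a lookup in the id:type mapping. The function name `decode_corpus` and the in-memory dictionary are hypothetical; the real script presumably reads the mapping from the file written during encoding.

```python
def decode_corpus(encoded_lines, id_to_type):
    """Replace every integer id with the type it stands for.

    Assumption: ids are whitespace-separated decimal integers and
    id_to_type is a dict from int to str.
    """
    decoded = []
    for line in encoded_lines:
        tokens = [id_to_type[int(idx)] for idx in line.split()]
        decoded.append(" ".join(tokens))
    return decoded


print(decode_corpus(["0 1 0", "1 2"], {0: "a", 1: "b", 2: "c"}))
```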
A Python script that filters parallel sentences by number of tokens.
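One plausible reading of this filter is sketched below: a sentence pair is kept only if both sides fall within a token-count range. The function name `filter_parallel`, the parameters `min_len` and `max_len`, and their default values are assumptions, not the script's actual options.

```python
def filter_parallel(src_lines, tgt_lines, min_len=1, max_len=80):
    """Keep sentence pairs whose token counts are within bounds.

    Assumption: whitespace tokenization; a pair is dropped if either
    side has fewer than min_len or more than max_len tokens.
    """
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src, tgt))
    return kept


pairs = filter_parallel(
    ["a b c", "", "a b c d e"],
    ["x y", "x", "x y z"],
    min_len=1,
    max_len=4,
)
print(pairs)
```

Filtering both sides together keeps the two halves of the corpus aligned line by line.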
A Python script that splits a parallel corpus into train/dev/test sets.
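A rough sketch of such a split follows. It assumes the corpus is a list of (source, target) pairs, that pairs are shuffled with a fixed seed, and that dev and test sizes are given as absolute counts; `split_corpus`, `dev_size`, `test_size`, and `seed` are hypothetical names.

```python
import random


def split_corpus(pairs, dev_size, test_size, seed=0):
    """Shuffle sentence pairs and split them into train/dev/test.

    Assumption: dev and test are fixed-size samples and everything
    left over becomes the training set.
    """
    pairs = list(pairs)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(pairs)
    dev = pairs[:dev_size]
    test = pairs[dev_size:dev_size + test_size]
    train = pairs[dev_size + test_size:]
    return train, dev, test


corpus = [(f"src {i}", f"tgt {i}") for i in range(10)]
train, dev, test = split_corpus(corpus, dev_size=2, test_size=3)
print(len(train), len(dev), len(test))
```

Shuffling pairs rather than each side separately keeps source and target sentences aligned across all three sets.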
American vs. British English vocabulary collected from http://www.tysto.com/uk-us-spelling-list.html