rust-autocomplete is a rudimentary word completion library built in Rust. It is inspired by Rodrigo Palacios's ELI5 explanation of an autocompletion AI. I figured it would be a good way to continue learning Rust.
Using SimpleWordPredictor is easy, but first it must be trained. Autocomplete works better when it has a larger corpus of training data available to it.
This repository provides two kinds of training data. The first is the file named training_data.csv, an already processed count of a large amount of input text. The other is the file named big.txt, provided by [Peter Norvig](http://norvig.com), which is a raw collection of several books.
I also recommend watching Peter Norvig's lecture titled The Unreasonable Effectiveness of Data.
If you use the provided training data in training_data.csv, you only have to call SimpleWordPredictor::from_file() with a path to the training data.
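The layout of training_data.csv is not described here, so the sketch below simply assumes one `word,count` pair per line. It illustrates the kind of work a loader such as `SimpleWordPredictor::from_file()` has to do: read already processed counts back into memory instead of re-counting the raw corpus. `load_counts` is a hypothetical helper, not part of this library, and the real file format may differ.

```rust
use std::collections::HashMap;
use std::io::BufRead;

/// Hypothetical loader, assuming each line is `word,count`.
/// Lines that do not match that shape are skipped rather than treated as errors.
fn load_counts<R: BufRead>(reader: R) -> std::io::Result<HashMap<String, u64>> {
    let mut counts = HashMap::new();
    for line in reader.lines() {
        let line = line?;
        // Split on the last comma so the count is always the final field.
        if let Some((word, count)) = line.rsplit_once(',') {
            if let Ok(n) = count.trim().parse::<u64>() {
                // Repeated words are summed instead of overwritten.
                *counts.entry(word.trim().to_string()).or_insert(0) += n;
            }
        }
    }
    Ok(counts)
}
```

With a file on disk this would be called as `load_counts(BufReader::new(File::open("training_data.csv")?))`; taking any `BufRead` also lets it read from an in-memory buffer.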
Training on your own corpus requires a bit more heavy lifting. You will need to open your corpus and make sure it contains only the characters you want in your training data. I settled on the characters [a-z] and spaces. You then feed this data into SimpleWordTrainer using its train_str() method. Before you can predict, SimpleWordTrainer must be converted to SimpleWordPredictor by calling its finalize() method, which changes its internal representation of the training data.
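The library's actual internal representations are not described here, but the reason for a separate trainer and predictor can be illustrated with a small sketch. The hypothetical `ToyTrainer` below counts words in a `HashMap`, which is cheap to update while training; its `finalize()` consumes it and produces a `ToyPredictor` holding a `Vec` sorted by word, which suits prefix lookups. Both types are illustrations under that assumption, not the real `SimpleWordTrainer` or `SimpleWordPredictor`.

```rust
use std::collections::HashMap;

/// Hypothetical trainer: a hash map is cheap to update one word at a time.
struct ToyTrainer {
    counts: HashMap<String, u64>,
}

/// Hypothetical predictor: (word, count) pairs sorted alphabetically by word.
struct ToyPredictor {
    entries: Vec<(String, u64)>,
}

impl ToyTrainer {
    fn new() -> Self {
        ToyTrainer { counts: HashMap::new() }
    }

    /// Count every whitespace-separated word in an already cleaned string.
    fn train_str(&mut self, text: &str) {
        for word in text.split_whitespace() {
            *self.counts.entry(word.to_string()).or_insert(0) += 1;
        }
    }

    /// Consume the trainer and switch to a sorted, read-only representation.
    fn finalize(self) -> ToyPredictor {
        let mut entries: Vec<(String, u64)> = self.counts.into_iter().collect();
        entries.sort_by(|a, b| a.0.cmp(&b.0));
        ToyPredictor { entries }
    }
}
```

Because `finalize()` takes `self` by value, the trainer cannot be used after conversion, which mirrors the one-way trainer-to-predictor step described above.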
This is how I trained autocomplete to create training_data.csv:
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;
// SimpleWordTrainer must also be in scope; its `use` path depends on how
// this crate is added to your project.

// Keep spaces and [a-z], lowercase [A-Z], and drop every other byte.
fn clean_line(line: &str) -> String {
    line.bytes()
        .filter_map(|byte| match byte {
            b' ' | b'a'..=b'z' => Some(byte as char),
            b'A'..=b'Z' => Some(byte.to_ascii_lowercase() as char),
            _ => None,
        })
        .collect()
}

// Feed the corpus to the trainer one cleaned line at a time.
fn train_model(model: &mut SimpleWordTrainer, path: &Path) -> io::Result<()> {
    let file = BufReader::new(File::open(path)?);
    for line in file.lines() {
        let cleaned_line = clean_line(&line?);
        model.train_str(&cleaned_line);
    }
    Ok(())
}

fn main() {
    let mut model = SimpleWordTrainer::new();
    let file_path = Path::new("big.txt");
    println!("Training.");
    train_model(&mut model, file_path).expect("Failed to read big.txt.");
    println!("Finalizing.");
    let predictor = model.finalize();
    // Save predictor here or use it to run predictions
}
Call SimpleWordPredictor::predict() with a &str to get back a Vec<PredictionEntry>. PredictionEntry has public fields score and word.
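use std::io::{self, Write};

loop {
    print!("Input: ");
    // print! does not flush, so flush before waiting for input.
    io::stdout().flush().expect("Failed to flush stdout.");
    let mut input = String::new();
    // read_line returns Ok(0) at end of input; stop instead of looping forever.
    if io::stdin().read_line(&mut input).expect("Failed to read line.") == 0 {
        break;
    }
    let output = predictor.predict(input.trim());
    println!("Score\tWord");
    for entry in output {
        println!("{}\t{}", entry.score, entry.word);
    }
}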
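How the library computes `score` is not documented here. As a rough illustration only, the sketch below assumes a predictor that stores `(word, count)` pairs sorted by word and uses the raw count as the score. Words sharing a prefix form one contiguous run in sorted order, so a binary search (`partition_point`) finds where the run starts. The `PredictionEntry` struct mirrors the two public fields named above, but its field types and the scoring rule are assumptions, and `predict` here is a free function, not the library's method.

```rust
/// Mirrors the documented `score` and `word` fields; the types are assumptions.
#[derive(Debug, PartialEq)]
struct PredictionEntry {
    score: u64,
    word: String,
}

/// Hypothetical prefix lookup over `(word, count)` pairs sorted by word.
/// The score is simply the word's count, an assumption for illustration.
fn predict(sorted: &[(String, u64)], prefix: &str) -> Vec<PredictionEntry> {
    // Every word that starts with `prefix` sorts at or after `prefix` itself,
    // so binary search finds the first candidate.
    let start = sorted.partition_point(|(w, _)| w.as_str() < prefix);
    let mut out: Vec<PredictionEntry> = sorted[start..]
        .iter()
        .take_while(|(w, _)| w.starts_with(prefix))
        .map(|(w, c)| PredictionEntry { score: *c, word: w.clone() })
        .collect();
    // Highest score first; ties broken alphabetically.
    out.sort_by(|a, b| b.score.cmp(&a.score).then_with(|| a.word.cmp(&b.word)));
    out
}
```

For example, with the pairs `car: 3`, `cat: 5`, `cater: 1`, the prefix `"ca"` yields `cat`, `car`, `cater` in that order.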