Available as a nuget package SoftWx.Match
This project started as a place to put code related to a series of blog posts I made on optimizing the Levenshtein and Damerau-Levenshtein algorithms:
Optimizing Levenshtein Algorithm in C#
Optimizing Damerau-Levenshtein in C#
Beginning with version 2, I'll be fleshing out the library with additional functionality. Most of this comes out of work I've done over the past several years working on high confidence matching of person data. A primary goal of the library is to provide methods that are optimized for speed and memory efficiency. To the best of my knowledge, the Levenshtein and Damerau-Levenshtein function implementations in this libary are among the faster and more memory efficient ones available, and for non-trivial strings, possibly the fastest.
In many cases, functionality is exposed as methods in instantiable classes, and also as static class methods.
Immediate future plans are first to add similarity functions for Levenshtein and Damerau-Levenshtein (done). After that, I'll be adding some functionality specifically for person name matching, including a novel phonetic algorithm.
This includes classes that implement an IDistance interface by exposing methods for computing the distance between two strings.
This class provides methods that compute the Levenshtein edit distance.
// instantiated class version
var ed = new Levenshtein();
var dist = ed.Distance("flintstone", "hanson");
// you can also specify the max distance you care about, which can result in significant speed improvement
var dist = ed.Distance("flintstone", "hanson", 2);
// static class version
var dist = Distance.Levenshtein("flintstone", "hanson");
dist = Distance.Levenshtein("flintstone", "hanson", 2);
This class provides methods that compute the Damerau-Levenshtein optimal string alignment (OSA) edit distance.
// instantiated class version
var ed = new DamerauOSA();
var dist = ed.Distance("flintstone", "hanson");
// you can also specify the max distance you care about, which can result in significant speed improvement
var dist = ed.Distance("flintstone", "hanson", 2);
// static class version
var dist = Distance.DamerauOSA("flintstone", "hanson");
dist = Distance.DamerauOSA("flintstone", "hanson", 2);
This includes classes that implement an ISimilarity interface by exposing methods for computing a normalized measure of the similarity of two strings. Similarity results are a double value from 0 to 1.0, where 0 is no similarity at all, and 1.0 is equivalent strings.
This class provides methods that compute a normalized similarity value based on the Levenshtein edit distance between two strings. The similarity formula is 1 - (edit distance / length of longer string).
// instantiated class version
var ed = new Levenshtein();
var sim = ed.Similarity("flintstone", "hanson");
// you can also specify the min similarity you care about, which can result in significant speed improvement
var sim = ed.Similarity("flintstone", "hanson", .75);
// static class version
var sim = Similarity.Levenshtein("flintstone", "hanson");
sim = Similarity.Levenshtein("flintstone", "hanson", .75);
This class provides methods that compute a normalized similarity value based on the Damerau-Levenshtein optimal string alignment (OSA) edit distance between two strings. The similarity formula is 1 - (edit distance / length of longer string).
// instantiated class version
var ed = new DamerauOSA();
var sim = ed.Similarity("flintstone", "hanson");
// you can also specify the min similarity you care about, which can result in significant speed improvement
var sim = ed.Similarity("flintstone", "hanson", .75);
// static class version
var sim = Similarity.DamerauOSA("flintstone", "hanson");
sim = Similarity.DamerauOSA("flintstone", "hanson", .75);
v2.0.3 Added DamerauOSA similarity functions, minor speed improvements to Levenshtein distance functions
v2.0.2 Corrected some comments, added Levenshtein similarity functions, added xml docs
v2.0.1 Fixed error in README, Fixed bug in DEBUG conditional code, added detail to nuspec file
v2.0.0 Restructured classes to better accomodate future additions
v1.1.0 Added maxDistance version of Levenshtein, added instantiable class versions, added unit tests, added benchmark project, moved to .Net Standard 1.0 libary project, added nuget package creation
v1.0.0 initial version containing basic Levenshtein and Damerau-Levenshtein functions from my blog posts