Just playing around with strings... It started as a collection of string algorithms implemented in order to learn a bit more about them.
Making it a package so people can import if they find useful.
An implementation of Levenshtein
distance algorithm with a few algorithms and swift optimization.
Step by step:
-
An optimization by trimming equal prefixes and suffixes from source and destination and since it doesn't affect the result and we can reduce the cost. Example
"abcd" -> "auid"
=="bc" -> "ui"
. -
Early exit for empty cases.
-
Allocate two rows (current and previous) which at the initial stage of the algorithm is first and second row initialized accordingly the first row is
0...destination.count
and second row is1
at initial position and all zeros. Note that the original algorithm pseudo code describes a matrixsource.count
xdestination.count
but since through the algorithm iterations it only operates in terms of the current and the previous rows a size allocation optimization can be done. -
Iterate through "all the rows" calculating the minimum of insertion, deletion and substitution + 1. Pseudo code:
if s[i] = t[j]:
substitutionCost := 0
else:
substitutionCost := 1
d[i, j] := minimum(d[i-1, j] + 1, // deletion
d[i, j-1] + 1, // insertion
d[i-1, j-1] + substitutionCost) // substitution
Given that we don't have the whole matrix, the iteration is done by at the end of operating the current row we swap the previous and current so the previous row becomes the current a we reuse the area which was the previous(that will be discarded) for operate the new current initializing the first position of the current accordingly.
-
After operate in all rows we get the distance.
-
Reference:
- The Hamming distance between two strings of equal length is the count of positions in which the caracter is different in the between the two.
- Implement other algorithms e.g. Damerau–Levenshtein(differs from the Levenshtein by including also transponsitions on the calculation in addition to the insertion, substitution and deletion).
Is very simple to use...
import strings
"friend".levenshteinDistance(to: "fresh") // 3
"adlsajdlsa".jaroDistance(to: "asv") // ~= 0.6222
"friend".hammingDistance(to: "frinnd") // 1
You can use The Swift Package Manager to install by adding the proper description to your Package.swift
import PackageDescription
let package = Package(
name: "YOUR_PROJECT_NAME",
targets: [],
dependencies: [
.package(url: "https://github.com/LucianoPAlmeida/strings.git", from: "0.0.1")
]
)
Then add the target as dependency
.target(
name: "YOUR_TARGET_NAME",
dependencies: [
"strings", ...
]
),
Or you can manually just copy the implementations into your project.
Benchmark was there to make us strive to the most performant implementation given the knowledge of the algorithms and of the swift language.
- .NET Conf 2020 Maximising Algorithm Performance in .NET: Levenshtein Distance by James Turner
- https://en.wikipedia.org/wiki/Levenshtein_distance
- https://www.geeksforgeeks.org/jaro-and-jaro-winkler-similarity/
strings is released under the MIT License.