Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Levenshtein distance functions to string lib #3648

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Beherith
Copy link
Collaborator

--- Calculates edit distance of two strings, O(n*m) time, O(n) memory
	---@param a string
	---@param b string
	---@return number 

	function string.Levenshtein(a,b)
	--- Finds string that is closest to a in a table
	---@param a string
	---@param t table, primarily values are strings, keys can be strings too
	---@return string, number bestresult, bestscore
	function string.FindClosest(a,t)

Copy link
Collaborator

@salinecitrine salinecitrine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for this? Probably worth mentioning in the PR description.

@@ -182,3 +182,66 @@ if not string.formatSI then
return str .. siPrefix
end
end

if not string.Levenshtein then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to match other functions, and to make it a little more clear what it does

Suggested change
if not string.Levenshtein then
if not string.levenshteinDistance then

(and changes on related lines for this and FindClosest)

function string.Levenshtein(a,b)
local lena = string.len(a)
local lenb = string.len(b)
local ssub = string.sub
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already defined above

Suggested change
local ssub = string.sub

Comment on lines +244 to +246

-- print(string.Levenshtein(string.rep("asdfasdfasdfasdf", 1000), string.rep("asdfasdfasdfasda", 1000))) -- 5 seconds
-- print(string.FindClosest("apple", {"pear", "popple","bear","ple"}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- print(string.Levenshtein(string.rep("asdfasdfasdfasdf", 1000), string.rep("asdfasdfasdfasda", 1000))) -- 5 seconds
-- print(string.FindClosest("apple", {"pear", "popple","bear","ple"}))

Levenshtein0[c+1] = c
end
for r = 1, lena do
--print(table.unpack(Levenshtein0))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--print(table.unpack(Levenshtein0))

end
Levenshtein0, Levenshtein1 = Levenshtein1, Levenshtein0 -- swap rows
end
return Levenshtein1[lenb]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function gives incorrect output for at least these test cases:

{
	name = "returns the length of the non-empty string for one empty string",
	args = { "", "apple" },
	expected = 5,
},
{
	name = "returns the length of the non-empty string for one empty string (reverse case)",
	args = { "apple", "" },
	expected = 5,
},
{
	name = "returns 1 for strings differing by one character",
	args = { "apple", "appla" },
	expected = 1,
},
{
	name = "returns the length of the longer string for completely different strings",
	args = { "cat", "dogs" },
	expected = 4,
},

This change fixes those tests:

Suggested change
return Levenshtein1[lenb]
return Levenshtein0[lenb + 1]

@WatchTheFort
Copy link
Member

@Beherith What is the status of this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants