Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Levenshtein distance algorithm #1431

Merged
merged 2 commits into from
Sep 4, 2024

Conversation

avazirna
Copy link
Contributor

@avazirna avazirna commented Sep 3, 2024

Product Description

This PR fixes a small issue with the Levenshtein distance calculation, which was causing the fuzzy search to return false positives. For instance, the difference between aply into cape was previously 2 while it should be 3, and because our distance threshold is set at 2, the cape was a valid value for aply.

Another look to the source of the algorithm, I could also confirm that this was initially incorrect:

		// initial cost of skipping prefix in String s1
		newcost[0]=j-1;

and later on updated to the current form:

		// initial cost of skipping prefix in String s1
		newcost[0]=j;

Ticket: https://dimagi.atlassian.net/browse/SAAS-15738

Technical Summary

Safety Assurance

Automated test coverage

There are unit tests around this feature and they seem to have not been affected by this change.

Special deploy instructions

  • This PR can be deployed after merge with no further considerations.

Rollback instructions

  • This PR can be reverted after deploy with no further considerations.

Review

  • The set of people pinged as reviewers is appropriate for the level of risk of the change.

Duplicate PR

Automatically duplicate this PR as defined in contributing.md.

@avazirna avazirna self-assigned this Sep 3, 2024
Copy link
Contributor

@shubham1g5 shubham1g5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me though can we write a unit test for the particular edge case we are trying to fix here.

@avazirna
Copy link
Contributor Author

avazirna commented Sep 4, 2024

Duplicate PR

@avazirna
Copy link
Contributor Author

avazirna commented Sep 4, 2024

duplicate this PR

@avazirna avazirna merged commit db0ffdd into master Sep 4, 2024
2 checks passed
@avazirna avazirna deleted the fix-levenshtein-distance-algorithm branch September 4, 2024 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants