Skip to content

Latest commit

 

History

History

fts_fuzzy_match

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 

FTS Fuzzy Match

The original fts_fuzzy_match algorithms by Forrest Smith, organized by version number, along with third parties ports to other languages.


Table of Contents


Directory Contents

About FTS Fuzzy Match

In March 2016, Forrest Smith published the Reverse Engineering Sublime Text’s Fuzzy Match article where he provided a detailed description on how he managed to create an algorithm that emulated Sublime Text fuzzy search functionality.

He generously made his fts_fuzzy_match algorithm available to the general public by publishing its code on GitHub, both in C++ and JavaScript. Since the publication of the article and fts_fuzzy_match, there have been numerous users feedback that lead to various updates and improvements to the original algorithm, which is currently circulating in two distinct versions: v0.1.0, and v0.2.0.

The importance of Forrest algorithm lies in the fact that it was written as a proof of concept for educational purposes. This means that both his article and code are easily accessible to anyone, the code is well commented and neither require academical understanding of the subject. As a result, there are many areas in which the algorithm might (and can) be improved, but it's a precious starting point for anyone wishing to approach the subject.

Fuzzy matching, and its application to fuzzy searching, is a wide and complex subject area where many different approaches and algorithms are available — but usually presented as academic papers that are inaccessible to the wider audience. To Forrest Smith goes merit of having opened up the subject to public discussion, for before his 2016 article there weren't many non-specialistic resources on this topic — at least, none that I managed to find via extensive googling. Definitely, both his article and fts_fuzzy_match code have been game changers in this respect, and had ripple effect in the open source community, especially on GitHub.

Since there have been ports of both versions of the algorithm to other languages, I'll now summarize its history and the main differences between v0.1.0 and v0.2.0, to provide a better picture of its evolution and diffusion.

The Original Algorithm from 2016

fts_fuzzy_match is an algorithm for "fuzzy string matching inspired by Sublime Text" written by Forrest Smith and published in 2016 as part of his lib_fts project of "single-file public domain libraries" (where "FTS" stands for Forrest Thomas Smith):

The latest version of the original algorithms v0.1.0 implementations, by Forrest, can be found here:

Although there have been various updates and tweaks to v0.1.0 algorithms, the bump to v0.2.0 introduced a significant quality improvement to its functionality — as well as some complexity to its code, and a slight performance overhead. Those wishing to study the algorithms, should start by looking at the code of v0.1.0 first, and then move on to study v0.2.0, to better understand its core features via the former, and appreciate the improvements introduced by the latter.

The 2017 Update

In February 2017 — i.e. a year after the original article — Sublime Text author Jon Skinner posted on reddit a comment on Forrest article, confirming the soundness of the overall approach and also providing insights on how to improve the algorithm:

jskinner: Sublime Text author here. Nice writeup! The actual algorithm that Sublime Text uses is similar in principle, but the implementation is rather different in the name of speed.

There are a couple of things you may want to tweak for better quality matching: you likely want to search more exhaustively for a better match, e.g., searching for "lll" in the UE4 filename list matches "SVisualLoggerLogsList.h", but not as well as it should, as it doesn't pickup all the upper case Ls. If you don't mind sacrificing a little speed, then it's also nice to handle transpositions, so that "sta" can still match against "SpawnActorTimer", just with a lower score - Sublime Text 3 does this.

Skinner's feedback was a major breakthrough that prompted Forrest to update the fts_fuzzy_match C++ algorithm to v0.2.0, in order to include the "exhaustive search" suggested by Skinner:

It's worth noting that Forrest never updated his JavaScript implementation to v0.2.0 (@nrgwsth did, in PR #10), and that many of the third party ports of fts_fuzzy_match to other languages are still based on v0.1.0 of the algorithm.

For this reason, I decided to separate all implementations of fts_fuzzy_match into different folders, according to their algorithm version, to avoid confusion and simplify things to anyone wishing to study the sources or port them to other languages.

License

All source code from Forrest Smith's lib_fts project is released under the following license:

This software is dual-licensed to the public domain and under the following license: you are granted a perpetual, irrevocable license to copy, modify, publish, and distribute this file as you see fit.

The source files also embed the above license in their comments.

The lib_fts repository doesn't explicitly mention if this license also applies to documentation files; so I can't vouch that the fuzzy_match.md document is also into the public domain. I've nevertheless decided to include it (as is) in this project for completion and fairness sake, and to shield the original author from any inaccuracies I might have introduced in my own documentation of his algorithms and projects.

FTS Fuzzy Match Implementations

Forrest's algorithm has inspired a number of third party ports to other languages. The following table lists all the implementations available in this repository, of both versions of the algorithm, sorted by language.

language ver test author src link license
C 0.1.0 ✔️ Philip Jones PR #23 public domain
C 0.2.0 ✔️ Philip Jones PR #24 MIT
C# 0.1.0 n/a Collin Dillinger Gist public domain
C# 0.2.0 n/a @theor PR #13 CC0 1.0
C++ 0.1.0 🎯 Forrest Smith GitHub public domain
C++ 0.2.0 🎯 Forrest Smith GitHub public domain
Delphi/FreePascal 0.2.0 ✔️ j.visser Gist CC0 1.0
Elixir 0.2.0 n/a @WolfDan Gist MIT
F# 0.1.0 n/a Xavier Zwirtz GitHub MIT
JavaScript 0.1.0 ✔️ Forrest Smith GitHub public domain
JavaScript 0.2.0 ✔️ @nrgwsth PR #10 public domain
Kotlin 0.2.0 ✔️ Michael Bikovitsky PR #20 CC0 1.0
Lua 0.1.0 n/a Blake Mealey GitHub public domain
PHP 0.1.0 n/a @detectiveYarmas GitHub CC0 1.0
PureBasic 0.1.0 ✔️ Tristano Ajmone n/a CC0 1.0
Python 0.1.0 n/a Matt Menzenski Gist MIT
Zig 0.1.0 ✔️ Tristano Ajmone n/a CC0 1.0

The test column indicates whether a given implementation of the algorithm has been tested against the original C++ target 🎯 implementation of the same version for identical behaviour. A ✔️ indicates that the test succeeds, a ❌ that it fails, and n/a that currently there is no test code for that implementation (see Issue #17).

HELP NEEDED! — I need help to create the missing tests for the various languages ports (see Issue #17).

I've started out by including in this project the code of all the licensed ports that I've come across, and contacting the authors of those ports that don't have a clear or explicit license to ask them permission to include them here.

If you know of any other fts_fuzzy_match implementation which is not listed here, please open an Issue and let me know.

DISCLAIMER — I haven't had a chance to fully test all these ports, so I cannot guarantee for their quality and accuracy, not until we'll have a test in place for each implementation. But I've studied their source enough to establish that they are indeed implementation of the algorithm, and not malicious code.

For guidelines on how to port and test the algorithms to a new language, see the PORTING.md document.

FTS Fuzzy Match Derivatives

Here are some links to adaptations of Forrest's algorithm.

language version author link license notes
C++ 0.2.0 Martin Budden GitHub » SuperSlicer public domain See rework notes
C++ Qt 0.2.0 Waqar Ahmed GitHub » Kate LGPLv2+ No longer used in Kate now
Go 0.1.0 Sahil Muthoo GitHub » fuzzy MIT

Due to the presence of custom tweaks, they diverge from the original algorithm and don't qualify as fts_fuzzy_match ports (although they might still closely resemble it).

FTS Fuzzy Match Wrappers

Here's a list of wrappers to fts_fuzzy_match.h in various languages.

name language author license
nimfuzzy Nim @genotrance MIT