CompareString2

Francesco Andreuzzi edited this page Feb 14, 2019 · 6 revisions

T-UI 6.12 introduced CompareString2, a Java library that compares Strings using a choice of several algorithms.

Algorithms

To change the algorithm used to compare your Strings, set the option suggestions_algorithm in suggestions.xml.

Choose an ID from the table below:

Category               Algorithm     ID
Distance               LCS           0
Distance               OSA           1
Distance               QGRAM         2
Normalized distance    COSINE        4
Normalized distance    JACCARD       5
Normalized distance    JAROWRINKLER  6
Normalized distance    METRICLCS     7
Normalized distance    NGRAM         8
Normalized distance    NLEVENSHTEIN  9
Normalized distance    SORENSENDICE  10
Normalized similarity  COSINE        11
Normalized similarity  JACCARD       12
Normalized similarity  JAROWRINKLER  13
Normalized similarity  NLEVENSHTEIN  14
Normalized similarity  SORENSENDICE  15
Metric distance        DAMERAU       16
Metric distance        JACCARD       17
Metric distance        LEVENSHTEIN   18
Metric distance        METRICLCS     19

Then use the following command:

config -set suggestions_algorithm ID

For instance, if you want to use the normalized-distance version of JACCARD, you will use the command:

config -set suggestions_algorithm 5


You can get more info about the available algorithms here and here.
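
To make the categories in the table more concrete, here is a minimal Java sketch of one algorithm (Levenshtein) in three forms: plain distance, normalized distance, and normalized similarity. It is an illustration only, not CompareString2's actual code. The class and method names are hypothetical, and the normalization (edit distance divided by the length of the longer String) is the common textbook definition, assumed here rather than taken from the library.

```java
// Illustrative sketch only: names are hypothetical, not CompareString2's API.
final class LevenshteinSketch {

    // Plain edit distance: 0 for equal Strings, no fixed upper bound.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty String
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting a's first i characters
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 0 for equal Strings, 1 for completely different ones.
    static double normalizedDistance(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 0.0 : (double) distance(a, b) / longest;
    }

    // Assumed to be the complement: 1 for equal Strings, 0 for completely different ones.
    static double normalizedSimilarity(String a, String b) {
        return 1.0 - normalizedDistance(a, b);
    }
}
```

With these definitions, comparing "kitten" and "sitting" gives a distance of 3, a normalized distance of 3/7, and a normalized similarity of 4/7.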

Deadline

Result ranges

Category               Equals  Different
Distance               0       +Infinity
Normalized distance    0       1
Normalized similarity  1       0
Metric distance        0       +Infinity

As you can see, in one category a higher result means that the Strings are "more equal" (i.e. normalized-similarity), whereas in the others a higher result means that they are more different (i.e. distance, normalized-distance, metric-distance).

Keep the table above in mind when you set your deadline with the following command:

config -set suggestions_deadline [deadline]

Check here and here for more details.
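
The following Java sketch shows why the result ranges matter when choosing a deadline. It assumes the deadline acts as a cut-off on the comparison result, and that the comparison is inclusive. Neither assumption is confirmed on this page, and the enum and method names are hypothetical, not part of CompareString2.

```java
// Illustrative sketch only: assumes the deadline is an inclusive cut-off.
final class DeadlineSketch {

    // Hypothetical mirror of the four categories in the result ranges table.
    enum Category { DISTANCE, NORMALIZED_DISTANCE, NORMALIZED_SIMILARITY, METRIC_DISTANCE }

    // Returns true when a result is "close enough" under the assumed semantics.
    static boolean withinDeadline(Category category, double result, double deadline) {
        if (category == Category.NORMALIZED_SIMILARITY) {
            // Similarity: 1 means equal, so higher results are better.
            return result >= deadline;
        }
        // Every distance category: 0 means equal, so lower results are better.
        return result <= deadline;
    }
}
```

Under these assumptions, a deadline of 0.7 would be strict for a normalized-similarity algorithm (only close matches pass) but lenient for a normalized-distance algorithm (most Strings pass), so a deadline may need to be revisited whenever you switch category.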