Install: go get github.com/ekzhu/minhash-lsh
- One set per line.
- Within each set, items are separated by whitespace.
- If the parameter firstItemIsID is set to true, the first item is the unique ID of the set.
- The remaining items have the following format: <value>____<frequency>
  - value is a unique element of the set
  - frequency is an integer count of the occurrences of value
  - ____ (4 underscores) is the separator
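
The set file format described above can be illustrated with a small parser. This is only a sketch under stated assumptions: `ParseSetLine` and `Item` are hypothetical names made up for this example and are not part of the minhash-lsh package, and splitting each item on the last occurrence of the separator is an assumption made here so that a value may itself contain underscores.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// separator is the 4-underscore token between value and frequency.
const separator = "____"

// Item is one <value>____<frequency> entry.
// Hypothetical type for this sketch, not part of minhash-lsh.
type Item struct {
	Value     string
	Frequency int
}

// ParseSetLine parses one line of a set file.
// Hypothetical helper for illustration, not a minhash-lsh function.
func ParseSetLine(line string, firstItemIsID bool) (id string, items []Item, err error) {
	// Items on a line are separated by whitespace.
	fields := strings.Fields(line)
	if firstItemIsID {
		if len(fields) == 0 {
			return "", nil, fmt.Errorf("empty line: missing set ID")
		}
		// The first item is the set ID; the rest are value/frequency items.
		id, fields = fields[0], fields[1:]
	}
	for _, f := range fields {
		// Assumption: split on the last separator so that a value
		// may contain underscores of its own.
		i := strings.LastIndex(f, separator)
		if i < 0 {
			return "", nil, fmt.Errorf("item %q: missing %q separator", f, separator)
		}
		freq, convErr := strconv.Atoi(f[i+len(separator):])
		if convErr != nil {
			return "", nil, fmt.Errorf("item %q: bad frequency: %v", f, convErr)
		}
		items = append(items, Item{Value: f[:i], Frequency: freq})
	}
	return id, items, nil
}

func main() {
	// Example line with firstItemIsID set to true.
	id, items, err := ParseSetLine("set1 apple____3 banana____1", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, items)
}
```

Here the line `set1 apple____3 banana____1` is read as the set with ID `set1` containing `apple` with frequency 3 and `banana` with frequency 1. With firstItemIsID set to false, every item on the line is expected to be in the value/frequency form.
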
minhash-lsh-all-pair -input <set file name>