should we benchmark containment rather than similarity? #2

ctb · 2020-10-08T14:31:53Z

In the Results section "Scaled MinHash sketches support efficient indexing for large-scale containment queries", tbl:search-runtime shows runtime for similarity search. Two thoughts:

first, these are surprisingly slow :(.
second, these are for similarity, not containment.

My experience with containment and gather (which uses containment) is that these are pretty fast operations; I rarely use similarity. Moreover, the whole paper is more focused on containment than similarity anyway.

Should we refocus this benchmark on containment?
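
For anyone skimming, here is a minimal sketch of how the two measures differ, using plain Python sets as stand-ins for sketches. This is not sourmash's implementation; the function names and the toy data are hypothetical and only illustrate why a small query fully contained in a large subject gets a low similarity score but a high containment score.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Symmetric: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def containment(query: set, subject: set) -> float:
    """Asymmetric: fraction of the query's elements found in the subject."""
    if not query:
        return 0.0
    return len(query & subject) / len(query)


# Toy data (hypothetical): a small query entirely inside a much larger subject.
query = set(range(100))
subject = set(range(10_000))

sim = jaccard_similarity(query, subject)
cont = containment(query, subject)
print(f"similarity={sim:.4f} containment={cont:.4f}")
```

Similarity is dragged down by the size difference between the two sets, while containment only asks how much of the query is present, which is the question gather is asking.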

ctb · 2020-10-28T17:29:58Z

yes, I think we should. :)

ctb · 2020-12-02T15:30:42Z

given the stuff going on with greyhound, we are going to ignore performance in this paper (beyond implying that it's acceptable, 'cause here are the results).

ctb · 2020-12-02T15:31:09Z

(and in fact we are removing that entire section as part of shift to #10)

ctb closed this as completed Dec 2, 2020
