Speeding up similarity calculations #126

vsraptor · 2021-06-18T14:33:10Z

Is there a systematic way to loop through all the synsets, i.e. a synset iterator?

Is it wn.synsets()?

Any idea how I can speed up similarity calculations?

I'm testing sentence comparisons. To give an example: comparing two words requires finding the similarity of ~10-20 synsets. If a sentence has 10 words on average, that means 100 word comparisons for every two sentences, or ~1000 similarity calls, which takes ~2-25 s. To then compare ~1000++ sentences, the numbers are enormous. It should be ~1000++!, but it's not, because words repeat.

I cache word-to-word similarities, which helps, but any extra juice I can squeeze out would be good.
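For anyone reading along, here is a minimal sketch of the word-pair caching described above. It is not Wn code: `TOY_SENSES`, `synset_sim`, `word_sim`, and `sentence_sim` are hypothetical names, the sense data is made up, and the metric is assumed to be symmetric so that (a, b) and (b, a) can share one cache entry.

```python
from functools import lru_cache

# Hypothetical toy data standing in for a real word -> synsets lookup.
TOY_SENSES = {
    "dog": ("dog.n.1", "dog.v.1"),
    "cat": ("cat.n.1",),
    "car": ("car.n.1",),
}

CALLS = {"synset_sim": 0}  # counts calls to the expensive function


def synset_sim(a: str, b: str) -> float:
    """Placeholder for an expensive synset-to-synset metric (made-up scores)."""
    CALLS["synset_sim"] += 1
    return 1.0 if a == b else 0.5 if a[-3:] == b[-3:] else 0.1


@lru_cache(maxsize=None)
def _word_sim_sorted(w1: str, w2: str) -> float:
    # Best score over all sense pairs of the two words; 0.0 for unknown words.
    return max(
        (synset_sim(a, b)
         for a in TOY_SENSES.get(w1, ())
         for b in TOY_SENSES.get(w2, ())),
        default=0.0,
    )


def word_sim(w1: str, w2: str) -> float:
    # Assumes a symmetric metric: normalize the argument order so that
    # (w1, w2) and (w2, w1) hit the same cache entry.
    if w2 < w1:
        w1, w2 = w2, w1
    return _word_sim_sorted(w1, w2)


def sentence_sim(s1: list, s2: list) -> float:
    """Average, over the words of s1, of the best-matching word in s2."""
    if not s1 or not s2:
        return 0.0
    return sum(max(word_sim(a, b) for b in s2) for a in s1) / len(s1)
```

Because words repeat across sentences, most `word_sim` calls in a large corpus should become cache hits; normalizing the argument order roughly halves the number of distinct entries.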

goodmami · 2021-06-20T13:48:38Z

Hi, I see you've already closed this before I could respond. Were you able to find a solution?

While I do put in a bit of effort to optimize the performance of parts of this codebase, in general I'm currently more concerned about correctness than performance. Also, before doing further optimizations we should set up some benchmarks (see #98).

For the similarity metrics, caching the results for word pairs is a good idea, but at that level (processing corpora) it seems more like a part of some research project or application and less like a feature to be added to Wn. However, all the similarity metrics use hypernym path calculations, and currently each hop of such a path requires one or more hits to the database, and I've thought about ways to speed this up (#38, #110).
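To illustrate the per-hop cost, here is a rough sketch of memoizing hypernym paths in memory so that each node's path is resolved only once. This is not how Wn is implemented and not necessarily what the linked issues propose: `HYPERNYM`, `path_to_root`, and `path_similarity` are hypothetical names, the taxonomy is a made-up dictionary with a single parent per node (real wordnets allow several), and each dictionary lookup stands in for a database hit.

```python
from functools import lru_cache

# Hypothetical toy taxonomy: child -> parent (single inheritance for brevity).
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


@lru_cache(maxsize=None)
def path_to_root(node: str) -> tuple:
    """Return (node, parent, ..., root); results are cached per node."""
    parent = HYPERNYM.get(node)  # stands in for one database hit
    return (node,) if parent is None else (node,) + path_to_root(parent)


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + length of the path through the lowest common hypernym)."""
    depth_in_b = {n: i for i, n in enumerate(path_to_root(b))}
    for i, n in enumerate(path_to_root(a)):
        if n in depth_in_b:
            return 1.0 / (1 + i + depth_in_b[n])
    return 0.0  # no common hypernym
```

With the cache in place, shared upper parts of the hierarchy (here `carnivore` and `animal`) are walked once and reused by every later comparison.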

You might follow the issues linked above if you're interested in performance. Also, I'm happy to receive pull requests :)

vsraptor changed the title ~~Speeding up similarity calculations~~ Synset iterator and Speeding up similarity calculations Jun 18, 2021

vsraptor changed the title ~~Synset iterator and Speeding up similarity calculations~~ Speeding up similarity calculations Jun 18, 2021

vsraptor closed this as completed Jun 20, 2021
