Is there a systematic way to loop through all synsets, i.e., a synset iterator? Is it `wn.synsets()`?
Any idea how I can speed up similarity calculations?

I'm testing sentence comparisons. To give you an example: comparing two words requires finding the similarity of roughly 10-20 synset pairs, so if a sentence has 10 words on average, that means ~100 word comparisons (~1000 synset similarities, taking roughly 2-25 seconds) for every pair of sentences. Then, to compare 1000+ sentences, the numbers become enormous. In principle the cost grows combinatorially with the number of sentences, but in practice it's smaller because words repeat.

I do cache word-to-word similarities, which helps, but any juice I can squeeze out will be good.
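The word-pair cache mentioned above can be expressed with `functools.lru_cache`. Here `expensive_sim` is a hypothetical stand-in for whatever synset-level similarity you actually compute with Wn; the sketch only illustrates the memoization pattern, not Wn's API.

```python
from functools import lru_cache

# Hypothetical stand-in for an expensive synset-based similarity;
# replace with your real Wn-based computation.
def expensive_sim(a: str, b: str) -> float:
    return 1.0 if a == b else 0.5

@lru_cache(maxsize=None)
def word_sim(a: str, b: str) -> float:
    # Order the pair so ("cat", "dog") and ("dog", "cat")
    # share a single cache entry.
    if a > b:
        a, b = b, a
    return expensive_sim(a, b)

def sentence_sim(s1: list, s2: list) -> float:
    # Average, over words in s1, of the best match against s2.
    return sum(max(word_sim(w1, w2) for w2 in s2) for w1 in s1) / len(s1)
```

Because words repeat across sentences, this reduces the blow-up to one computation per distinct word pair; `word_sim.cache_info()` reports the hit rate, which is useful for checking that the cache is actually paying off.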
vsraptor changed the title from *Speeding up similarity calculations* to *Synset iterator and Speeding up similarity calculations* on Jun 18, 2021
vsraptor changed the title back to *Speeding up similarity calculations* on Jun 18, 2021
Hi, I see you've already closed this before I could respond. Were you able to find a solution?
While I do put in a bit of effort to optimize the performance of parts of this codebase, in general I'm currently more concerned about correctness than performance. Also, before doing further optimizations we should set up some benchmarks (see #98).
For the similarity metrics, caching the results for word pairs is a good idea, but at that level (processing corpora) it seems more like part of a research project or application and less like a feature to be added to Wn. However, all the similarity metrics rely on hypernym path calculations, and currently each hop along such a path requires one or more hits to the database; I've thought about ways to speed this up (#38, #110).
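One way to avoid a database hit per hop is to bulk-load the hypernym relation into memory once and walk it there. The sketch below shows the idea over a toy hypernym mapping — the `hypernyms` dict and the synset ids are invented for illustration and are not Wn's actual schema or API.

```python
from functools import lru_cache

# Toy hypernym graph: synset id -> parent id (None at the root).
# In practice this table could be bulk-loaded from the database
# in a single query instead of one query per hop.
hypernyms = {
    "dog.n.01": "canine.n.01",
    "canine.n.01": "animal.n.01",
    "animal.n.01": None,
    "cat.n.01": "feline.n.01",
    "feline.n.01": "animal.n.01",
}

@lru_cache(maxsize=None)
def path_to_root(ss):
    # Walk hypernym links in memory; memoized so shared
    # ancestors are never re-traversed.
    path = []
    while ss is not None:
        path.append(ss)
        ss = hypernyms[ss]
    return tuple(path)

def path_similarity(a, b):
    # 1 / (shortest path length through the lowest common
    # hypernym + 1) -- the same shape as WordNet path similarity.
    pa, pb = path_to_root(a), path_to_root(b)
    depth_in_a = {node: d for d, node in enumerate(pa)}
    best = min(d + depth_in_a[node]
               for d, node in enumerate(pb) if node in depth_in_a)
    return 1.0 / (best + 1)
```

With the whole relation in memory, each similarity call costs a couple of dictionary walks instead of a chain of per-hop queries; the trade-off is the upfront load and the memory for the table.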
You might follow the issues linked above if you're interested in performance. Also, I'm happy to receive pull requests :)