Note added to annoytutorial.ipynb #1137
The description seems incorrect; the parallelism has nothing to do with the GIL or Python, it's at the level of BLAS. Also, typos (space after full stop, space before brackets,
Okay. Another possible explanation is: if numpy on your machine is using one of the BLAS libraries like ATLAS or LAPACK, it'll run on multiple cores if the machine has multicore support. And clearly gensim's most_similar method is using numpy's dot operation. Does this description sound right? I'll make changes accordingly. I'll also correct the typos.
That's correct. Please change the PR.
Hi, unfortunately using Gensim doesn't guarantee multiple cores. Would it be possible to make that clear?
Should I just remove the initial note written in bold?
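The point discussed above (a brute-force similarity query reduces to a NumPy dot product, and any multicore behaviour comes from the BLAS library NumPy is linked against, not from Python) can be sketched roughly as follows. This is an illustrative sketch only, not gensim's actual implementation: `brute_force_most_similar` is a hypothetical helper and the random vectors are placeholder data. `np.show_config()` prints NumPy's build information, which is one way to see which BLAS it was built with.

```python
import numpy as np


def brute_force_most_similar(vectors, query, topn=5):
    """Hypothetical helper: return (row index, cosine similarity) for the topn closest rows."""
    # Normalise rows so that a plain dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    # A single matrix-vector product. NumPy typically delegates this to the
    # BLAS library it was built against; whether it uses several cores
    # depends on that library and its configuration, not on this code.
    sims = unit @ q
    best = np.argsort(-sims)[:topn]
    return [(int(i), float(sims[i])) for i in best]


# Placeholder data: 1000 random 100-dimensional vectors.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 100))
results = brute_force_most_similar(vectors, vectors[42], topn=3)

if __name__ == "__main__":
    print(results)
    # Build information, including the BLAS/LAPACK libraries NumPy links against.
    np.show_config()
```

Because the threading lives inside the BLAS build, the same script can use one core on one machine and several on another, which is why the note should not promise multicore execution.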
docs/notebooks/annoytutorial.ipynb
Outdated
@@ -179,7 +179,7 @@
 "\n",
 ">**Note**: Initialization time for the annoy indexer was not included in the times. The optimal knn algorithm for you to use will depend on how many queries you need to make and the size of the corpus. If you are making very few similarity queries, the time taken to initialize the annoy indexer will be longer than the time it would take the brute force method to retrieve results. If you are making many queries however, the time it takes to initialize the annoy indexer will be made up for by the incredibly fast retrieval times for queries once the indexer has been initialized\n",
 "\n",
-">**Note** : **If you are using gensim, it'll run on multiple cores**. Gensim's 'most_similar' method is using numpy operations in the form of dot product whereas Annoy's method isnt. If 'numpy' on your machine is using one of the BLAS libraries like ATLAS or LAPACK, it'll run on multiple cores(only if your machine has multicore support ). "
+">**Note** : Gensim's 'most_similar' method is using numpy operations in the form of dot product whereas Annoy's method isnt. If 'numpy' on your machine is using one of the BLAS libraries like ATLAS or LAPACK, it'll run on multiple cores(only if your machine has multicore support ). "
"isnt" => "isn't"
LAPACK is not BLAS.
"cores(only" => "cores (only"
"support )." => "support)."
@tmylk, did you review before merging?
Agree that there is a comma missing before "or LAPACK", CC @greninja
What comma? LAPACK is not a BLAS library, and neither piece of software uses LAPACK.
Maybe you meant OpenBLAS?
Note explaining why gensim's 'most_similar' method uses multicore whereas annoy's 'most_similar' runs on a single core.