Distributed lda options #782
@@ -15,14 +15,21 @@
from __future__ import with_statement
import os, sys, logging, threading, time
import argparse
This is py2.7 only. @tmylk I don't think we can drop support for py2.6 yet... is this import safe?
If it's triggered only on importing lda_dispatcher.py, it's probably fine... but we don't want py2.7+ imports in "core" gensim (at import gensim).
I checked: this is triggered only on importing lda_dispatcher.py or lda_worker.py.
setup.py pulls in the argparse backport for Python < 2.7 (proof).
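For readers unfamiliar with the pattern, a version-conditional dependency like the one described above could look roughly like this. It is a minimal sketch, not the actual gensim setup.py: the function name `install_requires_for` is hypothetical, and the only assumption about packaging is that the backport is published on PyPI under the name `argparse`.

```python
import sys


def install_requires_for(version_info=sys.version_info):
    """Return extra requirements for the given interpreter version.

    Hypothetical helper for illustration; the real setup.py may be
    organised differently.
    """
    requires = []  # the project's other dependencies would be listed here
    # argparse joined the standard library in Python 2.7, so older
    # interpreters need the PyPI backport.
    if tuple(version_info[:2]) < (2, 7):
        requires.append("argparse")
    return requires


if __name__ == "__main__":
    print(install_requires_for())
```

The resulting list would then be passed as `install_requires` to `setup()`, so the backport is installed only where the standard library lacks the module.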
Awesome! This is a great update, and nicely done too. If you don't mind me asking, how do you use this distributed LDA @menshikh-iv? What is your use case/goal?
@piskvorky, I have two use cases:
I need to train LDA on a large corpus of web page content and vectorize all the pages. Training LDA takes a very long time. I could use several dedicated servers for training, but they are not on the same local network, so I modified distributed LDA for my case.
Thanks, interesting! Is this a personal project, academic research or a commercial project? (We keep a list of gensim adopters.)
@piskvorky personal research for now
@menshikh-iv Thanks for the PR! Could you add a short notebook-style tutorial for this feature and a note in the changelog?
@tmylk, unfortunately a notebook-style tutorial would be useless for this feature, because I can't demonstrate it in a notebook. Maybe I could update this page in the documentation with small examples (like this message)? As for the changelog, should I add an entry under 0.3.12 in CHANGELOG.md? And should I create a new PR for these changes?
Hi @menshikh-iv, 0.3.12 is the right version to use. A new small PR would be good. Updating this page with instructions would be great:
Documentation changed from
Update distributed LDA support. Workers and the dispatcher can now run in different network segments (not reachable from each other by network broadcast). The broadcast variant is still supported.
If you want to use broadcast, read the tutorial at https://radimrehurek.com/gensim/dist_lsi.html on the official site.
If you want to use the new feature, pass some extra arguments when you run the code, for example:
export PYRO_SERIALIZERS_ACCEPTED=pickle
export PYRO_SERIALIZER=pickle
python -m Pyro4.naming --host 0.0.0.0 --port <NS_PORT> -x
python -m gensim.models.lda_worker --host <NS_HOSTNAME> --port <NS_PORT> --no-broadcast -v
python -m gensim.models.lda_dispatcher --host <NS_HOSTNAME> --port <NS_PORT> --no-broadcast -v
lda = LdaModel(..., ns_conf={"host": NS_HOST, "port": NS_PORT, "broadcast": False})
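Because the name server, each worker, the dispatcher and the model must all agree on the same host and port, it may help to derive every value from one place. The sketch below only assembles the strings and the dict shown in the example above; the helper names (`pyro_env`, `node_command`, `make_ns_conf`) and the host/port values are hypothetical and not part of gensim or Pyro4.

```python
def pyro_env(serializer="pickle"):
    """Environment variables from the example, as a dict."""
    return {
        "PYRO_SERIALIZERS_ACCEPTED": serializer,
        "PYRO_SERIALIZER": serializer,
    }


def node_command(module, ns_host, ns_port):
    """Command line for a worker or dispatcher in non-broadcast mode."""
    return [
        "python", "-m", module,
        "--host", ns_host,
        "--port", str(ns_port),
        "--no-broadcast", "-v",
    ]


def make_ns_conf(ns_host, ns_port):
    """The ns_conf dict passed to LdaModel in the example."""
    return {"host": ns_host, "port": ns_port, "broadcast": False}


if __name__ == "__main__":
    host, port = "ns.example.org", 9100  # placeholder values
    for name, value in sorted(pyro_env().items()):
        print("export %s=%s" % (name, value))
    print(" ".join(node_command("gensim.models.lda_worker", host, port)))
    print(" ".join(node_command("gensim.models.lda_dispatcher", host, port)))
    print(make_ns_conf(host, port))
```

The dict returned by `make_ns_conf` is what would be passed as `ns_conf=` in the `LdaModel(...)` call from the example, with the remaining arguments left as in your own training code.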