Relying on numpy.random makes results hard to reproduce #113

Closed
larsmans opened this issue Feb 25, 2013 · 7 comments
Labels
difficulty easy Easy issue: required small fix

Comments

@larsmans
Contributor

... and it also confuses users of LDA.

In scikit-learn, we've solved this by requiring a random_state argument to anything that wants random numbers; this may either be an np.random.RandomState object, or the seed for one. You might want to borrow this habit.
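As a rough illustration of that convention, here is a minimal sketch. The helper names `get_random_state` and `sample_topic_weights` are hypothetical and not part of gensim or scikit-learn; the helper is only loosely modeled on scikit-learn's `sklearn.utils.check_random_state`, and treating `None` as "create a fresh unseeded generator" is an assumption of this sketch.

```python
import numbers

import numpy as np


def get_random_state(seed=None):
    """Hypothetical helper: turn `seed` into an np.random.RandomState.

    Accepts None (fresh, unseeded generator), an integer seed,
    or an existing RandomState instance, which is returned unchanged.
    """
    if seed is None:
        return np.random.RandomState()
    if isinstance(seed, (numbers.Integral, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError("%r cannot be used to seed a RandomState" % (seed,))


def sample_topic_weights(num_topics, random_state=None):
    """Hypothetical consumer: draws from the generator it is given,
    never from the global numpy.random state."""
    rng = get_random_state(random_state)
    return rng.gamma(100.0, 1.0 / 100.0, size=num_topics)


# The same seed gives the same draws, whether passed as an int or as an object.
a = sample_topic_weights(5, random_state=42)
b = sample_topic_weights(5, random_state=np.random.RandomState(42))
```

Because every function takes its generator explicitly, a caller can fix one seed at the top level and get repeatable results without touching the global `numpy.random` state.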

@adammenges

👍

@piskvorky
Owner

Here's a link to a pull request that did the same for GloVe; looks simple enough:
maciejkula/glove-python#32

@cvint13

cvint13 commented Sep 8, 2017

When I run the model I still get different results every time despite using a fixed seed:

lda = LdaMulticore(corpus, num_topics=100, workers=4, id2word=dictionary,
                   random_state=np.random.RandomState(seed=10101010),
                   alpha=np.array([.0075] * 100))

@gojomo
Collaborator

gojomo commented Sep 8, 2017

@cvint13 Anything multithreaded will also be subject to scheduling jitter from the OS – operations won't happen in the same order, and thus results will vary. For full reproducibility, you'd have to move to a (much slower) single-threaded calculation (workers=1).
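To see why ordering alone can change the outcome even with a fixed seed, here is a toy sketch. It is not gensim's update rule: the function `apply_updates` and the blending factor `rho` are assumptions made up for illustration. It only shows that when each incoming result is blended into a running state, the final state depends on the order in which results arrive.

```python
from itertools import permutations


def apply_updates(updates, rho=0.5, start=0.0):
    """Toy online update: blend each incoming value into the running state.

    `rho` is a hypothetical blending factor; later updates weigh more,
    so the arrival order affects the final state.
    """
    state = start
    for u in updates:
        state = (1.0 - rho) * state + rho * u
    return state


# The same three "worker results", merged in every possible arrival order.
updates = (1.0, 2.0, 4.0)
results = {order: apply_updates(order) for order in permutations(updates)}

in_order = apply_updates((1.0, 2.0, 4.0))   # 2.625
reversed_order = apply_updates((4.0, 2.0, 1.0))  # 1.5
```

The inputs are identical in both runs; only the order differs. With a single worker the order is fixed, which is why that setting is the one that can be reproduced.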

@cvint13

cvint13 commented Sep 8, 2017

Ahhh ok, that sucks.

@piskvorky
Owner

piskvorky commented Sep 9, 2017

For completeness -- LdaMulticore uses processes, not threads.

@menshikh-iv is it expected that @cvint13 would get different results despite a fixed seed? Are the individual jobs processed in arbitrary (non-deterministic) order by multiple workers?

@menshikh-iv
Contributor

@piskvorky yes (for multicore implementation only).
