
Fix epsilon according to dtype in LdaModel #1770

Merged: 5 commits into develop from fix-lda-const, Dec 7, 2017

Conversation

menshikh-iv (Contributor):

This is a small fix for #1656. Big thanks to @rmalouf for catching this potential bug: #1656 (comment).

What's done:

  • limit the allowed dtypes (only numpy.float16, numpy.float32, numpy.float64)
  • add an adaptive eps that depends on dtype (instead of the hardcoded 1e-100)

CC: @piskvorky
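
For context, the underlying problem is that 1e-100 is representable in float64 but underflows to exactly zero in float32 and float16, so the additive guard against division by zero silently disappears. A minimal sketch demonstrating this (illustration only, not part of the PR diff):

import numpy as np

print(np.float64(1e-100))  # 1e-100 (representable in float64)
print(np.float32(1e-100))  # 0.0 (underflows; float32 min normal ~1.18e-38)
print(np.float16(1e-100))  # 0.0 (underflows; float16 min normal ~6.10e-05)

# Hence the dtype-dependent epsilons used in this PR:
# float64 -> 1e-100, float32 -> 1e-35, float16 -> 1e-5.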

@@ -286,6 +286,9 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
>>> lda = LdaModel(corpus, num_topics=50, alpha='auto', eval_every=5) # train asymmetric alpha from data

"""
if dtype not in {np.float16, np.float32, np.float64}:
raise ValueError("Incorrect 'dtype', please choice one of numpy.float16, numpy.float32 or numpy.float64")
piskvorky (Owner):

choice => choose

@@ -498,7 +501,8 @@ def inference(self, chunk, collect_sstats=False):
# The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
# phinorm is the normalizer.
# TODO treat zeros explicitly, instead of adding 1e-100?
phinorm = np.dot(expElogthetad, expElogbetad) + 1e-100
eps = 1e-100 if self.dtype == np.float64 else (1e-35 if self.dtype == np.float32 else 1e-5)
piskvorky (Owner):

Easier to read and maintain as a mapping?

dtype_to_eps = {
    np.float64: 1e-100,
    np.float32: 1e-35,
    np.float16: 1e-5,
}

And then also if dtype not in dtype_to_eps: ...
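
Spelling out both suggestions together, the lookup and the dtype check could share one structure (a sketch of the suggestion only, with an illustrative dtype variable; not the merged code):

import numpy as np

dtype = np.float32  # e.g. the dtype argument passed to LdaModel.__init__

dtype_to_eps = {
    np.float64: 1e-100,
    np.float32: 1e-35,
    np.float16: 1e-5,
}

# The membership test reuses the mapping's keys, so the set of
# allowed dtypes is written down only once.
if dtype not in dtype_to_eps:
    raise ValueError("Incorrect 'dtype', please choose one of numpy.float16, numpy.float32 or numpy.float64")
eps = dtype_to_eps[dtype]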

@@ -498,7 +501,13 @@ def inference(self, chunk, collect_sstats=False):
# The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
# phinorm is the normalizer.
# TODO treat zeros explicitly, instead of adding 1e-100?
piskvorky (Owner), Dec 7, 2017:

Should update comment too: 1e-100 => epsilon.

@@ -498,7 +501,13 @@ def inference(self, chunk, collect_sstats=False):
# The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
# phinorm is the normalizer.
# TODO treat zeros explicitly, instead of adding 1e-100?
phinorm = np.dot(expElogthetad, expElogbetad) + 1e-100
dtype_to_eps = {
piskvorky (Owner):

Better defined at module level, so __init__ can make use of the allowed keys? Now the same information is in three places, not DRY.
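
A module-level version along those lines might look like this (hypothetical names and heavily simplified signatures, not the merged code):

import numpy as np

# One place holds both the allowed dtypes and their epsilons.
DTYPE_TO_EPS = {np.float16: 1e-5, np.float32: 1e-35, np.float64: 1e-100}


class LdaModel(object):
    def __init__(self, dtype=np.float32):
        # Validation derives from the mapping's keys.
        if dtype not in DTYPE_TO_EPS:
            raise ValueError(
                "Incorrect 'dtype', please choose one of numpy.float16, "
                "numpy.float32 or numpy.float64")
        self.dtype = dtype

    def inference(self, expElogthetad, expElogbetad):
        # The same mapping supplies the epsilon for phinorm.
        return np.dot(expElogthetad, expElogbetad) + DTYPE_TO_EPS[self.dtype]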

menshikh-iv merged commit be4500e into develop on Dec 7, 2017
menshikh-iv deleted the fix-lda-const branch on December 7, 2017 at 21:15