-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add flag parameters for topic diff #1448
Conversation
The main idea of this PR - for diagonal case calculates only diagonal values (with/without annotation for diagonal), it's needed for free speed-up this function for concrete case (for example - visdom integration). Please change logic (first, check for diagonal case) and refactor code (remove duplication) |
gensim/models/ldamodel.py
Outdated
annotation = [[None] * t1_size for _ in range(t2_size)] | ||
if diagonal: | ||
t_size = min(t1_size, t2_size) | ||
z = np.zeros(t_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the no. of topics is different between two models (diff will not be a square matrix), then what should be returned in the diagonal
case? Either raise an error for num_topics to be same or just return the diff of identical topic no.s till the smaller value of num_topic of both model? (latter one implemented right now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should raise error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced to raise the error instead
gensim/models/ldamodel.py
Outdated
diff_terms[topic] = [pos_tokens, neg_tokens] | ||
|
||
if normed: | ||
if np.abs(np.max(z)) > 1e-8: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excess indent
gensim/models/ldamodel.py
Outdated
# initialize z and annotation matrix | ||
z = np.zeros((t1_size, t2_size)) | ||
if annotation: | ||
diff_terms = np.zeros((t1_size, t2_size), dtype=list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This terms for diff
and intersection
(not only diff_terms
), please rename this variable to old variant (annotation
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotation
is already being used as boolean parameter to indicate if annotations should be calculated or not. So I've renamed it to annotation_terms
.
for topic1 in range(t1_size): | ||
for topic2 in range(t2_size): | ||
z[topic] = distance_func(d1[topic1], d2[topic2]) | ||
if annotation: | ||
pos_tokens = fst_topics[topic1] & snd_topics[topic2] | ||
neg_tokens = fst_topics[topic1].symmetric_difference(snd_topics[topic2]) | ||
|
||
pos_tokens = sample(pos_tokens, min(len(pos_tokens), n_ann_terms)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I remember, you want to remove sample
for another PR, you already do it in different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it's in #1484
Add flag parameters to optionally calculate diff matrix, diff diagonal and annotations.