Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flag parameters for topic diff #1448

Merged
merged 13 commits into from
Aug 3, 2017

Conversation

parulsethi
Copy link
Contributor

@parulsethi parulsethi commented Jun 25, 2017

Add flag parameters to optionally calculate diff matrix, diff diagonal and annotations.

@menshikh-iv
Copy link
Contributor

The main idea of this PR - for diagonal case calculates only diagonal values (with/without annotation for diagonal), it's needed for free speed-up this function for concrete case (for example - visdom integration). Please change logic (first, check for diagonal case) and refactor code (remove duplication)

annotation = [[None] * t1_size for _ in range(t2_size)]
if diagonal:
t_size = min(t1_size, t2_size)
z = np.zeros(t_size)
Copy link
Contributor Author

@parulsethi parulsethi Jul 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the no. of topics is different between two models (diff will not be a square matrix), then what should be returned in the diagonal case? Either raise an error for num_topics to be same or just return the diff of identical topic no.s till the smaller value of num_topic of both model? (latter one implemented right now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should raise error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced to raise the error instead

diff_terms[topic] = [pos_tokens, neg_tokens]

if normed:
if np.abs(np.max(z)) > 1e-8:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excess indent

# initialize z and annotation matrix
z = np.zeros((t1_size, t2_size))
if annotation:
diff_terms = np.zeros((t1_size, t2_size), dtype=list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This terms for diff and intersection (not only diff_terms), please rename this variable to old variant (annotation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotation is already being used as boolean parameter to indicate if annotations should be calculated or not. So I've renamed it to annotation_terms.

for topic1 in range(t1_size):
for topic2 in range(t2_size):
z[topic] = distance_func(d1[topic1], d2[topic2])
if annotation:
pos_tokens = fst_topics[topic1] & snd_topics[topic2]
neg_tokens = fst_topics[topic1].symmetric_difference(snd_topics[topic2])

pos_tokens = sample(pos_tokens, min(len(pos_tokens), n_ann_terms))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I remember, you want to remove sample for another PR, you already do it in different PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's in #1484

@menshikh-iv menshikh-iv merged commit 3cb8495 into piskvorky:develop Aug 3, 2017
@parulsethi parulsethi deleted the diff_flag_params branch August 3, 2017 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants