
Retrofit Update Function #12

Open
guenthermi opened this issue Jun 21, 2019 · 1 comment

@guenthermi

Hello,
I wondered about the update function. In the code, every word vector is updated to the mean of the centroid of all its neighbors and its initial vector, which matches the update function proposed in the paper. However, I do not really understand how to arrive at this update function. The paper states that the update function is derived from the loss function by taking the partial derivative with respect to a vector x_i, setting it to zero, and solving for x_i. The loss function is defined as follows:
sum[i=1..n]( alpha * ||x_i - x_i'||^2 + sum[j:(i,j) in E]( beta_ij * ||x_i - x_j||^2 ) )

Here x_i' is the initialization of x_i. The graph is undirected, so (i,j) in E implies (j,i) in E, which means every edge is counted twice in the loss (once with x_i on the left-hand side of the distance term and once where x_i appears as the neighbor x_j). Taking the partial derivative with respect to x_i therefore gives 2*alpha*(x_i - x_i') + 2*sum[j:(i,j) in E]((beta_ij + beta_ji)*(x_i - x_j)) = 0; solving for x_i, I came up with the following formula:
x_i = (alpha * x_i' + sum[j:(i,j) in E]((beta_ij + beta_ji) * x_j)) / (alpha + sum[j:(i,j) in E](beta_ij + beta_ji))
which is different from
x_i = (alpha * x_i' + sum[j:(i,j) in E](beta_ij * x_j)) / (alpha + sum[j:(i,j) in E](beta_ij))
I then ran the algorithm with both update functions on a small set of one-dimensional vectors and computed the loss after 100 iterations. The new update function gave a lower loss. Am I misunderstanding something in the formulas?
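
For illustration, here is a minimal sketch of that comparison (the vectors, edge weights, and alpha below are made-up toy values, not my actual test data):

```python
import numpy as np

# Toy setup: 1-D vectors and an undirected graph stored as a map from
# directed edge (i, j) to its weight beta_ij. Both directions are listed,
# with asymmetric weights so that beta_ij != beta_ji actually matters.
alpha = 1.0
x_init = np.array([0.0, 1.0, 4.0])          # the x_i' (initial vectors)
beta = {(0, 1): 1.0, (1, 0): 0.5,
        (1, 2): 2.0, (2, 1): 1.0}

def loss(x):
    # sum_i alpha*||x_i - x_i'||^2 + sum_{(i,j) in E} beta_ij*||x_i - x_j||^2
    total = alpha * np.sum((x - x_init) ** 2)
    total += sum(b * (x[i] - x[j]) ** 2 for (i, j), b in beta.items())
    return total

def sweep(x, symmetrize):
    # One in-place update pass. symmetrize=True uses beta_ij + beta_ji
    # (the update derived above); symmetrize=False uses beta_ij only
    # (the update from the paper).
    for i in range(len(x)):
        num, den = alpha * x_init[i], alpha
        for (src, dst), b in beta.items():
            if src == i:
                w = b + (beta.get((dst, src), 0.0) if symmetrize else 0.0)
                num += w * x[dst]
                den += w
        x[i] = num / den
    return x

for symmetrize in (False, True):
    x = x_init.copy()
    for _ in range(100):
        x = sweep(x, symmetrize)
    print(f"symmetrize={symmetrize}: loss after 100 iterations = {loss(x):.6f}")
```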

@gauravkoradiya

How could I find the value of beta if I am using pretrained word embeddings?
As I understand it, you are not using pretrained embeddings but handcrafted embeddings as described in the paper, so it is easy to take the co-occurrence probability as the value of beta. For pretrained embeddings, I think a similarity measure could serve as the value of beta. Is that right?
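
For example, something like this rough sketch (cosine similarity as beta, clipped at zero; the helper name and the clipping are my own assumptions, not something from the paper):

```python
import numpy as np

def cosine_betas(vectors, edges):
    # Hypothetical helper: one beta_ij per directed edge, taken as the
    # cosine similarity of the pretrained vectors, clipped at zero so a
    # negative weight never pushes neighbors apart.
    betas = {}
    for i, j in edges:
        vi, vj = vectors[i], vectors[j]
        cos = float(vi @ vj) / (np.linalg.norm(vi) * np.linalg.norm(vj))
        betas[(i, j)] = max(0.0, cos)
    return betas
```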
