Thank you very much for making your work available as a Julia package!
It helps me a lot to learn about Transformers (a topic I don't know much about...)
I was comparing:

Transformers.jl/src/basic/embeds/position_embed.jl (lines 19 to 25 in 7a7a25a)

to

https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1/
It seems to me that we have cos(pos) (for i = 1) in the embedding, but we do not have its pair sin(pos), because the index i is 1-based. If we shifted i by 1, we would have all sin/cos pairs.
Of course, this is a very new topic to me and I might be completely wrong here (or it simply does not matter...).
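To make the concern concrete, here is a minimal sketch of the two indexing conventions. This is my own reconstruction from the formula in the paper, not the actual code in position_embed.jl:

```julia
d, max_len = 8, 4  # toy sizes

# Paper convention (0-based i): columns come in matched sin/cos pairs
# sharing a frequency: sin(pos / 1e4^(2k/d)) with cos(pos / 1e4^(2k/d)).
pe0 = [iseven(i) ? sin(pos / 1e4^(i / d)) : cos(pos / 1e4^((i - 1) / d))
       for i in 0:d-1, pos in 1:max_len]

# Naive 1-based translation of the same formula: i = 1 takes the cos
# branch with exponent (i - 1)/d = 0, producing cos(pos), whose partner
# sin(pos) never occurs (the smallest sine term is sin(pos / 1e4^(2/d))).
pe1 = [iseven(i) ? sin(pos / 1e4^(i / d)) : cos(pos / 1e4^((i - 1) / d))
       for i in 1:d, pos in 1:max_len]
```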
I can make a PR if this is indeed an issue.
Generally it doesn't matter that much. That function just creates periodic functions to serve as bases, so as long as those bases have similar attributes, the training behavior will be similar.
OTOH, it is an issue if we want the implementation to be exactly the same as in the original paper. The index shift means we are actually using a completely different set of bases for the position encoding. We should change it so that the values match. A PR is appreciated :)
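For completeness, a sketch of what the index shift could look like (my assumption, not the actual PR): map Julia's 1-based index back to the paper's 0-based one, which reproduces `pe0` from the sketch in the issue above:

```julia
# Hypothetical fix: convert the 1-based loop index i to the paper's
# 0-based index j = i - 1 before choosing the sin/cos branch.
pe_fixed = [let j = i - 1
                iseven(j) ? sin(pos / 1e4^(j / d)) :
                            cos(pos / 1e4^((j - 1) / d))
            end
            for i in 1:d, pos in 1:max_len]

@assert pe_fixed == pe0  # identical bases to the paper convention
```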
Alexander-Barth added a commit to Alexander-Barth/Transformers.jl that referenced this issue on Jun 3, 2022.