Fix the comment of Translation Matrix #1594
Conversation
gensim/models/translation_matrix.py
Outdated
Args:
    `word_pair` (list): a list pair of words
    `word_pairs` (list): a list pair of words
    `source_space` (Space object): source language space
The train method uses only `word_pairs`; what are source/target space here?
gensim/models/translation_matrix.py
Outdated
self.source_space = Space.build(self.source_lang_vec, set(self.source_word))
self.target_space = Space.build(self.target_lang_vec, set(self.target_word))
self.source_word, self.target_word = zip(*word_pairs)
if self.translation_matrix is None:
But if I call train twice, the second time the model is not re-fit.
Please remove this if.
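To illustrate the reviewer's point, here is a toy sketch (not the actual gensim code; the class name and the least-squares solver are my assumptions) of a train() that re-fits unconditionally, so a second call with new data really updates the model:

```python
import numpy as np

class MiniTranslationMatrix:
    """Toy model: train() re-fits on every call, with no
    `if self.translation_matrix is None` guard."""

    def __init__(self):
        self.translation_matrix = None

    def train(self, source_mat, target_mat):
        # Always re-solve source_mat @ W ~= target_mat by least squares,
        # so calling train() a second time actually re-fits the model.
        self.translation_matrix, *_ = np.linalg.lstsq(
            source_mat, target_mat, rcond=None
        )
        return self.translation_matrix
```

With the `if` guard in place, the second `train()` call would silently return the stale matrix instead of fitting to the new word pairs.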
self.source_word_vec_file = datapath("EN.1-10.cbow1_wind5_hs0_neg10_size300_smpl1e-05.txt")
self.target_word_vec_file = datapath("IT.1-10.cbow1_wind5_hs0_neg10_size300_smpl1e-05.txt")

with utils.smart_open(self.train_file, "r") as f:
    self.word_pair = [tuple(utils.to_unicode(line).strip().split()) for line in f]
self.word_pairs = [("one", "uno"), ("two", "due"), ("three", "tre"),
Please use hanging indents
    ("grape", "acino"), ("banana", "banana"), ("mango", "mango")
]
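For reference, the hanging-indent form the reviewer is asking for looks like this (a style sketch of the same list, with the opening bracket ending the first line and every item indented one level):

```python
word_pairs = [
    ("one", "uno"), ("two", "due"), ("three", "tre"),
    ("grape", "acino"), ("banana", "banana"), ("mango", "mango"),
]
```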
self.test_word_pairs = [("ten", "dieci"), ("dog", "cane"), ("cat", "gatto")]
Remove ("dog", "cane") from self.word_pairs.
gensim/models/translation_matrix.py
Outdated
@@ -91,7 +91,7 @@ def build(cls, lang_vec, lexicon=None):
        return Space(mat, words)

    def normalize(self):
        """ normalized the word vector's matrix """
        """ Normalized the word vector's matrix """
'Normalize…' (imperative rather than past-tense)
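For context, such a normalize step scales each word vector (each matrix row) to unit length; a minimal standalone sketch of that operation (my own version, not the PR's code):

```python
import numpy as np

def normalize(mat):
    """Normalize each word vector (row) of the matrix to unit L2 norm."""
    norms = np.sqrt(np.sum(mat * mat, axis=1, keepdims=True))
    return mat / norms
```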
gensim/models/translation_matrix.py
Outdated
        model = super(TranslationMatrix, cls).load(*args, **kwargs)
        return model

    def apply_transmat(self, words_space):
        """
        mapping the source word vector to the target word vector using translation matrix
        Mapping the source word vector to the target word vector using translation matrix
'Map…' (imperative rather than '-ing' form)
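In essence, applying the translation matrix maps every source-space vector into the target space via one matrix product; a hedged standalone sketch (the function signature and row-vector layout are my assumptions, not the PR's API):

```python
import numpy as np

def apply_transmat(source_mat, translation_matrix):
    """Map each source word vector (a row) into the target space."""
    return source_mat @ translation_matrix
```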
Though I know I requested the Doc2Vec-related example, in its current form the motivations/benefits are muddled. Really it shouldn't require a separate helper class.

The word-translation example can presumably be evaluated on real datasets in the original context that motivated the approach, while the doc-vec example will need more novel design/evaluation, so I'd recommend splitting them into separate notebooks.
Thanks. You do remind me of the imbalanced-set problem in the example. And the code for training the document vectors is borrowed from the … For the … if I didn't catch that …
I meant there are published papers about using word-vector transformations for language translation (the original Google paper, the Dinu paper), so there are specific datasets & procedures to mimic, and similar results would indicate everything is working. The Doc2Vec use is novel, so it requires more experimentation/thought.
The word pairs used in this experiment are extracted from OPUS (http://opus.lingfil.uu.se/), the same as in Dinu's paper. I plot the visualization to show the linear relationship between the two language vector spaces, and use word translation to show that this transformation works. More reproduced experiments from Mikolov's and Dinu's papers would be fine to support this transformation. But I still cannot find any experiment for language translation (does it mean sentence translation?) in the two papers I mentioned here. Can you remind me if I missed something?

I added an "unstable/experimental" warning tag in the notebook explicitly for …

I also found a paper (Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax) that is related to this experiment, but I'm just having a look and need to dig deeper.
Thank you @robotcator for the fast fixes.
Need to fix some typos/PEP8 issues in the notebook, but I can't wait any longer, it's release time.
No description provided.