Advanced Initialization Methods #66
base: master
Conversation
Thanks! I think now that I merged the previous PR there is some conflict in train_model.py, could you have a look at that?
@@ -8,9 +9,13 @@ class Seq2seq(BaseModel):
     and decoder.
     """

-    def __init__(self, encoder, decoder, decode_function=F.log_softmax):
+    def __init__(self, encoder, decoder, decode_function=F.log_softmax,
+                 uniform_init=0, glorot_init=False):
Putting all this in the init like this perhaps doesn't generalise very well if you also want to add other kinds of initialisation. Perhaps instead we could pass a function for initialisation? Does that make sense?
Ah I see what you mean, so we would move this initialization function outside of the model class and maybe into a util? And one would have the option of passing an initialization function to any class and have the weights of that class initialized accordingly?
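To make the suggestion concrete, here is a minimal sketch of the function-passing approach being discussed. All names here (`uniform_initializer`, `SimpleModel`, `init_fn`) are illustrative placeholders, not code from this PR or from machine:

```python
import torch.nn as nn


def uniform_initializer(scale):
    """Return a function that uniformly initializes a module's parameters.

    This would live in a util module, as suggested above.
    """
    def init_fn(module):
        for p in module.parameters():
            p.data.uniform_(-scale, scale)
    return init_fn


class SimpleModel(nn.Module):
    """Stand-in for any model class that accepts an init function."""

    def __init__(self, init_fn=None):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        if init_fn is not None:
            init_fn(self)


model = SimpleModel(init_fn=uniform_initializer(0.1))
```

The upside of this design is that adding a new initialization scheme later means writing one new function, not touching every model's `__init__` signature.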
I didn't think about it that way initially because I was following how openNMT has it, but I think that makes more sense actually. So ignore this PR for now haha
Ok, thanks!
        p.data.uniform_(-uniform_init, uniform_init)

        # xavier/glorot initialization if glorot_init
        if glorot_init:
I don't know much about initialisations, so mostly a question: does the glorot initialisation depend on the probability of the uniform? If they are both specified, does glorot_init overwrite the uniform_init parameter? In the tests it seems that they are not mutually exclusive. Perhaps we could add some docstring explaining how they behave/interact?
Yes, if both are passed, then Glorot overrides the uniform initialization for all parameters except biases. You usually use either uniform or glorot/xavier, not both, so I could add a docstring about this, but it's not super likely that someone activates them both at the same time. Also, code-wise this is exactly how openNMT does it.
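For readers following along, a short sketch of the override behaviour described here (mirroring the OpenNMT-style logic discussed, not the exact PR code; the assumption is that biases are the 1-D parameters, which `p.dim() > 1` filters out):

```python
import torch.nn as nn

model = nn.Linear(8, 8)
uniform_init, glorot_init = 0.1, True

for p in model.parameters():
    # First pass: uniform init applies to every parameter
    if uniform_init > 0:
        p.data.uniform_(-uniform_init, uniform_init)
    # Glorot then overwrites the uniform draw for all parameters
    # except biases (1-D tensors), which keep their uniform values
    if glorot_init and p.dim() > 1:
        nn.init.xavier_uniform_(p)
```

So with both flags set, weight matrices end up Glorot-initialized and biases end up uniform in (-0.1, 0.1).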
Alright, makes sense. My confusion was mostly coming from my ignorance :).
Addresses Issue #54:
Adds Xavier/Glorot initialization and uniform initialization to the Seq2seq class. By default both are turned off, but this offers an interface to enable them in machine. Uniform initialization is enabled by passing, for instance, uniform_init=0.1; this initializes all parameters in the Seq2seq to be uniformly sampled from (-uniform_init, uniform_init). If a value less than or equal to 0 is passed to uniform_init, it is not used. The Xavier/Glorot initialization can be turned on by passing glorot_init=True.
I've also added these initializations to the integration tests and the Seq2seq class tests.
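The "less than or equal to 0 disables it" semantics can be sketched as a standalone helper. This is a hedged illustration of the described behaviour on a stand-in model, not the actual Seq2seq class or its constructor:

```python
import torch.nn as nn


def initialize(model, uniform_init=0, glorot_init=False):
    """Apply the PR's described semantics to any nn.Module (illustrative)."""
    for p in model.parameters():
        # uniform_init <= 0 means uniform initialization is skipped entirely
        if uniform_init > 0:
            p.data.uniform_(-uniform_init, uniform_init)
        # Glorot applies only to matrices (dim > 1), leaving biases alone
        if glorot_init and p.dim() > 1:
            nn.init.xavier_uniform_(p)


net = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
initialize(net, uniform_init=0.1)                   # uniform only
initialize(net, uniform_init=0, glorot_init=True)   # glorot; uniform skipped
```

With the defaults (`uniform_init=0, glorot_init=False`) the helper is a no-op, matching the "turned off by default" behaviour described above.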