
Advanced Initialization Methods #66

Open
wants to merge 4 commits into master

Conversation

gautierdag (Member)

Addresses Issue #54:
Adds Xavier/Glorot initialization and uniform initialization to the Seq2seq class. By default both are turned off, but this offers an interface for enabling them in machine.

Uniform initialization is enabled by passing, for instance, uniform_init=0.1; this initializes all parameters in the Seq2seq model by sampling uniformly from (-uniform_init, uniform_init). If a value less than or equal to 0 is passed to uniform_init, it is not used.

The Xavier/Glorot initialization can be turned on by passing glorot_init=True.

I've also added these initializations to the integration tests and the Seq2seq class tests.
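
A rough usage sketch of the interface described above; encoder and decoder stand for already-constructed encoder/decoder modules and are not defined here:

    import torch.nn.functional as F

    # all parameters sampled uniformly from (-0.1, 0.1)
    model = Seq2seq(encoder, decoder, decode_function=F.log_softmax, uniform_init=0.1)

    # Xavier/Glorot initialization instead (uniform_init left at its default of 0)
    model = Seq2seq(encoder, decoder, decode_function=F.log_softmax, glorot_init=True)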

@dieuwkehupkes (Member) left a comment:

Thanks! I think now that I merged the previous PR there is some conflict in train_model.py, could you have a look at that?

@@ -8,9 +9,13 @@ class Seq2seq(BaseModel):
     and decoder.
     """
 
-    def __init__(self, encoder, decoder, decode_function=F.log_softmax):
+    def __init__(self, encoder, decoder, decode_function=F.log_softmax,
+                 uniform_init=0, glorot_init=False):
Member:

Putting all this in the init like this perhaps doesn't generalise very well if you also want to add other kinds of initialisations. Perhaps instead we could pass a function for initialisation? Does that make sense?

Member Author:

Ah I see what you mean, so we would move this initialization function outside of the model class and maybe into a util? And one would have the option of passing an initialization function to any class and have the weights of that class initialized accordingly?

@gautierdag (Member Author), Mar 19, 2019:

I didn't think about it that way initially because I was following how openNMT has it, but I think that makes more sense actually. So ignore this PR for now haha
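
To illustrate the alternative being discussed, here is a minimal sketch of initialization passed in as a function; the helper names and the init_fn argument are invented for illustration and are not part of machine or this PR:

    import torch.nn as nn

    def make_uniform_init(bound):
        # returns a function that samples every parameter from U(-bound, bound)
        def init_fn(module):
            for p in module.parameters():
                p.data.uniform_(-bound, bound)
        return init_fn

    def glorot_init_fn(module):
        # Xavier/Glorot initialization for weight matrices only (dim > 1)
        for p in module.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)

    # a Seq2seq-like class could then accept such a function and call it on itself, e.g.
    # model = Seq2seq(encoder, decoder, init_fn=make_uniform_init(0.1))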

Member:

Ok, thanks!

p.data.uniform_(-uniform_init, uniform_init)

# xavier/glorot initialization if glorot_init
if glorot_init:
Member:

I don't know much about initialisations, so this is mostly a question: does the glorot initialisation depend on the parameters of the uniform? If they are both specified, does glorot_init overwrite the uniform_init parameter? In the tests it seems that they are not mutually exclusive. Perhaps we could add some docstring explaining how they behave/interact?

Member Author:

Yes, if both are passed, Glorot overrides the uniform initialization for all parameters except the biases. You usually use either uniform or glorot/xavier at a time, so I could add a docstring about this, but it's not very likely that someone activates both at the same time. Also, code-wise this is exactly how openNMT does it.
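
For reference, a sketch of the behaviour described here, following the openNMT-style pattern mentioned; it is written as a standalone helper for illustration and is not the exact code in this PR:

    import torch.nn as nn

    def initialize(model, uniform_init=0, glorot_init=False):
        # uniform init first: every parameter drawn from U(-uniform_init, uniform_init)
        if uniform_init > 0:
            for p in model.parameters():
                p.data.uniform_(-uniform_init, uniform_init)
        # glorot then overwrites every tensor with dim > 1 (weight matrices);
        # biases are 1-dimensional, so they keep the uniform (or default) values
        if glorot_init:
            for p in model.parameters():
                if p.dim() > 1:
                    nn.init.xavier_uniform_(p)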

Member:

Alright, makes sense. My confusion was mostly coming from my own ignorance :).
