
GRU implementation #25

Open
jayparks opened this issue Jun 26, 2016 · 6 comments

@jayparks

jayparks commented Jun 26, 2016

Hi team, thanks for the great work.

I'm currently trying to construct a Bahdanau-style seq2seq model, featuring a bidirectional encoder and a decoder that use GRU cells.

This repository, however, doesn't seem to have a GRU implementation, which I'm now trying to add.
(If you already have one, it would save a lot of time!)

Is models.lua/make_lstm the only function I need to modify?

One more question, about the bidirectional encoder:

The description of the bidirectional encoder option says: "hidden states of the corresponding forward/backward LSTMs are added to obtain the hidden representation for that time step."

Does this mean it literally adds the two hidden states, rather than the usual concatenation scheme?

And if so, would the rest of the code still be compatible if I simply changed the hidden representation to the concatenation of the two hidden states?

Thanks in advance for your reply.

@yoonkim
Member

yoonkim commented Jun 26, 2016

For the GRU implementation, in addition to modifying make_lstm, you'll need to do some tinkering in the training code, since a GRU doesn't have a cell state in addition to the hidden state (so rnn_state_enc is a table of size 2*opt.num_layers with LSTMs, but it should just be opt.num_layers with GRUs). It would be great to have GRUs, so let us know if/when you've implemented it! Also, GRUs are apparently a little harder to optimize with vanilla SGD, so you might want to use adagrad/adam/adadelta (we currently provide adagrad as an option).
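
For reference, a rough sketch of what a single-layer GRU cell could look like in nngraph, in the spirit of make_lstm. This is only an illustrative outline, not code from this repository: the name make_gru and its arguments are made up, and it omits multiple layers, dropout, and the word-vector/attention plumbing that make_lstm handles. Note that it returns only {h}, which is why rnn_state_enc would shrink from 2*opt.num_layers entries to opt.num_layers.

```lua
require 'nn'
require 'nngraph'

-- Hypothetical single-layer GRU cell (illustrative only).
function make_gru(input_size, rnn_size)
  local x      = nn.Identity()()   -- input at time t
  local h_prev = nn.Identity()()   -- previous hidden state

  -- reset and update gates
  local r = nn.Sigmoid()(nn.CAddTable()({
    nn.Linear(input_size, rnn_size)(x),
    nn.Linear(rnn_size, rnn_size)(h_prev)
  }))
  local z = nn.Sigmoid()(nn.CAddTable()({
    nn.Linear(input_size, rnn_size)(x),
    nn.Linear(rnn_size, rnn_size)(h_prev)
  }))

  -- candidate hidden state uses the reset-gated previous state
  local h_tilde = nn.Tanh()(nn.CAddTable()({
    nn.Linear(input_size, rnn_size)(x),
    nn.Linear(rnn_size, rnn_size)(nn.CMulTable()({r, h_prev}))
  }))

  -- h_t = (1 - z) * h_prev + z * h_tilde
  local one_minus_z = nn.AddConstant(1)(nn.MulConstant(-1)(z))
  local h = nn.CAddTable()({
    nn.CMulTable()({one_minus_z, h_prev}),
    nn.CMulTable()({z, h_tilde})
  })

  -- unlike make_lstm, there is no cell state to return
  return nn.gModule({x, h_prev}, {h})
end
```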

Yeah, we simply add the hidden states, since concatenating them would require some fiddling when initializing the decoder from the last step of the encoder (the encoder would have twice the number of dimensions). I guess you could use a linear layer to get back to the original dimensions, but that would be slightly annoying.
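
To make the trade-off concrete, here is a hypothetical sketch of the two options (module names and sizes are illustrative, not from the repository): summing keeps the hidden representation at rnn_size, while concatenating doubles it and would need something like a linear bridge before the decoder can be initialized.

```lua
require 'nn'

local rnn_size = 500  -- illustrative

-- Option 1 (current behavior): element-wise sum, stays at rnn_size
local sum_states = nn.CAddTable()            -- forward: {h_fwd, h_bwd} -> h_fwd + h_bwd

-- Option 2 (hypothetical): concatenate, then project back to rnn_size
-- so the decoder can still be initialized with an rnn_size vector
local concat_bridge = nn.Sequential()
  :add(nn.JoinTable(2))                      -- {h_fwd, h_bwd} -> batch x (2*rnn_size)
  :add(nn.Linear(2 * rnn_size, rnn_size))    -- project back down
```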

Hope this helps!

@jayparks
Author

Hi! Thanks for your help; I'm currently working on the GRU implementation.
I'll let you know once it's done :)

@ylhsieh

ylhsieh commented Aug 5, 2016

I'm also very interested in the GRU implementation! By the way, could @yoonkim elaborate on why GRU is harder to optimize using SGD? I don't quite understand why. Or perhaps you could point me toward some references? Thanks.

@yoonkim
Member

yoonkim commented Aug 5, 2016

I don't really understand why either, but empirically I've found this to be the case (vanilla SGD doesn't work very well with GRUs).

@emjotde

emjotde commented Nov 8, 2016

Hi,
has there been any progress on this? It would be nice to have in order to do proper comparisons between models and toolkits.

@bmilde

bmilde commented Nov 30, 2016

+1, would be interested in this for comparison purposes!
