I have some questions about the paper.
1、As mentioned in issue #5 (comment): "So the model will never produce [IMG2]...[IMG8] organically, but their representations are still helpful for feeding into the GILLMapper module for image generation." But if the model never produces [IMG2]...[IMG8], how can we use the hidden states of these tokens for the image generation and retrieval tasks? If we instead use the representations from the embedding matrix, wouldn't that mean the same features are used for generation and retrieval no matter what image we input?
2、Are the loss objectives lc and lp trained in two separate stages? I ask because training lp needs [IMG1]...[IMGr] as input, while training lc uses interleaved images and text as input.
3、If we don't need an interleaved image-text dataset, how does the model know when to generate the [IMG0] token?
4、How do we force [IMG2]...[IMG8] to be produced after [IMG0]?
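To make question 4 concrete, here is a hypothetical sketch of what I imagine "forcing" could look like at inference time. This is my own assumption, not necessarily how the GILL code does it: greedy decoding where, once the model emits the first image token, the remaining image tokens are appended to the sequence instead of being sampled. `decode_with_forced_img_tokens` and `next_token` are made-up names, and tokens are plain strings rather than real tokenizer ids.

```python
from typing import Callable, List, Sequence


def decode_with_forced_img_tokens(
    prompt: List[str],
    next_token: Callable[[List[str]], str],  # hypothetical stand-in for the LLM's greedy step
    first_img: str,
    rest_img: Sequence[str],
    eos: str = "</s>",
    max_new_tokens: int = 32,
) -> List[str]:
    """Greedy decoding; after `first_img` is generated, `rest_img` is appended verbatim."""
    seq = list(prompt)
    new = 0
    while new < max_new_tokens:
        tok = next_token(seq)
        seq.append(tok)
        new += 1
        if tok == first_img:
            # Forced: these tokens are inserted, never predicted by the model.
            seq.extend(rest_img)
            new += len(rest_img)
        if tok == eos:
            break
    return seq


# Toy "model" that just replays a fixed script of tokens.
script = iter(["a", "cat", "[IMG0]", "</s>"])
out = decode_with_forced_img_tokens(
    ["draw"], lambda s: next(script), "[IMG0]", ["[IMG2]", "[IMG3]"]
)
print(out)  # ['draw', 'a', 'cat', '[IMG0]', '[IMG2]', '[IMG3]', '</s>']
```

If this is roughly right, my question 1 becomes: are the [IMG2]...[IMG8] hidden states taken from a forward pass over this extended sequence?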
Thanks for your attention.