about [img] token and train data #44

Open
ALR-alr opened this issue Aug 25, 2024 · 0 comments
ALR-alr commented Aug 25, 2024

I have some questions about the paper.
1. As mentioned in issue #5 (comment), "So the model will never produce [IMG2]...[IMG8] organically, but their representations are still helpful for feeding into the GILLMapper module for image generation." But if the model never produces [IMG2]...[IMG8], how can the hidden states of these tokens be used for the image generation and retrieval tasks? If the representations are taken directly from the embedding matrix, wouldn't the same features be used for generation and retrieval no matter which image is given as input?
2. Are the loss objectives lc and lp trained in two stages? I ask because training lp requires [IMG1]...[IMGr] as input, while training lc uses interleaved image and text as input.
3. If an interleaved image-text dataset is not needed, how does the model learn when to generate the [IMG0] token?
4. How are [IMG2]...[IMG8] forced to be produced after [IMG0]? (A rough sketch of what I currently imagine is included below.)
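For questions 1 and 4, here is a minimal, self-contained sketch of what I currently imagine, using a toy PyTorch model rather than this repo's actual code. All names, token ids, and sizes here (ToyLM, IMG_FIRST, R_IMG, etc.) are my own assumptions: the LM only ever samples the first [IMG] token itself; once it appears, the ids of the remaining [IMG] tokens are appended deterministically, and the hidden states at all [IMG] positions, which depend on the whole input prefix, are what a GILLMapper-style module would consume.

```python
import torch
import torch.nn as nn

VOCAB, HIDDEN, R_IMG = 1000, 64, 8      # r = 8 learned [IMG] tokens (assumed size)
IMG_FIRST = VOCAB - R_IMG               # assume the [IMG] ids sit at the end of the vocab

class ToyLM(nn.Module):
    """Stand-in for the frozen LLM: embedding -> causal transformer -> LM head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, input_ids):
        x = self.embed(input_ids)
        L = x.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.encoder(x, mask=causal)             # hidden state at every position
        return h, self.lm_head(h)

lm = ToyLM().eval()
ids = torch.randint(0, IMG_FIRST, (1, 12))           # some text-only prompt

with torch.no_grad():
    # Step 1: decode greedily; only the *first* [IMG] token can be produced organically.
    for _ in range(20):
        _, logits = lm(ids)
        nxt = logits[:, -1].argmax(-1, keepdim=True)
        ids = torch.cat([ids, nxt], dim=1)
        if nxt.item() == IMG_FIRST:
            break
    else:
        # The toy model won't reliably emit it, so force it just to keep the demo running.
        ids = torch.cat([ids, torch.tensor([[IMG_FIRST]])], dim=1)

    # Step 2: the remaining [IMG] tokens are never sampled; their ids are simply appended.
    rest = torch.arange(IMG_FIRST + 1, IMG_FIRST + R_IMG).unsqueeze(0)
    ids = torch.cat([ids, rest], dim=1)

    # Step 3: one forward pass; hidden states at the r [IMG] positions are conditioned on
    # the whole prefix (so different inputs give different features), and these would be
    # what a GILLMapper-style module consumes for generation / retrieval.
    h, _ = lm(ids)
    img_hidden = h[:, -R_IMG:, :]                    # shape (1, r, HIDDEN)
    print(img_hidden.shape)
```

Is this roughly how the released code handles it, or are the remaining [IMG] representations obtained some other way?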
Thanks for your attention.
