A few questions about the training pipeline #5
Comments
Thanks for your kind words!
This and the loss defined on line 508 make up
That's right, and the reason for this is that we force the generation of the remaining [IMG] tokens. Hope that makes sense!
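(For other readers: here is a rough sketch, with made-up token ids and names, of what forcing the remaining [IMG] tokens at inference time could look like; it is not the repository's actual generation code.)

```python
import torch

# Hypothetical sketch of "forcing" the remaining [IMG] tokens at inference:
# once the model has generated the first [IMG] token, the rest are appended
# deterministically rather than sampled. Token ids and names are made up.
IMG_TOKEN_IDS = list(range(50266, 50274))  # assumed ids for [IMG0]..[IMG7]

def force_remaining_img_tokens(generated_ids: torch.Tensor) -> torch.Tensor:
    """Append the remaining [IMG] tokens if the sequence currently ends with [IMG0]."""
    if generated_ids[0, -1].item() == IMG_TOKEN_IDS[0]:
        rest = torch.tensor([IMG_TOKEN_IDS[1:]], dtype=generated_ids.dtype)
        generated_ids = torch.cat([generated_ids, rest], dim=1)
    return generated_ids

# Example: a (1, seq_len) tensor whose last token is [IMG0].
ids = torch.tensor([[10, 42, IMG_TOKEN_IDS[0]]])
ids = force_remaining_img_tokens(ids)  # now ends with [IMG0]..[IMG7]
```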
Thanks for your quick response. Reopening this issue for another query regarding the pipeline (didn't want to unnecessarily create a new issue). If I am not wrong, this line makes the entire OPT embedding layer trainable. It is also evident from the first few lines of `param_count.txt`:

| Module | Trainable | Shape | Param Count |
| --- | --- | --- | --- |
| model.logit_scale | True | () | 1 |
You're right, they become trainable, which is why we zero out the gradients of the non-[IMG] embeddings here (lines 578 to 587 in commit 53fdcf2).
This is not super ideal, but I think it is overall cleaner than concatenating a trainable embedding matrix with a frozen one.
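(As a rough sketch of the idea, with assumed sizes and names rather than the repository's exact code: the full embedding matrix stays trainable, and after each backward pass the gradient rows of every non-[IMG] token are zeroed so that only the [IMG] rows are ever updated.)

```python
import torch
import torch.nn as nn

# Minimal sketch (assumed sizes/names): keep the whole input embedding matrix
# trainable, but after backward() zero the gradient rows of every token that is
# not one of the appended [IMG] tokens, so only the [IMG] rows get optimizer updates.
num_img_tokens = 8                       # assumed number of [IMG] tokens
vocab_size, hidden_dim = 50266, 512      # assumed OPT-like vocab, toy hidden dim
input_embeddings = nn.Embedding(vocab_size + num_img_tokens, hidden_dim)

input_ids = torch.randint(0, vocab_size, (2, 16))
loss = input_embeddings(input_ids).sum()  # stand-in for the real training loss
loss.backward()

with torch.no_grad():
    # Every row except the last `num_img_tokens` has its gradient zeroed out.
    input_embeddings.weight.grad[:-num_img_tokens] = 0
```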
Thanks again. Unfortunately, I missed this section of the script. Is it also correct to say that for the lp loss, you are considering the loss for generating each token of the input text (caption), i.e. the negative log-likelihood of generating token $s_t$ conditioned on $s_1, \ldots, s_{t-1}$, where $t \in \{1, \ldots, T\}$?
Yes, that's right!
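(Written out in the same notation, the per-caption term being described is the standard autoregressive NLL; this is the generic form for reference, not necessarily the exact statement of Equation 2 in the paper, with $\ell_p$ standing for the lp loss discussed above.)

$$
\ell_p = -\sum_{t=1}^{T} \log p_\theta\!\left(s_t \mid s_1, \ldots, s_{t-1}\right)
$$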
In that case, I believe equation 2 is slightly misleading, as the summation goes over
You're absolutely right, thanks for pointing this out! We'll fix this in the paper soon. The correct scheme should be that the loss is only considered for the first
As I was trying to train my own model and run inference using the saved checkpoint, I noticed a few things; please verify (this might be helpful for other users).
`img_token_embeddings = state_dict['model.input_embeddings.weight'].cpu().detach()[-model_kwargs['num_tokens']:, :]`
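(For other readers: a hedged sketch of how a line like the one above might be used when loading a saved checkpoint, assuming the input embedding matrix is stored with the [IMG] rows appended last; the filename, key layout, and the num_tokens value are illustrative, not taken from the repository.)

```python
import torch

# Illustrative only: pull the trained [IMG] token embeddings out of a saved
# checkpoint whose input embedding matrix has the [IMG] rows appended last.
ckpt = torch.load('ckpt_best.pth.tar', map_location='cpu')  # assumed filename
state_dict = ckpt['state_dict']                             # assumed key layout
num_tokens = 8                                              # assumed model_kwargs['num_tokens']

img_token_embeddings = (
    state_dict['model.input_embeddings.weight'].cpu().detach()[-num_tokens:, :]
)
print(img_token_embeddings.shape)  # -> (num_tokens, hidden_dim)
```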
Thanks for sharing this!
You're right, and I also realized that I hadn't uploaded the script used to prune the checkpoints (keeping just the trained weights, and discarding the pretrained model weights). I just did that here: https://github.com/kohjingyu/gill/blob/main/scripts/prune_model_ckpt.py. I think this is essentially the same as the changes you probably made locally, though I haven't tested this script in a while.
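(A rough sketch of what such a pruning step can look like, under the assumption that the trained parameters can be identified by name prefixes; the prefixes and filenames below are placeholders, not the actual ones used by scripts/prune_model_ckpt.py.)

```python
import torch

# Hypothetical pruning sketch: keep only the parameters that were trained and
# drop the frozen pretrained LM weights before re-saving the checkpoint.
TRAINED_PREFIXES = ('model.logit_scale', 'model.input_embeddings')  # placeholder prefixes

ckpt = torch.load('ckpt_best.pth.tar', map_location='cpu')          # placeholder filename
pruned = {k: v for k, v in ckpt['state_dict'].items()
          if k.startswith(TRAINED_PREFIXES)}
torch.save({'state_dict': pruned}, 'pretrained_ckpt.pth.tar')       # placeholder filename
```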
Thanks for sharing the script! Just noticed a few things -
Thanks for the notes! Sorry about this, it's what happens when you don't test before you upload...
That's right.
Another small update. I could not find
Ah, looks like it should be
I have another question. As mentioned above, the lp loss includes the negative log-likelihood (NLL) of generating each token of the input text (caption). Did you find this helpful for the overall model performance? I am asking because, from the name and purpose of this loss, I would assume that it was intended to only consider the NLL of generating
I have not run this particular ablation, sorry. I would guess that it does not have a significant effect on performance on the tasks we evaluated on.
Hi,
I read your paper and it is great work! Thanks for sharing your codebase with the community. As I was going through your code, I came across a few places where I would greatly appreciate your explanations/suggestions. Here are my questions -
It seems that the labels for the [IMG] tokens other than [IMG0] have been set to -100 so that they are ignored when calculating the loss. Is my understanding correct? If it is, how are we learning embeddings for the other [IMG{r}] tokens (r = {2, 3, ..., 8})?
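(For reference, a minimal illustration of the -100 convention mentioned above: PyTorch's CrossEntropyLoss ignores positions whose label is -100 by default, so masked positions simply drop out of the loss. The sizes below are toy values, not the model's.)

```python
import torch
import torch.nn as nn

# Positions labeled -100 are excluded from the loss (default ignore_index=-100).
logits = torch.randn(1, 4, 10)               # (batch, seq_len, vocab), toy sizes
labels = torch.tensor([[7, -100, -100, 3]])  # only positions 0 and 3 contribute
loss = nn.CrossEntropyLoss()(logits.view(-1, 10), labels.view(-1))
print(loss)
```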