Can't load models with a gamma or beta parameter #29554
Comments
Yes, that's correct; it's a bug I pointed out in my video series on contributing to Transformers. It is due to these lines: transformers/src/transformers/modeling_utils.py, lines 579 to 582 (commit 0290ec1).
I assume they are there for backwards-compatibility reasons. If we knew which models require this exception, we could fix this.
I assumed the same, but it's a pretty annoying bug to have to find on your own. Would it be worth adding a warning to the init method of the …? It's further complicated by the fact that ….
Hi @malik-ali, thanks for raising this issue! Indeed, this isn't a desired behaviour.
I think this would be very hard to do. There are many saved checkpoints both on and off the hub, as well as all sorts of custom models which might rely on this behaviour.
Yes, I think a warning for a few release cycles is the best way to go. I would put this in the …. It won't be possible to tell whether the parameter comes from an "old" state dict or a new model, but we can warn that the renaming is happening, that the behaviour will be removed in a future release, and that users should update the weights in their state dict to use "weight" or "bias" so they are loaded properly. @malik-ali Would you like to open a PR to add this? This way you get the GitHub contribution for your suggested solution.
@amyeroberts I'd be happy to! Just one question: if we add this to the …? I ask because I ran into this issue after training a model for several days and then loading it. It would have been nice to see the warning before doing all the training, so that I could rename the parameters on the spot. Do you think a warning like that would be feasible? (My fix was to manually rename the keys of the saved state_dict and then rename the parameters in my model.)
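The manual workaround described above can be sketched in plain Python (the helper name is illustrative, and a real state_dict maps keys to tensors rather than floats):

```python
def rename_keys(state_dict: dict, old: str, new: str) -> dict:
    # Sketch of the manual workaround: rewrite offending substrings
    # in the saved state_dict's keys so that, after the library's
    # gamma/beta renaming on load, they match the model's parameters.
    return {key.replace(old, new): value for key, value in state_dict.items()}

sd = {"layer.gamma_scale": 1.0, "layer.bias": 2.0}
print(rename_keys(sd, "gamma", "weight"))
# {'layer.weight_scale': 1.0, 'layer.bias': 2.0}
```

The model's own parameter must be renamed to match as well, as noted above.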
Good point! In this case, we'll need to add a warning in two places to make sure we catch both new model creations and old state dicts being loaded in.
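The proposed warning could look roughly like this (a hedged sketch; the helper name and wording are assumptions, not an actual patch to the library):

```python
import warnings

def warn_if_renamed(param_name: str) -> None:
    # Hypothetical helper: emit a FutureWarning when a parameter name
    # contains "gamma" or "beta", since loading currently renames those
    # substrings to "weight"/"bias" silently.
    if "gamma" in param_name or "beta" in param_name:
        warnings.warn(
            f"Parameter name '{param_name}' contains 'gamma' or 'beta'; "
            "these substrings are renamed to 'weight'/'bias' when a state "
            "dict is loaded. This behaviour will be removed in a future "
            "release; please rename the parameter.",
            FutureWarning,
        )

# Per the discussion above, this would be called in two places: when a
# new model registers its parameters, and when a loaded state_dict key
# is about to be renamed.
```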
+1 Found this problem today...
@amyeroberts I might not have a chance to push a fix for this for at least a few weeks, so please feel free to make any changes as you (or anyone) wishes!
@malik-ali OK - thanks for letting us know. I've added a 'Good difficult Issue' label to flag this for anyone in the community who might want to tackle it in the meantime.
It seems that you cannot create parameters with the string `gamma` or `beta` in any modules you write if you intend to save/load them with the transformers library. There is a small function called `_fix_keys` implemented in the model loading (link). It renames all instances of `beta` or `gamma` in any substring of the state_dict keys to be `bias` and `weight`. This means that if your modules actually have a parameter with these names, they won't be loaded when using a pretrained model.

As far as I can tell, it's completely undocumented that people shouldn't create any parameters with the string `gamma` or `beta` in them.

Here is a minimal reproducible example:
When you run this code, you get the following error:
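The original snippet and error output were not captured in this page; the root cause can be sketched independently (an illustrative function modelled on the renaming behaviour described in the issue, not the library's actual code):

```python
def fix_key(key: str) -> str:
    # Sketch of the load-time renaming described above: any occurrence
    # of "gamma" or "beta" in a state_dict key is rewritten to
    # "weight" or "bias" respectively.
    if "gamma" in key:
        return key.replace("gamma", "weight")
    if "beta" in key:
        return key.replace("beta", "bias")
    return key

# A custom parameter whose name contains "gamma" no longer matches
# the key the model expects, so its weights are silently dropped:
print(fix_key("encoder.layer.0.gamma"))  # encoder.layer.0.weight
print(fix_key("my_beta_scale"))          # my_bias_scale
```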