This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
In [58]: xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
In [59]: xlmr.model.encoder.lm_head
Out[59]:
RobertaLMHead(
(dense): Linear(in_features=1024, out_features=1024, bias=True)
(layer_norm): FusedLayerNorm(torch.Size([1024]), eps=1e-05, elementwise_affine=True)
)
In [60]: xlmr.model.encoder.lm_head.weight.size()
Out[60]: torch.Size([250002, 1024])
In [61]: xlmr.model.encoder.lm_head.bias.size()
Out[61]: torch.Size([250002])
If I understand correctly, the lm_head is simply the word embedding in the tied-embedding case. What I don't understand is why the repr shows a dense layer of size [1024, 1024], but upon inspecting the weight and bias, they have sizes [250002, 1024] and [250002]? I would assume [250002, 1024] is the correct one.
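For what it's worth, a module's printed repr only lists registered *submodules*; bare `nn.Parameter`s (like a tied projection weight) never show up in it, which would explain seeing only `dense` and `layer_norm`. Here is a minimal sketch of that behavior — `TiedLMHead` is a made-up class for illustration, not fairseq's actual `RobertaLMHead`, and the small sizes stand in for xlmr.large's 1024/250002:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedLMHead(nn.Module):
    """Hypothetical, simplified stand-in for an LM head with tied embeddings,
    shown only to illustrate why print(...) omits the projection weight."""

    def __init__(self, embed_dim, vocab_size, embed_weight):
        super().__init__()
        # These two are submodules, so they appear in print(head).
        self.dense = nn.Linear(embed_dim, embed_dim)
        self.layer_norm = nn.LayerNorm(embed_dim)
        # The output projection is the *shared* embedding matrix, stored as a
        # bare nn.Parameter -- parameters are never listed in a module's repr.
        self.weight = embed_weight                       # [vocab_size, embed_dim]
        self.bias = nn.Parameter(torch.zeros(vocab_size))

    def forward(self, features):
        x = self.layer_norm(F.gelu(self.dense(features)))
        # Project back to the vocabulary with the tied embedding weights.
        return F.linear(x, self.weight, self.bias)       # [..., vocab_size]

# Small sizes for illustration; xlmr.large uses embed_dim=1024, vocab=250002.
embed = nn.Embedding(100, 16)
head = TiedLMHead(16, 100, embed.weight)

print(head)                         # lists only (dense) and (layer_norm)
print(head.weight.size())           # torch.Size([100, 16]) -- the tied embedding
print(head.weight is embed.weight)  # True: same tensor, no copy
```

So both observations are consistent: the dense [1024, 1024] layer is an intermediate transform inside the head, while `lm_head.weight` of size [250002, 1024] is the vocabulary projection shared with the input embedding.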
Actually, I'm still confused. The README of this repo instructs you to train your own XLM model with MLM or MLM+TLM using train.py. Following the train.py code, it seems to use the Transformer implementation in this repo. However, in
> If I understand correctly, the `lm_head` is simply the word embedding in the tied-embedding case. What I don't understand is why it shows a dense layer of size [1024, 1024], but upon inspecting the weight and bias, it shows [250002, 1024]? I would assume [250002, 1024] is the correct one.