This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

confusion about lm_head's size? #354

Open
tnq177 opened this issue Oct 5, 2022 · 2 comments

Comments

@tnq177

tnq177 commented Oct 5, 2022

In [58]: xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
In [59]: xlmr.model.encoder.lm_head
Out[59]:
RobertaLMHead(
  (dense): Linear(in_features=1024, out_features=1024, bias=True)
  (layer_norm): FusedLayerNorm(torch.Size([1024]), eps=1e-05, elementwise_affine=True)
)
In [60]: xlmr.model.encoder.lm_head.weight.size()
Out[60]: torch.Size([250002, 1024])

In [61]: xlmr.model.encoder.lm_head.bias.size()
Out[61]: torch.Size([250002])

If I understand correctly, the lm_head is simply the word embedding matrix in the tied-embedding case. What I don't understand is why the module repr only shows a dense layer of size [1024, 1024], while inspecting lm_head.weight and lm_head.bias gives [250002, 1024] and [250002]. I would assume [250002, 1024] is the correct size for the output projection.

@tnq177
Author

tnq177 commented Oct 5, 2022

Oh, I see the source code for lm_head now, no worries.
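
For anyone who lands here, this is roughly what the head does as far as I can tell from that source and from the printout above — just a sketch with assumed names and an assumed activation, not the actual fairseq code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMHeadSketch(nn.Module):
    """Rough sketch of what I think RobertaLMHead does (not the real code)."""

    def __init__(self, embed_dim=1024, vocab_size=250002, embed_weight=None):
        super().__init__()
        # These two submodules are all that the module repr prints.
        self.dense = nn.Linear(embed_dim, embed_dim)          # [1024, 1024]
        self.layer_norm = nn.LayerNorm(embed_dim)
        # The vocabulary projection: weight is (tied to) the token embedding
        # matrix and bias is a per-token bias. They are plain Parameters,
        # not submodules, so they don't show up in the repr.
        if embed_weight is None:
            embed_weight = nn.Parameter(torch.zeros(vocab_size, embed_dim))
        self.weight = embed_weight                            # [250002, 1024]
        self.bias = nn.Parameter(torch.zeros(vocab_size))     # [250002]

    def forward(self, features):
        x = self.dense(features)          # [..., 1024] -> [..., 1024]
        x = F.gelu(x)                     # I assume a gelu activation here
        x = self.layer_norm(x)
        return F.linear(x, self.weight) + self.bias  # [..., 250002]
```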

@tnq177 tnq177 closed this as completed Oct 5, 2022
@tnq177 tnq177 reopened this Oct 5, 2022
@tnq177
Author

tnq177 commented Oct 5, 2022

Actually, I'm still confused. The README of this repo says to "Train your own XLM model with MLM or MLM+TLM" using train.py, and following the train.py code it seems to use the Transformer implementation in this repo. However, in class TransformerModel(nn.Module) I can only see the last layer, pred_layer, using just the word embedding; there is no embed_dim -> embed_dim dense layer like in https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/roberta/model.py#L475. Which one is correct, please?
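
To make the comparison concrete, here is how I read the two heads — again just a sketch with assumed shapes and names, nothing copied verbatim from either repo:

```python
import torch.nn.functional as F

# XLM-style pred_layer (as I read this repo's TransformerModel):
# a single projection onto the vocabulary, weight tied to the embeddings.
def xlm_scores(x, embed_weight, bias):
    # x: [batch, seq, 1024]; embed_weight: [vocab, 1024]; bias: [vocab]
    return F.linear(x, embed_weight) + bias

# fairseq RoBERTa-style lm_head (as I read roberta/model.py#L475):
# an extra dense + activation + layer_norm before the same tied projection.
def roberta_scores(x, dense, layer_norm, embed_weight, bias):
    x = layer_norm(F.gelu(dense(x)))     # 1024 -> 1024 transform first
    return F.linear(x, embed_weight) + bias
```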
