This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

confusion about lm_head's size? #354

Open
tnq177 opened this issue Oct 5, 2022 · 2 comments

Comments

@tnq177

tnq177 commented Oct 5, 2022

In [58]: xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
In [59]: xlmr.model.encoder.lm_head
Out[59]:
RobertaLMHead(
  (dense): Linear(in_features=1024, out_features=1024, bias=True)
  (layer_norm): FusedLayerNorm(torch.Size([1024]), eps=1e-05, elementwise_affine=True)
)
In [60]: xlmr.model.encoder.lm_head.weight.size()
Out[60]: torch.Size([250002, 1024])

In [61]: xlmr.model.encoder.lm_head.bias.size()
Out[61]: torch.Size([250002])

If I understand correctly, the lm_head is simply the word embedding matrix in the tied-embedding case. What I don't understand is why the module repr only shows a dense layer of size [1024, 1024], while inspecting lm_head.weight and lm_head.bias gives [250002, 1024] and [250002]. I would assume [250002, 1024] is the correct size for the output projection.

@tnq177
Author

tnq177 commented Oct 5, 2022

Oh, I see the source code for lm_head now, no worries.
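
For anyone who lands here, this is roughly what the head does as far as I can tell from that source and from the printout above — just a sketch with assumed names and an assumed activation, not the actual fairseq code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMHeadSketch(nn.Module):
    """Rough sketch of what I think RobertaLMHead does (not the real code)."""

    def __init__(self, embed_dim=1024, vocab_size=250002, embed_weight=None):
        super().__init__()
        # These two submodules are all that the module repr prints.
        self.dense = nn.Linear(embed_dim, embed_dim)          # [1024, 1024]
        self.layer_norm = nn.LayerNorm(embed_dim)
        # The vocabulary projection: weight is (tied to) the token embedding
        # matrix and bias is a per-token bias. They are plain Parameters,
        # not submodules, so they don't show up in the repr.
        if embed_weight is None:
            embed_weight = nn.Parameter(torch.zeros(vocab_size, embed_dim))
        self.weight = embed_weight                            # [250002, 1024]
        self.bias = nn.Parameter(torch.zeros(vocab_size))     # [250002]

    def forward(self, features):
        x = self.dense(features)          # [..., 1024] -> [..., 1024]
        x = F.gelu(x)                     # I assume a gelu activation here
        x = self.layer_norm(x)
        return F.linear(x, self.weight) + self.bias  # [..., 250002]
```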

@tnq177 tnq177 closed this as completed Oct 5, 2022
@tnq177 tnq177 reopened this Oct 5, 2022
@tnq177
Author

tnq177 commented Oct 5, 2022

Actually, I'm still confused. The README of this repo says to "Train your own XLM model with MLM or MLM+TLM" using train.py, and following the train.py code it seems to use the Transformer implementation in this repo. However, in class TransformerModel(nn.Module) I can only see the last layer, pred_layer, using just the word embedding; there is no embed_dim -> embed_dim dense layer like in https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/roberta/model.py#L475. Which one is correct, please?
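
To make the comparison concrete, here is how I read the two heads — again just a sketch with assumed shapes and names, nothing copied verbatim from either repo:

```python
import torch.nn.functional as F

# XLM-style pred_layer (as I read this repo's TransformerModel):
# a single projection onto the vocabulary, weight tied to the embeddings.
def xlm_scores(x, embed_weight, bias):
    # x: [batch, seq, 1024]; embed_weight: [vocab, 1024]; bias: [vocab]
    return F.linear(x, embed_weight) + bias

# fairseq RoBERTa-style lm_head (as I read roberta/model.py#L475):
# an extra dense + activation + layer_norm before the same tied projection.
def roberta_scores(x, dense, layer_norm, embed_weight, bias):
    x = layer_norm(F.gelu(dense(x)))     # 1024 -> 1024 transform first
    return F.linear(x, embed_weight) + bias
```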
