Vocab size does not match model input size #333

moment-of-peace opened this issue Apr 16, 2021 · 1 comment

moment-of-peace commented Apr 16, 2021

Why don't the vocab and model checkpoint provided in "II. Cross-lingual language model pretraining (XLM)" of the README match? For example, the vocab size for "tokenize + lowercase + no accent + BPE" should be 95k (the embedding size of the model), but after downloading, the vocab file actually has more than 120k lines.
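
For reference, here is a minimal sketch of the comparison being described. The file names are placeholders for whichever vocab/checkpoint pair you downloaded, and the checkpoint layout (a `model` key holding a state dict with an `embeddings.weight` entry) is an assumption, not a documented format:

```python
# Minimal sketch of the mismatch check, assuming an XLM-style layout:
# one token per line in the vocab file, and a checkpoint dict whose
# weights live under a 'model' key.
import torch

VOCAB_PATH = "vocab_file"   # placeholder: the downloaded vocab file
CKPT_PATH = "model.pth"     # placeholder: the downloaded checkpoint

# The vocab size is the number of lines in the vocab file.
with open(VOCAB_PATH, encoding="utf-8") as f:
    n_lines = sum(1 for _ in f)

# Load on CPU and look for an embedding weight. The 'model' key and the
# 'embeddings.weight' name are assumptions -- inspect ckpt.keys() and the
# state dict names for your actual download.
ckpt = torch.load(CKPT_PATH, map_location="cpu")
state_dict = ckpt.get("model", ckpt)

emb_rows = next(
    (t.shape[0] for name, t in state_dict.items()
     if name.endswith("embeddings.weight")),
    None,
)

print(f"vocab file lines:      {n_lines}")
print(f"embedding matrix rows: {emb_rows}")
```

Run against the files linked in the README, this should reproduce the discrepancy described above (120k+ vocab lines versus 95k embedding rows).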

PootieT commented Oct 20, 2021

Similar issue here with the vocab file for the XLM-R 100-language model: it should have a 200K vocab, but the downloaded file has 239,776 entries.
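
If the comparison target here is the fairseq XLM-R release (an assumption; the comment does not name the exact download), the dictionary the model actually uses can be inspected directly. The hub name and attribute path below follow fairseq's RoBERTa hub interface and may differ across fairseq versions:

```python
# Hedged sketch: check the dictionary size the fairseq XLM-R model uses,
# to compare against the downloaded vocab file's line count. The hub name
# 'xlmr.base' and the task.source_dictionary attribute are assumptions.
import torch

xlmr = torch.hub.load("pytorch/fairseq", "xlmr.base")

# The dictionary length includes special symbols (bos/pad/eos/unk), so it
# need not equal the raw vocab file's line count exactly.
print(len(xlmr.task.source_dictionary))
```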
