Vocab size does not match model input size #333

moment-of-peace opened this issue Apr 16, 2021 · 1 comment

moment-of-peace commented Apr 16, 2021

Why don't the vocab and model checkpoint provided in "II. Cross-lingual language model pretraining (XLM)" of the README match? For example, the vocab size for "tokenize + lowercase + no accent + BPE" should be 95k (the embedding size of the model), but after downloading, the vocab file actually has more than 120k lines.
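
For reference, here is a minimal sketch of the comparison being described. The file names are placeholders for whichever vocab/checkpoint pair you downloaded, and the checkpoint layout (a `model` key holding a state dict with an `embeddings.weight` entry) is an assumption, not a documented format:

```python
# Minimal sketch of the mismatch check, assuming an XLM-style layout:
# one token per line in the vocab file, and a checkpoint dict whose
# weights live under a 'model' key.
import torch

VOCAB_PATH = "vocab_file"   # placeholder: the downloaded vocab file
CKPT_PATH = "model.pth"     # placeholder: the downloaded checkpoint

# The vocab size is the number of lines in the vocab file.
with open(VOCAB_PATH, encoding="utf-8") as f:
    n_lines = sum(1 for _ in f)

# Load on CPU and look for an embedding weight. The 'model' key and the
# 'embeddings.weight' name are assumptions -- inspect ckpt.keys() and the
# state dict names for your actual download.
ckpt = torch.load(CKPT_PATH, map_location="cpu")
state_dict = ckpt.get("model", ckpt)

emb_rows = next(
    (t.shape[0] for name, t in state_dict.items()
     if name.endswith("embeddings.weight")),
    None,
)

print(f"vocab file lines:      {n_lines}")
print(f"embedding matrix rows: {emb_rows}")
```

Run against the files linked in the README, this should reproduce the discrepancy described above (120k+ vocab lines versus 95k embedding rows).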

PootieT commented Oct 20, 2021

Similar issue here with the vocab file for the XLM-R 100-language model: it should have a 200K vocab, but the downloaded file has 239,776 entries.
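
If the comparison target here is the fairseq XLM-R release (an assumption; the comment does not name the exact download), the dictionary the model actually uses can be inspected directly. The hub name and attribute path below follow fairseq's RoBERTa hub interface and may differ across fairseq versions:

```python
# Hedged sketch: check the dictionary size the fairseq XLM-R model uses,
# to compare against the downloaded vocab file's line count. The hub name
# 'xlmr.base' and the task.source_dictionary attribute are assumptions.
import torch

xlmr = torch.hub.load("pytorch/fairseq", "xlmr.base")

# The dictionary length includes special symbols (bos/pad/eos/unk), so it
# need not equal the raw vocab file's line count exactly.
print(len(xlmr.task.source_dictionary))
```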
