While running cls_dataset.load_clas_databunch(bs=exp.finetune_lm.bs).show_batch(), I get the following output:
Running tokenization: 'lm-notst' ...
Validation set not found using 10% of trn
Data lm-notst, trn: 26925, val: 2991
Size of vocabulary: 15000
First 20 words in vocab: ['xxunk', 'xxpad', 'xxbos', 'xxfld', 'xxmaj', 'xxup', 'xxrep', 'xxwrep', '', '▁', '▁,', '▁.', '▁в', 'а', 'и', 'е', '▁и', 'й', '▁на', 'х']
Running tokenization: 'cls' ...
Data cls, trn: 26925, val: 2991
Running tokenization: 'tst' ...
/home/explorer/miniconda3/envs/fast/lib/python3.6/site-packages/fastai/data_block.py:537: UserWarning: You are labelling your items with CategoryList.
Your valid set contained the following unknown labels, the corresponding items have been discarded.
201, 119, 192, 162, 168...
if getattr(ds, 'warn', False): warn(ds.warn)
Data tst, trn: 2991, val: 7448
I assume the problem is that some labels are poorly represented in the validation set that was inferred automatically. Is there a way to pass a validation set explicitly?
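To illustrate the kind of explicit split I have in mind, here is a minimal sketch in plain Python. It does not use multifit or fastai; `stratified_split`, `label_of`, and the toy `rows` are hypothetical names made up for this example. The idea is to split per label, so that every label that ends up in the validation part also occurs in the training part.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, valid_frac=0.1, seed=42):
    """Split rows into (train, valid) so that every label in valid also occurs in train.

    Hypothetical helper for illustration only; not part of multifit or fastai.
    """
    rng = random.Random(seed)

    # Group the rows by their label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, valid = [], []
    for items in by_label.values():
        rng.shuffle(items)
        # Take roughly valid_frac of each label, but always keep at least
        # one example of the label in the training part.
        n_valid = min(int(len(items) * valid_frac), len(items) - 1)
        valid.extend(items[:n_valid])
        train.extend(items[n_valid:])
    return train, valid


# Toy (label, text) rows; label 201 has a single example.
rows = (
    [(0, f"text {i}") for i in range(50)]
    + [(1, f"text {i}") for i in range(30)]
    + [(201, "rare text")]
)

train, valid = stratified_split(rows, label_of=lambda r: r[0])
train_labels = {r[0] for r in train}
valid_labels = {r[0] for r in valid}
```

With a split like this, a label with only one example stays entirely in the training part, so the validation part cannot contain labels unknown to the training part. Whether and how the loader accepts such a pre-made validation set is exactly what I am asking about.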
For context, I'm training a language model similar to the one shown here: https://github.com/n-waves/multifit/blob/master/notebooks/CLS-JA.ipynb