Note: my package versions differ slightly from those in requirements.txt; maybe sacremoses is related: fire 0.3.0, sacremoses 0.0.38, sentencepiece 0.1.85, fastai 1.0.47.
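To rule out version drift, here is a minimal sketch (using the standard `pkg_resources` API from setuptools) that prints the installed versions so they can be compared against the pins in requirements.txt:

```python
# Minimal sketch: print the installed versions of the packages listed above,
# to compare against the pins in the repo's requirements.txt.
import pkg_resources

for pkg in ["fire", "sacremoses", "sentencepiece", "fastai"]:
    print(pkg, pkg_resources.get_distribution(pkg).version)
```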
What I did:
1. Checked out the `pretrain-lm` branch, because it has clear instructions on how to pretrain an LM (Example how to pretrain lm + introduction of config_name #57).
2. `bash prepare_wiki.sh de`
3. `python -W ignore -m multifit new multifit_paper_version replace_ --name my_lm - train_ --pretrain-dataset data/wiki/de-100`

Received the following traceback:
```
python -W ignore -m multifit new multifit_paper_version replace_ --name my_lm - train_ --pretrain-dataset data/wiki/de-100
Setting LM weights seed seed to 0
Running tokenization: 'lm-notst' ...
Wiki text was split to 1 articles
Wiki text was split to 1 articles
Wiki text was split to 1 articles
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ubuntu/multifit/multifit/__main__.py", line 16, in <module>
fire.Fire(Experiment())
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/fire/core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/fire/core.py", line 468, in _Fire
target=component.__name__)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/multifit/multifit/training.py", line 587, in train_
self.pretrain_lm.train_(pretrain_dataset)
File "/home/ubuntu/multifit/multifit/training.py", line 275, in train_
learn = self.get_learner(data_lm=dataset.load_lm_databunch(bs=self.bs, bptt=self.bptt, limit=self.limit))
File "/home/ubuntu/multifit/multifit/datasets/dataset.py", line 208, in load_lm_databunch
limit=limit)
File "/home/ubuntu/multifit/multifit/datasets/dataset.py", line 258, in load_n_cache_databunch
databunch = self.databunch_from_df(bunch_class, train_df, valid_df, **args)
File "/home/ubuntu/multifit/multifit/datasets/dataset.py", line 271, in databunch_from_df
**args)
File "/home/ubuntu/multifit/fastai_contrib/text_data.py", line 147, in make_data_bunch_from_df
TextList.from_df(valid_df, path, cols=text_cols, processor=processor))
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/fastai/data_block.py", line 434, in __init__
if not self.train.ignore_empty and len(self.train.items) == 0:
TypeError: len() of unsized object
```
From initial debugging, `train.items` is an ndarray with shape `()`. When I print it, it returns the articles in German. I suppose this log line points to the problem: `Wiki text was split to 1 articles` - I reckon the wiki text should be split into more than one article. So maybe something goes wrong in `read_wiki_articles()` in `dataset.py`... This is my educated guess, but I don't know where to go from here.