Cannot run examples / pytest tests: #54

repodiac · 2019-11-05T10:45:31Z

I cannot get MultiFiT to work in my environment :-(

What I did was...

I checked out the repo and ran every "prepare..." script available.
I had to "pip install" the modules "fire" and "sacremoses", since they were provided neither by the code nor by the "fastai" package (I installed the most recent version, 1.0.59).
I started pytest . or training according to the example:
python -m ulmfit lm --dataset-path data/wiki/${LANG}-100 --tokenizer='f' --nl 3 --name 'orig' --max-vocab 60000 \
  --lang ${LANG} --qrnn=False - train 10 --bs=50 --drop_mult=0 --label-smoothing-eps=0.0

RESULT: I always get a UnicodeEncodeError or UnicodeDecodeError

e.g. with the training command:

Max vocab: 60000
Cache dir: data/wiki/en-100/models/f60k
Model dir: data/wiki/en-100/models/f60k/lstm_orig.m
Wiki text was split to 28476 articles
Wiki text was split to 60 articles
Running tokenization lm...
Traceback (most recent call last):
  File "/home/user/miniconda/envs/py36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/user/miniconda/envs/py36/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/work/ulmfit/__main__.py", line 188, in <module>
    fire.Fire(ULMFiT())
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fire/core.py", line 675, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/app/work/ulmfit/pretrain_lm.py", line 164, in train_lm
    data_lm = self.load_wiki_data(bs=bs) if data_lm is None else data_lm
  File "/app/work/ulmfit/pretrain_lm.py", line 246, in load_wiki_data
    **args)
  File "/app/work/ulmfit/pretrain_lm.py", line 254, in lm_databunch
    return self.databunch(name, bunch_class=TextLMDataBunch, *args, **kwargs)
  File "/app/work/ulmfit/pretrain_lm.py", line 279, in databunch
    **args)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/text/data.py", line 202, in from_df
    if cls==TextLMDataBunch: src = src.label_for_lm()
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/data_block.py", line 480, in _inner
    self.process()
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/data_block.py", line 534, in process
    for ds,n in zip(self.lists, ['train','valid','test']): ds.process(xp, yp, name=n)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/data_block.py", line 714, in process
    self.x.process(xp)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/data_block.py", line 84, in process
    for p in self.processor: p.process(self)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/text/data.py", line 296, in process
    for i in progress_bar(range(0,len(ds),self.chunksize), leave=False):
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 75, in __iter__
    if self.auto_update: self.update(i+1)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 92, in update
    self.update_bar(val)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 104, in update_bar
    else: self.on_update(val, f'{100 * val/self.total:.2f}% [{val}/{self.total} {elapsed_t}<{remaining_t}{end}]')
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 274, in on_update
    if printing(): WRITER_FN(to_write, end = '\r')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-35: ordinal not in range(128)

or with the tests (any!):

self = <encodings.ascii.IncrementalDecoder object at 0x7fdcdf958e10>
input = b' \n = Valkyria Chronicles III = \n \n Senj\xc5\x8d no Valkyria 3 : <unk> Chronicles ( Japanese : \xe6\x88\xa6\xe5\xa...n force invading the Empire just following the two nations \' cease @-@ fire would certainly wreck their newfound peac'
final = False

    def decode(self, input, final=False):
>       return codecs.ascii_decode(input, self.errors)[0]
E       UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 39: ordinal not in range(128)

/home/user/miniconda/envs/py36/lib/python3.6/encodings/ascii.py:26: UnicodeDecodeError

Does anyone have a clue here? Thanks a lot in advance!
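
Both errors above mention the 'ascii' codec, which usually means Python picked ASCII as its default text encoding. The thread does not confirm that this is the cause here, so the following is only a diagnostic sketch under that assumption. The helper names `describe_encodings` and `try_decode` are made up for illustration; the byte string is a fragment of the failing test input shown above.

```python
import locale
import sys

# Fragment of the failing test input: "Senjō" encoded as UTF-8.
raw = b"Senj\xc5\x8d no Valkyria 3"


def describe_encodings():
    """Report the encodings Python uses by default for text files and stdout."""
    return {
        "preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


def try_decode(data, encoding):
    """Return the decoded text, or the exception class name if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


print(describe_encodings())
# ascii() keeps the output printable even when stdout itself is ASCII-only.
print(ascii(try_decode(raw, "ascii")))
print(ascii(try_decode(raw, "utf-8")))
```

If `preferred` or `stdout` reports an ASCII codec (for example ANSI_X3.4-1968), running under a UTF-8 locale such as LC_ALL=C.UTF-8, where the system provides one, or setting PYTHONIOENCODING=utf-8 for the printing side, may be worth trying.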

PiotrCzapla · 2019-11-11T10:07:37Z

Hi @repodiac, it seems you are using the older scripts: python -m ulmfit lm doesn't look like the new framework. You might want to try running the current framework and see if the issue is solved there.
I think I've seen " 'ascii' codec can't encode characters" before, when loading tokenized datasets. The solution was to remove the cache files made by older fastai and recreate them. The new framework does this automatically.

Let me know if that helped.
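
For the older scripts, removing the stale cache by hand could look roughly like the sketch below. It assumes the cache lives in the "Cache dir" printed in the log (data/wiki/en-100/models/f60k); the helper `clear_cache` is hypothetical and not part of the repo, and the thread does not say exactly which files the older fastai wrote.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir):
    """Delete a cache directory and everything in it.

    Returns True if the directory existed and was removed, False otherwise.
    """
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    # Path taken from the "Cache dir" line of the log; adjust to your dataset.
    removed = clear_cache("data/wiki/en-100/models/f60k")
    print("removed" if removed else "nothing to remove")
```

Note that in the log the model dir (lstm_orig.m) sits inside the cache dir, so this also deletes any saved model there; back it up first if you need it. The cache is then recreated on the next run.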
