I have source and target files that are 15 GB each, and the system crashes after allocating all 32 GB of RAM I have plus the 2 GB of swap space.
I am now trying with 15 pairs of 1GB files...
Is there a way to tell XNMT how to train on very large files?
I found an answer to my problem in the doc...
"sample_train_sents – If given, load a random subset of training sentences before each epoch. Useful when training data does not fit in memory."
I guess we also need to read a large corpus in consecutive chunks, to make sure we cover the entire data set over the course of training.
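Something like the following purely hypothetical helper (not part of XNMT) is what I have in mind: it writes consecutive, aligned shards of the source/target files to disk, so a training run can cycle over one shard at a time instead of loading the full 15 GB pair.

```python
"""Hypothetical helper: split a large parallel corpus into aligned shards.

Not XNMT functionality; it just writes consecutive chunks of the source and
target files so training can iterate over one manageable shard at a time.
"""
import itertools
import os


def shard_parallel_corpus(src_path, trg_path, out_dir, sents_per_shard=1_000_000):
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    with open(src_path, encoding="utf-8") as src, open(trg_path, encoding="utf-8") as trg:
        for shard_idx in itertools.count():
            # Read the next chunk of aligned sentence pairs (source line, target line).
            pairs = list(itertools.islice(zip(src, trg), sents_per_shard))
            if not pairs:
                break
            src_out = os.path.join(out_dir, f"shard{shard_idx:04d}.src")
            trg_out = os.path.join(out_dir, f"shard{shard_idx:04d}.trg")
            with open(src_out, "w", encoding="utf-8") as so, open(trg_out, "w", encoding="utf-8") as to:
                for s, t in pairs:
                    so.write(s)
                    to.write(t)
            shard_paths.append((src_out, trg_out))
    return shard_paths


if __name__ == "__main__":
    # e.g. turn one 15 GB pair into ~1 GB shards and point consecutive training runs at them
    for src_shard, trg_shard in shard_parallel_corpus("train.src", "train.trg", "shards"):
        print(src_shard, trg_shard)
```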