Refactor dataloading #955
Conversation
```diff
@@ -271,7 +271,7 @@ def is_function_implemented(self, m):
         pass

     @abstractmethod
-    def is_iterable_dataloader(self, dataloader):
+    def is_infinite_dataloader(self, dataloader):
```
Does this need to be considered infinite? Normally, IterableDatasets are finite; we just don't know how long they are during the first epoch. Another way of doing this would be to set it to infinite (or -1, or whatever placeholder value works best) and keep counting how many steps we did in the first epoch. Once we start the second epoch, we can show a progress bar and all, since we know the length will not change.
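The suggestion above (treat the length as unknown for the first epoch, count steps as we go, then reuse the measured length from the second epoch onward) could be sketched roughly like this. The `EpochLengthTracker` name and structure are illustrative only, not part of Lightning:

```python
# Hypothetical sketch of the reviewer's idea: start with an unknown ("inf")
# length, count batches during the first epoch, and fix the length once the
# first epoch completes.

class EpochLengthTracker:
    def __init__(self):
        self.num_batches = float('inf')  # unknown until one full epoch is seen
        self._count = 0

    def step(self):
        # Called once per batch while iterating the dataloader.
        self._count += 1

    def epoch_end(self):
        # After the first full epoch the length is known and will not change,
        # so a progress bar can be shown from the second epoch onward.
        self.num_batches = self._count
        self._count = 0


tracker = EpochLengthTracker()
for _batch in range(5):  # stand-in for iterating an IterableDataset loader
    tracker.step()
tracker.epoch_end()
print(tracker.num_batches)  # 5
```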
Yeah, thinking about it, that should totally just be `has_len`
or similar. No problem with the idea of keeping a record of the number of steps, although I would probably opt for that to be done in a separate PR (like when we add IterableDataset support for val and test), but I'm happy to add that now if desired.
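The `has_len` idea mentioned here can be sketched as a simple probe for `len()` support; this helper is a hypothetical illustration, not code taken from the PR:

```python
# Sketch of a has_len-style check: instead of classifying loaders as
# "iterable" or "infinite", just ask whether len() works on them.

def has_len(dataloader) -> bool:
    try:
        len(dataloader)
        return True
    except TypeError:
        # Iterable-style loaders without __len__ raise TypeError here.
        return False


print(has_len([1, 2, 3]))     # True
print(has_len(iter([1, 2])))  # False (iterators have no __len__)
```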
@ethanwharris nice job. I'll merge this, then fix the TPU stuff, then you can add what @Darktex suggested!
* Refactor dataloading
* Refactor dataloading
* Refactor dataloading
* Add shuffle to test
What does this PR do?
Fixes #953 Fixes #840 Fixes #698
- `reset_train_dataloader` and `reset_val_dataloader` to only happen when needed
- `RandomSampler` - see comment in refactor len(datasets) call. #953
- `Dataloader.dataset` following Handle abstract loader that doesn't have a dataset member #840
- `num_training_batches = float('inf')` is now the default when the train dataloader doesn't have `__len__` (in addition to when using an `IterableDataset`)

PR review
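The `float('inf')` default described in the change list could look roughly like this; `resolve_num_training_batches` is a hypothetical name for illustration, not the Trainer's actual implementation:

```python
# Sketch of the described behavior: fall back to float('inf') whenever the
# train dataloader has no __len__ (which also covers loaders backed by an
# IterableDataset).
import math

def resolve_num_training_batches(dataloader):
    try:
        return len(dataloader)
    except TypeError:
        # No __len__: length is unknown, treat as unbounded.
        return float('inf')


print(resolve_num_training_batches([0] * 10))                    # 10
print(math.isinf(resolve_num_training_batches(iter(range(3)))))  # True
```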
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃