
Refactor dataloading #955

Merged 4 commits into Lightning-AI:master on Feb 26, 2020
Conversation

@ethanwharris (Member) commented on Feb 26, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #953 Fixes #840 Fixes #698

  • Refactored calls to reset_train_dataloader and reset_val_dataloader so they only happen when needed
  • Slightly changed the disable-validation logic to ignore the validation dataloader length
  • Removed the default unpacking of the dataloader and the automatic addition of a RandomSampler (see the comment in #953, "refactor len(datasets) call")
  • Changed the exception order so that a MisconfigurationException is raised when using an IterableDataset
  • Removed the dependency on DataLoader.dataset, following #840 ("Handle abstract loader that doesn't have a dataset member")
  • num_training_batches = float('inf') is now the default when the train dataloader doesn't have __len__, in addition to when using an IterableDataset (a sketch follows this list)
  • Added a test for training with an iterable
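
To illustrate that last default, here is a minimal sketch, not the PR's actual code (the helper name is hypothetical), of falling back to float('inf') when the train dataloader has no usable __len__:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset, TensorDataset

def infer_num_training_batches(dataloader: DataLoader) -> float:
    """Return len(dataloader) when available, else float('inf').

    Loaders wrapping an IterableDataset (or anything else without a
    usable __len__) raise TypeError on len().
    """
    try:
        return len(dataloader)
    except TypeError:
        return float('inf')

# A finite map-style dataset reports its real batch count.
finite = DataLoader(TensorDataset(torch.zeros(10, 1)), batch_size=2)
print(infer_num_training_batches(finite))  # 5

class Stream(IterableDataset):
    def __iter__(self):
        while True:
            yield torch.zeros(1)

# An unbounded iterable stream falls back to the infinite default.
print(infer_num_training_batches(DataLoader(Stream())))  # inf
```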

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@Borda added the "bug" (Something isn't working) and "feature" (Is an improvement or enhancement) labels on Feb 26, 2020
@@ -271,7 +271,7 @@ def is_function_implemented(self, m):
         pass

     @abstractmethod
-    def is_iterable_dataloader(self, dataloader):
+    def is_infinite_dataloader(self, dataloader):
@Darktex commented:
Does this need to be considered infinite? Normally, IterableDatasets are finite; we just don't know how long they are during the first epoch. Another way of doing this would be to set the length to infinite (or -1, or whatever placeholder value works best) and keep counting how many steps we did during the first epoch. Once we start the second epoch we can print a progress bar and so on, since we know the length will not change.
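
A hedged sketch of that suggestion (the wrapper name and structure are hypothetical, not from this PR): the length is unknown on the first pass, then known from the second epoch on.

```python
from torch.utils.data import DataLoader

class CountingLoader:
    """Wrap a loader of unknown length and remember how long one epoch was."""

    def __init__(self, dataloader: DataLoader):
        self.dataloader = dataloader
        self.num_batches = float('inf')  # placeholder until one full pass

    def __iter__(self):
        count = 0
        for batch in self.dataloader:
            count += 1
            yield batch
        # From the second epoch on, a progress bar can use this total.
        self.num_batches = count
```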

@ethanwharris (Member, Author) replied:
Yeah, thinking about it, that should totally just be has_len or similar. No problem with the idea of keeping a record of the number of steps, although I would probably opt for that to be done in a separate PR (like when we add IterableDataset support for val and test), but I'm happy to add that now if desired.
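
For reference, a minimal sketch of what a has_len check could look like (the name comes from the comment above; the library's eventual helper may differ):

```python
def has_len(dataloader) -> bool:
    """True if the dataloader exposes a usable __len__.

    Loaders wrapping an IterableDataset typically raise TypeError on len().
    """
    try:
        len(dataloader)
        return True
    except TypeError:
        return False
```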

@williamFalcon (Contributor) commented:
@ethanwharris nice job.

I'll merge this, then fix the TPU stuff, and then you can add what @Darktex suggested!
Running GPU tests now

@williamFalcon williamFalcon merged commit b2e9607 into Lightning-AI:master Feb 26, 2020
@Borda Borda added this to the 0.6.1 milestone Feb 27, 2020
tullie pushed a commit to tullie/pytorch-lightning referencing this pull request on Apr 3, 2020, with these commit messages:
* Refactor dataloading

* Refactor dataloading

* Refactor dataloading

* Add shuffle to test