
change indices argument during training? #152

Closed
rraju1 opened this issue Feb 14, 2022 · 5 comments

Comments

@rraju1

rraju1 commented Feb 14, 2022

Hi,

Thanks for the amazing work. My question is fairly basic: can I change the indices the loader accesses during training? To explain with an example: suppose my dataset has indices [0..99], and I want to train on the set [0..49] in one epoch and on the set [50..99] in the next, alternating between these two sets. In PyTorch, I can achieve this by changing the set of indices to be sampled with the SubsetRandomSampler class. Can I do something similar with ffcv, or do I have to rebuild my dataloader every epoch?
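For reference, the alternation being asked about looks like this (a framework-agnostic sketch; the list-based dataset and the manual loop are stand-ins, not FFCV or PyTorch objects):

```python
# Sketch: alternate between two index sets on successive epochs.
dataset = list(range(100))

set_a = list(range(0, 50))    # used on epochs 0, 2, 4, ...
set_b = list(range(50, 100))  # used on epochs 1, 3, 5, ...

def epoch_indices(epoch):
    """Return the index set to train on for a given epoch."""
    return set_a if epoch % 2 == 0 else set_b

for epoch in range(4):
    indices = epoch_indices(epoch)
    # Simulate one pass over the chosen subset.
    batch = [dataset[i] for i in indices]
```

The question is whether FFCV's Loader supports swapping `indices` like this without rebuilding the loader.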

@GuillaumeLeclerc
Collaborator

@rraju1 there is an indices parameter in the constructor of the Loader. It's not technically part of the API, but you could just update loader.indices = SET_1 or SET_2 before you start each epoch. You shouldn't incur any performance penalty.
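A minimal sketch of the suggested pattern, assuming the loader re-reads its `indices` attribute when a new epoch's iteration starts (the `Loader` class here is a toy stand-in for `ffcv.loader.Loader`, not the real implementation):

```python
class Loader:
    # Toy stand-in for ffcv.loader.Loader: each iteration walks
    # whatever `self.indices` holds at the time it begins.
    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __iter__(self):
        return (self.dataset[i] for i in self.indices)


dataset = list(range(100))
loader = Loader(dataset, list(range(100)))

SET_1 = list(range(0, 50))
SET_2 = list(range(50, 100))

for epoch in range(2):
    # Swap the index set in place before each epoch.
    loader.indices = SET_1 if epoch % 2 == 0 else SET_2
    seen = list(loader)
```

Because only an attribute is reassigned, no recompilation or loader rebuild happens between epochs.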

Hope it helps!

(feel free to reopen if it doesn't work for you)

@rraju1
Author

rraju1 commented Feb 16, 2022

@GuillaumeLeclerc I tried your suggestion, but it didn't update the number of batches the network was processing (going from the full set to a subset). However, when I set train_loader.indices = SET1 and train_loader.traversal_order.indices = SET1 together, it seems to work (the number of batches in an epoch changes). Thanks!
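What this workaround suggests is that the traversal order object keeps its own copy of the indices, taken at construction time, so both attributes have to be updated. A toy illustration of that failure mode (these classes are stand-ins to show the mechanism, not FFCV's actual internals):

```python
class TraversalOrder:
    # Stand-in: caches the index list it was constructed with.
    def __init__(self, indices):
        self.indices = list(indices)


class Loader:
    def __init__(self, indices):
        self.indices = list(indices)
        # The traversal order snapshots the indices at build time.
        self.traversal_order = TraversalOrder(self.indices)

    def __iter__(self):
        # Iteration is driven by the traversal order's cached copy,
        # not by loader.indices directly.
        return iter(self.traversal_order.indices)


loader = Loader(range(100))
SET1 = list(range(50))

# Updating only loader.indices does not change what is iterated:
loader.indices = SET1
assert len(list(loader)) == 100

# Updating both, as in the workaround, does:
loader.traversal_order.indices = SET1
assert len(list(loader)) == 50
```

This explains why updating `loader.indices` alone left the batch count unchanged: the cached copy inside the traversal order was still the full index set.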

@GuillaumeLeclerc
Collaborator

Happy that it worked for you. Enjoy FFCV!

@hlzhang109

I found it didn't work when I changed the indices of a dataloader using

mask = np.arange(1000)
dataloader.indices = mask

It still gives the original dataloader with 50,000 training data points in make_dataloaders():

def make_dataloaders(train_dataset=None, val_dataset=None, batch_size=None, num_workers=None):

@lucasresck

when I changed train_loader.indices = SET1 and train_loader.traversal_order.indices = SET1 together, it seems to work (the number of batches in an epoch changes)

Thank you, @rraju1! It seems to work here too.
