Trouble changing indices during training #208

Closed
rraju1 opened this issue Mar 31, 2022 · 2 comments

rraju1 commented Mar 31, 2022

Hi, thanks for the amazing package. I am trying to use ffcv for a task where we train on different subsets of the data at different points in training, sometimes random subsets of the dataset. Unfortunately, after sampling I get ValueError: empty range for randrange(). Based on the trace, it looks like the error happens in quasi_random.py, but I'm not sure where the randrange function is actually called. I have an MWE showing the error.

Trace:

Traceback (most recent call last):
  File "/research/rraju2/ffcv-imagenet/mwe.py", line 60, in <module>
    for ix, batch in enumerate(loader):
  File "/home/rraju2/anaconda3/envs/ffcv/lib/python3.9/site-packages/ffcv/loader/loader.py", line 206, in __iter__
    order = self.next_traversal_order()
  File "/home/rraju2/anaconda3/envs/ffcv/lib/python3.9/site-packages/ffcv/loader/loader.py", line 202, in next_traversal_order
    return self.traversal_order.sample_order(self.next_epoch)
  File "/home/rraju2/anaconda3/envs/ffcv/lib/python3.9/site-packages/ffcv/traversal_order/quasi_random.py", line 81, in sample_order
    generate_order_inner(seed, self.page_to_samples_array,
ValueError: empty range for randrange()
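
For reference, the message in the trace matches the one Python's standard random.randrange raises when asked to draw from an empty range. The sketch below only reproduces that message with the standard library; whether the ordering routine inside ffcv reaches it in the same way is an assumption, not something this thread confirms. The helper name try_randrange is hypothetical.

```python
import random


def try_randrange(n):
    """Return a random integer in [0, n), or the error text if the range is empty."""
    try:
        return random.randrange(n)
    except ValueError as err:
        return str(err)


# A range of size 1 can only yield 0.
print(try_randrange(1))
# A range of size 0 has nothing to draw from, so randrange raises ValueError.
print(try_randrange(0))
```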

MWE:

import numpy as np
from ffcv.fields import NDArrayField, FloatField
from ffcv.writer import DatasetWriter

from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import NDArrayDecoder, FloatDecoder
from ffcv.transforms import ToTensor

from ffcv.traversal_order import Sequential, QuasiRandom


class LinearRegressionDataset:
    def __init__(self, N):
        self.X = np.arange(N)
        self.Y = np.arange(N)

    def __getitem__(self, idx):
        return (self.X[idx].astype('float32'), self.Y[idx])

    def __len__(self):
        return len(self.X)

dataset = LinearRegressionDataset(100)

writer = DatasetWriter("./example.beton", {
    'first': NDArrayField(shape=(1,), dtype=np.dtype('float32')),
    'second': FloatField(),
}, num_workers=4)
writer.from_indexed_dataset(dataset)

ORDERING = OrderOption.QUASI_RANDOM

loader = Loader('./example.beton',
                batch_size=1,
                num_workers=4,
                order=ORDERING,
                indices=np.arange(3),
                pipelines={
                    'first': [NDArrayDecoder(), ToTensor()],
                    'second': [FloatDecoder(), ToTensor()]
                })

# prints a permutation of [0, 1, 2]
for ix, batch in enumerate(loader):
    print(batch[0])
print('change indices set1')
set1 = np.arange(10)[7:]
loader.indices = set1
loader.traversal_order = QuasiRandom(loader)
# prints a permutation of [7, 8, 9] - always works
for ix, batch in enumerate(loader):
    print(batch[0])
print('change indices set2')
set2 = np.random.choice(np.arange(100)[11:50], 15)  # fails
# set2 = np.sort(set2)  # also fails
# set2 = np.arange(100)[10:26]  # works
loader.indices = set2
loader.traversal_order = QuasiRandom(loader)
for ix, batch in enumerate(loader):
    print(batch[0])

rraju1 commented Apr 19, 2022

I found my error. I was passing an indices array that could contain the same index more than once, because np.random.choice samples with replacement by default. The loader expects unique indices and throws an error otherwise, so once I set replace=False in np.random.choice, my code works.
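
A minimal sketch of the fix described above, using only NumPy. It draws the same kind of subset as the MWE with replace=False and checks for repeated indices before the array would be handed to the loader. The helper has_duplicates is a hypothetical name introduced for illustration; it is not part of ffcv.

```python
import numpy as np


def has_duplicates(indices):
    """Return True if any index appears more than once."""
    indices = np.asarray(indices)
    return np.unique(indices).size != indices.size


pool = np.arange(100)[11:50]  # the 39 candidate indices from the MWE

# Default behaviour (replace=True): the same index may be drawn repeatedly.
with_replacement = np.random.choice(pool, 15)

# replace=False guarantees that every drawn index is distinct.
unique_subset = np.random.choice(pool, 15, replace=False)

print("duplicates with replacement:", has_duplicates(with_replacement))
print("duplicates without replacement:", has_duplicates(unique_subset))
```

Drawing 15 values from 39 candidates with replacement is quite likely to repeat at least one index, which would explain why the original set2 failed on most runs. A check like has_duplicates(set2) before assigning loader.indices would surface the problem earlier than the randrange error does.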

rraju1 closed this as completed Apr 19, 2022

lucasresck commented

Hi, @rraju1! Why do you set loader.traversal_order = QuasiRandom(loader) instead of just setting loader.traversal_order.indices = set2, as you said in #152 (comment)? Thanks!
