[data] Add dataset.random_sample() API #24449
Comments
I'd like to give this a shot! I'm having a bit of trouble setting up my development environment on macOS, though. Should I just go ahead and switch to an Ubuntu machine? EDIT: Turns out the wheel link given in the documentation is not universal. Anyway, I currently have it working on my other computer.
Would adding in a feature to sample a fraction (as referenced in the forum) be useful too?
@xiurobert thanks for picking it up. Out of curiosity, how did you find this issue?
I was randomly browsing GitHub out of boredom and this repo was recommended to me.
Now that the random_sample API is added, can this issue be closed?
Description
Per https://discuss.ray.io/t/how-do-i-sample-from-a-ray-datasets/5308, we should add a random_sample(N) API that returns N records from a Dataset. This can be implemented via a map_batches() followed by a take().
cc @simon-mo @clarkzinzow
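To make the proposal concrete, here is a minimal sketch of the "filter each batch, then take N" idea in plain Python. It does not use Ray: batches are modeled as ordinary lists, and the names `sample_batch`, `random_sample`, `fraction`, and `seed` are illustrative assumptions, not the API that was eventually merged. The per-batch function is the kind of callable one might pass to map_batches().

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_batch(batch: List[T], fraction: float, rng: random.Random) -> List[T]:
    """Keep each record of one batch independently with probability `fraction`."""
    return [row for row in batch if rng.random() < fraction]


def random_sample(
    batches: Iterable[List[T]],
    n: int,
    fraction: float,
    seed: Optional[int] = None,
) -> List[T]:
    """Hypothetical stand-in for map_batches(sample_batch) followed by take(n).

    Returns at most `n` records; fewer if the filtered stream runs out first.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if n <= 0:
        return []

    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    taken: List[T] = []
    for batch in batches:
        for row in sample_batch(batch, fraction, rng):
            taken.append(row)
            if len(taken) == n:  # the "take(n)" step: stop once n rows are collected
                return taken
    return taken


if __name__ == "__main__":
    # Ten batches of ten integers each, standing in for a dataset's blocks.
    batches = [list(range(start, start + 10)) for start in range(0, 100, 10)]
    print(random_sample(batches, n=5, fraction=0.2, seed=0))
```

One caveat with this approach: because the take step stops at the first N surviving records, the result leans toward records that appear early in the dataset unless `fraction` is chosen so that roughly N records survive overall. It can also return fewer than N records when `fraction` is small.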
Use case
Random sample is useful for a variety of scenarios, including creating training batches, and downsampling the dataset for faster analysis / testing.