
uniformly use skip for both (map-style) Dataset and IterableDataset #521

Merged: tianyu-l merged 2 commits on Aug 16, 2024


@tianyu-l (Contributor) commented Aug 15, 2024

Stack from ghstack (oldest at bottom):

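Support for `skip` on an `IterableDataset` obtained from `split_dataset_by_node` landed in huggingface/datasets#6965 and was released in datasets 2.21.0 (https://github.com/huggingface/datasets/releases/tag/2.21.0), so `skip` can now be used uniformly for both map-style `Dataset` and `IterableDataset`.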

For previous discussion, see
https://discuss.huggingface.co/t/skip-not-implemented-for-iterabledataset-after-split-dataset-by-node/91450

I manually unit-tested this on the c4 dataset (loaded as an `IterableDataset`) by replacing the line at https://github.com/pytorch/torchtitan/blob/main/test/datasets/test_checkpoint.py#L14

tianyu-l added a commit that referenced this pull request Aug 15, 2024
ghstack-source-id: d82c233f21dddc74794d9c492a781dffc52eb5de
Pull Request resolved: #521
@facebook-github-bot added the CLA Signed label Aug 15, 2024
tianyu-l added a commit that referenced this pull request Aug 15, 2024
ghstack-source-id: c8f611742ffbb4859988b97e706b9e0d1b4ad6f1
Pull Request resolved: #521
@gokulavasan (Contributor) left a comment


Looks great!

@tianyu-l tianyu-l merged commit ab895dc into gh/tianyu-l/20/base Aug 16, 2024
6 checks passed
tianyu-l added a commit that referenced this pull request Aug 16, 2024
ghstack-source-id: c8f611742ffbb4859988b97e706b9e0d1b4ad6f1
Pull Request resolved: #521
@tianyu-l tianyu-l deleted the gh/tianyu-l/20/head branch August 16, 2024 19:55
Labels: CLA Signed
Projects: None yet
3 participants