Expected Data Format #715

Open
aflah02 opened this issue Aug 27, 2024 · 1 comment
Labels
type/question An issue that's a question

Comments


aflah02 commented Aug 27, 2024

❓ The question

I was looking at the config files and noticed that they sometimes point to .npy files for the dataset. Is there a script to generate these files from a set of text files or from any other format?

aflah02 added the type/question label Aug 27, 2024
Member

aman-17 commented Oct 19, 2024

You can either download the dataset directly from Hugging Face or use the Dolma toolkit. The Hugging Face repository provides easy access to the dataset, and the Dolma toolkit offers utilities for handling different data formats. If you need further help, feel free to follow up.
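
For anyone who wants to see the general idea of turning text files into a token file, here is a minimal sketch. It is not the Dolma toolkit's implementation. It assumes the .npy files referenced in the configs are flat arrays of token IDs written as raw memory-mapped data (uint16 here); that layout, the stand-in tokenizer `toy_tokenize`, the helper `write_token_file`, and the EOS ID of 0 are all assumptions made for illustration, so check the repository's data-loading code for the actual dtype and layout before training on files produced this way.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List

import numpy as np


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical stand-in tokenizer: one ID per whitespace-separated word.

    IDs start at 1 so that 0 stays free for the end-of-document marker.
    A real pipeline would use the model's actual tokenizer instead.
    """
    return [vocab.setdefault(word, len(vocab) + 1) for word in text.split()]


def write_token_file(
    text_paths: Iterable[str],
    out_path: str,
    tokenize: Callable[[str], List[int]],
    eos_id: int = 0,          # assumed end-of-document ID
    dtype=np.uint16,          # assumed on-disk dtype
) -> int:
    """Tokenize each text file, append an EOS ID after each document,
    and write all IDs as one flat memory-mapped array.

    Returns the number of token IDs written.
    """
    ids: List[int] = []
    for path in text_paths:
        ids.extend(tokenize(Path(path).read_text(encoding="utf-8")))
        ids.append(eos_id)

    if not ids:
        raise ValueError("no input files were given")
    if max(ids) > np.iinfo(dtype).max or min(ids) < 0:
        raise ValueError("a token ID does not fit in the chosen dtype")

    # Raw memmap: the file holds only the token IDs, with no header.
    arr = np.memmap(out_path, dtype=dtype, mode="w+", shape=(len(ids),))
    arr[:] = ids
    arr.flush()
    return len(ids)
```

To use it, share one vocabulary across files, for example `vocab = {}` followed by `write_token_file(paths, "tokens.npy", lambda t: toy_tokenize(t, vocab))`, and read the result back with `np.memmap("tokens.npy", dtype=np.uint16, mode="r")`. Note that a file written this way is raw data without the header that `np.save` adds, so `np.load` will not read it.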
