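Hello, I am currently using finetune_cosmopedia.sh for continued pretraining. It works with the datasets from HuggingFaceTB, but I would now like to use my own dataset, which is in txt format. Is there a way to convert our own data into a form that can be used for continued pretraining, or is there a tool that does this? Thank you.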
You can refer to the official Hugging Face datasets documentation for loading txt files: https://huggingface.co/docs/datasets/nlp_load
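For illustration, here is a minimal sketch of two ways this could be done. It assumes one training document per line in the txt file; the function names `txt_to_jsonl` and `load_txt_as_dataset` and the file names are hypothetical and not part of this repository. The first helper uses only the standard library to write a JSON Lines file with a "text" field. The second uses the "text" builder of the `datasets` library, which produces one row per line in a column named "text". Which column name finetune_cosmopedia.sh actually reads is not stated in this thread, so check the script before relying on "text".

```python
import json


def txt_to_jsonl(txt_path, jsonl_path, min_chars=1):
    """Write one JSON object per non-empty line of txt_path, under a "text" key.

    Returns the number of records written. Assumes one document per line.
    """
    count = 0
    with open(txt_path, encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for line in src:
            text = line.strip()
            if len(text) < min_chars:
                continue  # skip blank or too-short lines
            dst.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_txt_as_dataset(txt_path):
    """Load a .txt file with the Hugging Face `datasets` "text" builder.

    Imported lazily so the helper above works without `datasets` installed.
    """
    from datasets import load_dataset  # requires `pip install datasets`

    return load_dataset("text", data_files={"train": str(txt_path)})["train"]
```

A possible usage would be `txt_to_jsonl("corpus.txt", "corpus.jsonl")` followed by pointing the training script at the resulting file, or calling `load_txt_as_dataset("corpus.txt")` and inspecting the first row to confirm the format.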
OK, I'll give it a try first. Thanks for the reply.