Add twenty redundant data in post pretrain #8777

Merged

JunnYu (Member) commented on Jul 18, 2024

PR types

BUG

PR changes

APIs

Description

https://github.com/PaddlePaddle/PaddleNLP/pull/8776

Add a few more samples: let's make it 20, since I'm worried that with 10 the index could still go out of range.
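
To illustrate the kind of failure that padding guards against, here is a minimal sketch. It is not PaddleNLP's code. It assumes a greedy blending rule (at each step, pick the dataset that is furthest behind its target share) and assumes each dataset is sized as `floor(weight * size)`. The names `blend_indices`, `out_of_range`, and `margin` are hypothetical.

```python
import math


def blend_indices(weights, size):
    """Greedy blending (assumed rule): at each step, pick the dataset
    that is furthest behind its target share of samples."""
    counts = [0] * len(weights)
    picks = []
    for i in range(size):
        errors = [w * (i + 1) - c for w, c in zip(weights, counts)]
        d = max(range(len(weights)), key=errors.__getitem__)
        picks.append((d, counts[d]))  # (dataset id, sample index within it)
        counts[d] += 1
    return picks


def out_of_range(weights, size, margin=0):
    """Return the picks whose sample index exceeds the per-dataset size.

    Assumed sizing: floor(weight * size) plus `margin` redundant samples.
    """
    sizes = [math.floor(w * size) + margin for w in weights]
    return [(d, idx) for d, idx in blend_indices(weights, size) if idx >= sizes[d]]


# Two equally weighted datasets, three blended samples in total.
print(out_of_range([0.5, 0.5], 3, margin=0))   # one pick overruns dataset 0
print(out_of_range([0.5, 0.5], 3, margin=20))  # no overrun with 20 spare samples
```

In this toy setup the overrun comes from rounding, so a small fixed margin is enough to absorb it. Whether 10 or 20 is sufficient in the real pipeline depends on how its sizes and blending indices are actually computed.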

paddle-bot bot commented Jul 18, 2024

Thanks for your contribution!

JunnYu merged commit 157f7d3 into PaddlePaddle:release/2.8 on Jul 18, 2024
4 of 5 checks passed
DesmonDay pushed a commit to DesmonDay/PaddleNLP that referenced this pull request Sep 5, 2024
* Add 20 more samples to the dataset to prevent errors in the blend dataset
DesmonDay added a commit that referenced this pull request Sep 5, 2024
* quick fix from pretrained. (#8487)

* quick fix os.path.split (#8508)

* Cp/fix (#8569)

* [Safetensors] Fix fast safe open slice. (#8512)
* [FIX DDP] fix ddp (#8549)

* [BUG] Fix build train valid test datasets (#8823)

* Update causal_dataset.py

* Add twenty redundant data in post pretrain (#8777)

* Add 20 more samples to the dataset to prevent errors in the blend dataset

* Round num_samples down to prevent dataset overflow (#8691)

* update release_grads (#8834)

* [Trainer] Fix release_grads (#9085)

* fix pp release_grads

* add dataloader_drop_last to evaldataloader (#8773)

* bugfix

* Fix eval hang (#9052)

* fix pipeline eval

* fix eval dataloader_num_workers

---------

Co-authored-by: Zhong Hui <[email protected]>
Co-authored-by: yujun <[email protected]>
Co-authored-by: gongel <[email protected]>