How to skip corrupted data that cannot be opened during SFT in streaming mode #5817

Open
1 task done
Wiselnn570 opened this issue Oct 24, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

Comments

@Wiselnn570

Reminder

  • I have read the README and searched the existing issues.

System Info

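I am currently training on video data with qwen2-vl. As soon as a corrupted video is encountered, the program terminates immediately.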

Reproduction

FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2vl_full_sft.yaml

The error is raised in this function: https://github.com/hiyouga/LLaMA-Factory/blob/b4c7dd3ac5615ccb52d7627db635d33336e51951/src/llamafactory/data/mm_plugin.py#L120

Expected behavior

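The program should skip the corrupted sample and continue with the next one, or randomly sample a different one in its place.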
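
To make the request concrete, here is a minimal sketch of both behaviors in plain Python. It is not LLaMA-Factory code: `skip_corrupted`, `load_or_resample`, and the `load` callback are hypothetical names used only for illustration, and catching a broad `Exception` is an assumption that would need narrowing to whatever error the video decoder actually raises.

```python
import logging
import random
from typing import Callable, Iterable, Iterator, Optional, Sequence, TypeVar

logger = logging.getLogger(__name__)

S = TypeVar("S")
T = TypeVar("T")


def skip_corrupted(samples: Iterable[S], load: Callable[[S], T]) -> Iterator[T]:
    """Streaming case: yield load(sample), silently dropping samples that fail to load."""
    for sample in samples:
        try:
            yield load(sample)
        except Exception as exc:  # assumption: narrow this to the decoder's real error types
            logger.warning("Skipping unreadable sample %r: %s", sample, exc)


def load_or_resample(
    samples: Sequence[S],
    index: int,
    load: Callable[[S], T],
    max_tries: int = 10,
    rng: Optional[random.Random] = None,
) -> T:
    """Indexed case: load samples[index]; on failure, retry with a random index."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        try:
            return load(samples[index])
        except Exception as exc:  # assumption: same caveat as above
            logger.warning("Sample %d is unreadable (%s); resampling", index, exc)
            index = rng.randrange(len(samples))
    raise RuntimeError(f"No readable sample found after {max_tries} attempts")
```

The first function fits a streaming (iterable) dataset, where there is no index to resample from; the second fits an indexed dataset. One thing to keep in mind with skipping in multi-GPU runs is that ranks may end up with different numbers of samples, which resampling avoids.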

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Oct 24, 2024