Which model is it? Is the GPU actually running?
qwen2.5-1.5b, and the GPU shows no activity either.
Hi, have you solved this? I ran into the same problem testing llava: training hangs during DPO, but the SFT stage works fine. GPU memory is allocated, but the GPU is not computing.
Not solved. At this point I can only conclude it is a CentOS issue, because it works fine under Ubuntu.
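A first step the thread does not show (these commands are a suggested diagnostic, not part of the original report) is to confirm whether the trainer process is genuinely stuck rather than just slow. `nvidia-smi` shows whether the GPU is computing, and `py-spy` (a third-party tool, installed separately) can dump the Python stack of the hung process to reveal where it is blocked:

```shell
# Watch GPU utilization: memory allocated but 0% utilization suggests a hang,
# not slow training.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 2

# Dump the Python call stack of the training process (PID is a placeholder).
# A stack stuck in a NCCL/distributed init or a dataloader wait pinpoints the cause.
pip install py-spy
py-spy dump --pid <TRAINER_PID>
```

If the stack shows the process waiting inside `torch.distributed` initialization or a collective call, the problem is in process-group setup rather than the model itself.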
Reminder
System Info
llamafactory
version: 0.8.3
Reproduction
[INFO|trainer.py:2319] 2024-10-25 08:06:23,313 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2320] 2024-10-25 08:06:23,313 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2321] 2024-10-25 08:06:23,313 >> Total optimization steps = 234
[INFO|trainer.py:2322] 2024-10-25 08:06:23,318 >> Number of trainable parameters = 9,232,384
0%| | 0/234 [00:00<?, ?it/s]
It stays stuck at this message without progressing. The system is CentOS 8.5, Docker is 25.0.1.
Expected behavior
No response
Others
No response
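A hang at `0/234` steps with GPU memory allocated but no compute, only inside Docker on one distro, is often caused by NCCL or shared-memory trouble in the container. A hedged diagnostic sketch (the image name and training command are placeholders; the environment variables and Docker flags are standard NCCL/Docker options, not llamafactory-specific):

```shell
# Verbose NCCL logging: if a collective stalls during init, it shows up here.
export NCCL_DEBUG=INFO

# Disabling P2P can work around host/kernel-specific GPU interconnect issues.
export NCCL_P2P_DISABLE=1

# Docker's default /dev/shm is 64 MB, which can stall PyTorch dataloaders;
# raise it explicitly when starting the container.
docker run --gpus all --shm-size=16g --ipc=host <your-llamafactory-image> <train-command>
```

Comparing the NCCL log output between the CentOS and Ubuntu hosts should narrow down whether the hang is in distributed initialization or elsewhere.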