Symptom 1: When LoRA fine-tuning the deepseek-moe model, the loss suddenly drops to 0 and stays there for the rest of training, which breaks inference: the model outputs only "!!!".
Symptom 2: Continuing LoRA fine-tuning of the deepseek-moe model from a saved checkpoint raises an error. Training only starts if trainer.train(resume_from_checkpoint = resume_from_checkpoint_dir) is changed to trainer.train(), but then the saved checkpoints start from scratch instead of continuing from the original checkpoint.
Looking forward to a reply, thanks!
What is the error message? Resuming requires loading the LoRA adapter.
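For reference, a minimal sketch of what "loading the LoRA adapter before resuming" could look like with transformers and peft; the model name, checkpoint path, training arguments, and train_dataset below are placeholders and assumptions, not details taken from this issue:

```python
# Sketch only: attach the LoRA adapter saved in the checkpoint before resuming,
# assuming a standard transformers + peft training setup.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import PeftModel

base_model_name = "deepseek-ai/deepseek-moe-16b-base"   # placeholder
resume_from_checkpoint_dir = "output/checkpoint-500"     # placeholder

# Load the base model, then wrap it with the adapter weights from the checkpoint
# so the Trainer receives a trainable PeftModel rather than the bare base model.
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, resume_from_checkpoint_dir, is_trainable=True)
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="output"),
    train_dataset=train_dataset,  # assumed to be defined elsewhere
)

# With the adapter already attached, resuming can restore optimizer/scheduler
# state from the checkpoint instead of starting a fresh run.
trainer.train(resume_from_checkpoint=resume_from_checkpoint_dir)
```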
Starting LoRA training from the base model raises no error and inference runs, but the loss suddenly drops to 0 after one epoch, and the fine-tuned model returns a string of exclamation marks at inference time. However, when I try to resume LoRA fine-tuning by loading the LoRA adapter and calling trainer.train(resume_from_checkpoint = resume_from_checkpoint_dir), training fails with an error.
Have you managed to solve this problem?