Issues with BF16 models #22

Closed
anakin87 opened this issue Mar 24, 2024 · 3 comments
Comments

@anakin87

Hey... Thanks for the great work!

While trying to evaluate a BF16 model, I got the following error in my RunPod container:
`"triu_tril_cuda_template" not implemented for 'BFloat16'` (see pytorch/pytorch#101932).

Switching the image from `runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04`
to `runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04` fixed the issue.

I'm reporting this for anyone else who runs into the same problem.
I'm not sure whether it makes sense to update the Colab notebook to use a newer image, or whether that might introduce other problems.

@mlabonne
Owner

Thanks @anakin87! That's weird; I evaluate BF16 models all the time (automerged models, for example). Could you reproduce this error with another BF16 model, by any chance? Thanks a lot for the fix!

@anakin87
Author

Thanks for the feedback.
Thinking about it more, the problem is probably that I used PyTorch 2.2.0 for training. 🙂

Feel free to close the issue.

@mlabonne
Owner

Cool! I added it to the troubleshooting section; it might be helpful. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants