Is there any interface in the source code that could cause the loss to become NaN sometimes, while training runs correctly at other times?
"loss is nan" is a problem for original ViT models when amp is turned on. Check here for more details.
Afterwards, many techniques are proposed to solve this problem, e.g., LayerScale. You can refer these techniques for preventing training loss to nan.
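For reference, a minimal sketch of the LayerScale idea in PyTorch (this is an illustration of the technique, not code from this repository): each residual branch is multiplied by a learnable per-channel scale initialized to a small value, so every block starts close to the identity and training stays numerically stable under AMP.

```python
import torch
import torch.nn as nn

class LayerScale(nn.Module):
    """Per-channel learnable scale for a residual branch.

    Initializing gamma to a small value (e.g. 1e-5) makes each block
    start near the identity mapping, which helps prevent activation
    blow-ups and NaN losses in deep ViTs, especially with AMP.
    """
    def __init__(self, dim: int, init_value: float = 1e-5):
        super().__init__()
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcasts over (batch, tokens, dim)
        return self.gamma * x
```

In a transformer block it would be applied to each sub-layer's output before the residual add, e.g. `x = x + layer_scale(attn(norm(x)))`; the names `attn` and `norm` here are placeholders for whatever modules your block uses.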