This repository has been archived by the owner on Mar 15, 2024. It is now read-only.
Hey DeiT team! Thanks so much for open-sourcing the codebase for the DeiT paper! I was trying to reproduce results for the deit_base_patch16_224 model to see the training curves and play around with hyperparameters, but my job failed once it reached Epoch 8. I tried re-running 3 times, and it always died with the same error at the same point (Epoch 8, batch 650/1250).
To run, I followed the steps from the repository and used the following command:
python run_with_submitit.py --model deit_base_patch16_224 --use_volta32
I thought it could be a memory issue, but I see that the log prints:
use_volta32=True
I receive this error:
Any help would be appreciated! Thank you!
Hi @molocule ,
Thanks for your question. Do you have more details about the error in the logs of the other GPUs?
If the error is related to the presence of a NaN, the information in this issue may be useful.
Best,
Hugo
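For context on the NaN suggestion above: many training loops, including DeiT's, check that the scalar loss is still finite at every step and abort immediately if it is not, since a NaN/Inf loss means the run has diverged and further updates would only corrupt the weights. A minimal sketch of that guard (the function name here is illustrative, not from the DeiT codebase):

```python
import math


def check_loss_finite(loss_value: float) -> float:
    """Abort training cleanly when the scalar loss goes non-finite.

    A NaN/Inf loss typically indicates divergence (e.g. a too-high
    learning rate or fp16 overflow); raising here stops the run before
    the bad gradients reach the weights.
    """
    if not math.isfinite(loss_value):
        raise RuntimeError(f"Loss is {loss_value}, stopping training")
    return loss_value


# Usage inside a training step (sketch): call on loss.item()
# before backpropagating.
check_loss_finite(0.5)  # a healthy loss passes through unchanged
```

If the main log only shows the job dying, the traceback from this kind of check usually appears in the per-GPU log files, which is why looking at the other GPUs' logs can reveal the actual cause.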