I modified train_gpt2.py so that it loads a pre-trained model and conveniently reports its evaluation loss, by inserting a `break` statement at the beginning of the training loop:
after `llm.c/train_gpt2.py`, line 762 (commit `72698a5`)
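For context, here is a rough sketch of what the modified loop looks like; names such as `val_loader`, `val_max_steps`, and `next_batch` are my assumptions about the general structure of train_gpt2.py, not the exact code at that commit:

```python
# Rough sketch of the modification (NOT the exact code at commit 72698a5;
# val_loader, val_max_steps, next_batch are assumed names).
# The break stops the run before any optimizer step, so only the
# validation pass executes.
for step in range(num_iterations + 1):
    # --- validation pass ---
    model.eval()
    val_loader.reset()
    with torch.no_grad():
        val_loss = 0.0
        for _ in range(val_max_steps):        # fixed number of eval batches
            x, y = val_loader.next_batch()
            _, loss = model(x, y)             # forward assumed to return (logits, loss)
            val_loss += loss.item()
        val_loss /= val_max_steps             # average over eval batches
    print(f"val loss {val_loss:.4f}")

    break  # <-- inserted: skip the training step entirely, we only want the eval loss
```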
Unexpectedly, I observe that when the evaluation batch size differs from the one used during training, the evaluation loss changes as well. This is counterintuitive: the evaluation loss should be the same regardless of the evaluation batch size. Does anyone know why?
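To make the observation concrete, here is a self-contained toy script (not llm.c code) showing one way a per-batch-averaged loss can depend on the evaluation batch size: if the loss is averaged over a fixed number of eval batches, a different batch size means a different amount (and subset) of data is actually evaluated. I am not claiming this is what happens in train_gpt2.py, just illustrating the kind of effect I mean:

```python
import torch

torch.manual_seed(0)
per_token_loss = torch.rand(100_000)   # pretend per-token losses over a validation set
val_max_steps = 20                     # fixed number of eval batches (assumed setup)

def eval_loss(batch_size):
    total = 0.0
    for step in range(val_max_steps):
        batch = per_token_loss[step * batch_size:(step + 1) * batch_size]
        total += batch.mean().item()   # per-batch mean, analogous to loss.item()
    return total / val_max_steps       # mean over the fixed number of batches

print(eval_loss(batch_size=8))    # covers 20 * 8   =   160 "tokens"
print(eval_loss(batch_size=256))  # covers 20 * 256 = 5,120 "tokens" -> different data, different loss
```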