I modified train_gpt2.py so that it loads a pre-trained model and conveniently reports its evaluation loss, by inserting a `break` statement at the beginning of the training loop:
after `llm.c/train_gpt2.py`, line 762 (commit `72698a5`)
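For context, here is a rough sketch of what the modified loop looks like; names such as `val_loader`, `val_max_steps`, and `next_batch` are my assumptions about the general structure of train_gpt2.py, not the exact code at that commit:

```python
# Rough sketch of the modification (NOT the exact code at commit 72698a5;
# val_loader, val_max_steps, next_batch are assumed names).
# The break stops the run before any optimizer step, so only the
# validation pass executes.
for step in range(num_iterations + 1):
    # --- validation pass ---
    model.eval()
    val_loader.reset()
    with torch.no_grad():
        val_loss = 0.0
        for _ in range(val_max_steps):        # fixed number of eval batches
            x, y = val_loader.next_batch()
            _, loss = model(x, y)             # forward assumed to return (logits, loss)
            val_loss += loss.item()
        val_loss /= val_max_steps             # average over eval batches
    print(f"val loss {val_loss:.4f}")

    break  # <-- inserted: skip the training step entirely, we only want the eval loss
```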
Unexpectedly, I observe that when the evaluation batch size differs from the one used during training, the evaluation loss changes as well. This is counterintuitive: the evaluation loss should be the same regardless of the evaluation batch size. Does anyone know why?
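To make the observation concrete, here is a self-contained toy script (not llm.c code) showing one way a per-batch-averaged loss can depend on the evaluation batch size: if the loss is averaged over a fixed number of eval batches, a different batch size means a different amount (and subset) of data is actually evaluated. I am not claiming this is what happens in train_gpt2.py, just illustrating the kind of effect I mean:

```python
import torch

torch.manual_seed(0)
per_token_loss = torch.rand(100_000)   # pretend per-token losses over a validation set
val_max_steps = 20                     # fixed number of eval batches (assumed setup)

def eval_loss(batch_size):
    total = 0.0
    for step in range(val_max_steps):
        batch = per_token_loss[step * batch_size:(step + 1) * batch_size]
        total += batch.mean().item()   # per-batch mean, analogous to loss.item()
    return total / val_max_steps       # mean over the fixed number of batches

print(eval_loss(batch_size=8))    # covers 20 * 8   =   160 "tokens"
print(eval_loss(batch_size=256))  # covers 20 * 256 = 5,120 "tokens" -> different data, different loss
```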