You have to make sure the two methods print the same thing. First, it looks like your eager losses are not averaged. Second, the batching differs: model.fit() may use randomly shuffled batches, while the other loop goes through all the batches in a fixed order, so some difference is reasonable... : )
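A minimal sketch of the second point (purely hypothetical data; for array inputs, `model.fit()` reshuffles each epoch by default via `shuffle=True`, while a hand-written eager loop often iterates batches in a fixed order):

```python
import random

samples = list(range(12))
batch_size = 4

# Hand-written eager loop: same fixed-order batches every epoch.
eager_batches = [samples[i:i + batch_size]
                 for i in range(0, len(samples), batch_size)]

# model.fit()-style: indices reshuffled before batching each epoch.
random.seed(0)  # seed only so the sketch is reproducible
shuffled = samples[:]
random.shuffle(shuffled)
fit_batches = [shuffled[i:i + batch_size]
               for i in range(0, len(shuffled), batch_size)]

print(eager_batches[0])  # always [0, 1, 2, 3]
print(fit_batches[0])    # a different mix after shuffling
```

Because the two loops see differently composed batches, their per-batch loss curves are not directly comparable even on identical data.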
Hello!
I didn't change the code and used both model.fit() and eager_tf to train the network.
For model.fit(), the average validation loss is < 50 even in the first epoch, and the training loss also drops below 50 at the beginning of the second epoch.
For eager_tf, the validation loss stays at ~200 after 10 epochs, and the training loss decreases much more slowly, only reaching ~50 in the 10th epoch, which looks like overfitting.
This is the training result for model.fit():
Epoch 1:
1/358
...
357/358
358/358
val_loss: 51.9096 - val_yolo_output_0_loss: 8.8620 - val_yolo_output_1_loss: 7.8781 - val_yolo_output_2_loss: 24.0912
Epoch 2:
1/358
Notice this sudden transition of the training loss from 378 to 43: this is because model.fit() reports a running average over all the iterations within one epoch, and that average resets at the start of each new epoch.
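To make that jump concrete, here is a small sketch (assumed display behavior, with made-up loss numbers): a Keras-style progress bar shows the running mean of the losses within the current epoch, and that mean resets when a new epoch begins, so the first batch of epoch 2 shows only that batch's much lower loss.

```python
def progbar_display(epoch_losses):
    """Running mean within one epoch, as a Keras-style progress bar reports it."""
    shown, total = [], 0.0
    for i, loss in enumerate(epoch_losses, start=1):
        total += loss
        shown.append(total / i)
    return shown

epoch1 = [1500.0, 400.0, 150.0, 80.0, 50.0]  # hypothetical per-batch losses
epoch2 = [45.0, 43.0, 42.0]

print(progbar_display(epoch1)[-1])  # end of epoch 1: 436.0 (mean of all epoch-1 batches)
print(progbar_display(epoch2)[0])   # start of epoch 2: 45.0 (average has reset)
```

Even though the raw per-batch loss fell smoothly, the displayed value appears to "jump" down at the epoch boundary, just like the 378 → 43 transition above.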
This is the training result for eager_tf:
1_train_0, 155262.8125, [5675.242, 34116.484, 115460.375]
...
1_train_356, 523.5953369140625, [124.26721, 100.35405, 287.8407]
1_train_357, 125.0768814086914, [25.127472, 11.3394575, 77.47637]
1_val_0, 565.5044555664062, [86.86941, 158.40671, 309.0946]
...
1_val_363, 694.1661987304688, [114.45209, 213.89682, 354.6836]
(Average) 1, train: 5050.33447265625, val: 590.8134155273438
2_train_0, 788.0953369140625, [132.88559, 241.86014, 402.21585]
2_train_1, 493.3677978515625, [86.920746, 157.22601, 238.08711]
Notice that here the losses are per-iteration values and are not averaged.
Right from the first iteration, the loss values are much bigger than with model.fit(), and at the end of epoch 1 the loss is > 100, which is much worse than the < 50 from model.fit().
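One way to compare like with like would be to average the eager per-iteration losses over the epoch before printing, e.g. with something like tf.keras.metrics.Mean. Below is a self-contained sketch with a minimal stand-in class (both the class and the loss numbers are illustrative, not the repo's actual code):

```python
class Mean:
    """Minimal stand-in for tf.keras.metrics.Mean (illustrative only)."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update_state(self, value):
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count if self.count else 0.0


avg = Mean()
for loss in [523.6, 125.1, 98.0]:  # hypothetical per-iteration eager losses
    avg.update_state(loss)

print(round(avg.result(), 2))  # epoch mean, comparable to model.fit()'s display
```

Printing `avg.result()` at the end of each epoch (and resetting it afterwards) would make the eager numbers directly comparable to the model.fit() progress-bar values.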
I strictly followed the tutorial for training and used the datasets / darknet weights downloaded directly from the links provided.
I guess this might be related to how the losses are computed or reported differently between the two modes.
Do you by any chance know why?