Is the `batch_size` flag the batch size per GPU or the total batch size across all GPUs? The example training command uses 4 GPUs with a batch size of 256. Does that make the effective batch size 1024, or 256 total with 64 per GPU? I am unable to reproduce the DeiT-Ti results (I'm at ~62.5% after 250 epochs and strongly doubt it will reach 72% by epoch 300) with either 8 GPUs and `batch_size=128` or 4 GPUs and `batch_size=256`. I assumed both configurations would behave identically, equivalent to a total batch size of 1024, but something seems to be off.
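For reference, here is a minimal sketch of how I would expect the effective batch size (and a linearly scaled learning rate) to be computed under a `torch.distributed` launch, assuming `batch_size` is interpreted as the per-GPU value. The function names and the base batch of 512 are my own illustration, not taken from this repo:

```python
# Hypothetical sketch: effective batch size and linear LR scaling under
# one-process-per-GPU distributed training. Assumes --batch_size is the
# per-GPU (per-process) batch size, which is the common convention.
import os


def effective_batch_size(per_gpu_batch_size: int, world_size: int) -> int:
    # Each process loads `per_gpu_batch_size` samples, so the global batch
    # seen per optimizer step is per-GPU batch times the number of processes.
    return per_gpu_batch_size * world_size


def linearly_scaled_lr(base_lr: float, eff_batch: int, base_batch: int = 512) -> float:
    # Linear scaling rule: the learning rate grows proportionally with the
    # global batch size relative to a reference batch (512 assumed here).
    return base_lr * eff_batch / base_batch


if __name__ == "__main__":
    # WORLD_SIZE is set by torch.distributed launchers (e.g. torchrun).
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    per_gpu = 256
    eff = effective_batch_size(per_gpu, world_size)  # e.g. 4 GPUs x 256 = 1024
    lr = linearly_scaled_lr(5e-4, eff)               # 5e-4 * 1024 / 512 = 1e-3
    print(f"world_size={world_size} effective_batch={eff} scaled_lr={lr:.2e}")
```

If the flag were instead the total batch size split across GPUs, the two configurations above (8x128 and 4x256) would only match if both resolve to the same global batch, which is exactly the ambiguity I'm asking about.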