Is the `batch_size` flag the batch size per GPU or the total batch size across all GPUs? The example training command uses 4 GPUs with a batch size of 256. Does that make the effective batch size 1024, or 256 total with 64 per GPU? I am unable to reproduce the DeiT-Ti results (I'm at ~62.5% after 250 epochs and strongly doubt it will reach 72% by epoch 300) with either 8 GPUs and `batch_size=128` or 4 GPUs and `batch_size=256`. I assumed both configurations would behave identically, equivalent to a total batch size of 1024, but something seems to be off.
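For reference, here is a minimal sketch of how I would expect the effective batch size (and a linearly scaled learning rate) to be computed under a `torch.distributed` launch, assuming `batch_size` is interpreted as the per-GPU value. The function names and the base batch of 512 are my own illustration, not taken from this repo:

```python
# Hypothetical sketch: effective batch size and linear LR scaling under
# one-process-per-GPU distributed training. Assumes --batch_size is the
# per-GPU (per-process) batch size, which is the common convention.
import os


def effective_batch_size(per_gpu_batch_size: int, world_size: int) -> int:
    # Each process loads `per_gpu_batch_size` samples, so the global batch
    # seen per optimizer step is per-GPU batch times the number of processes.
    return per_gpu_batch_size * world_size


def linearly_scaled_lr(base_lr: float, eff_batch: int, base_batch: int = 512) -> float:
    # Linear scaling rule: the learning rate grows proportionally with the
    # global batch size relative to a reference batch (512 assumed here).
    return base_lr * eff_batch / base_batch


if __name__ == "__main__":
    # WORLD_SIZE is set by torch.distributed launchers (e.g. torchrun).
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    per_gpu = 256
    eff = effective_batch_size(per_gpu, world_size)  # e.g. 4 GPUs x 256 = 1024
    lr = linearly_scaled_lr(5e-4, eff)               # 5e-4 * 1024 / 512 = 1e-3
    print(f"world_size={world_size} effective_batch={eff} scaled_lr={lr:.2e}")
```

If the flag were instead the total batch size split across GPUs, the two configurations above (8x128 and 4x256) would only match if both resolve to the same global batch, which is exactly the ambiguity I'm asking about.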