about effective batch size #17

Open
GoldExcalibur opened this issue Dec 28, 2022 · 0 comments

@GoldExcalibur
Thanks for your excellent work and for releasing the code.

  1. I have a question about the effective batch size, which is batch size 128 * accumulated_grad_batch 16 = 2048.
     Does this mean the model sees 128 samples at a time, computes the gradient, and then sums the gradients over each of the 16 batches? That kind of implementation differs from the usual notion of a batch size of 2048, where the model sees 2048 samples at once and the InfoNCE loss is computed over all 2048 samples rather than over only 128 samples (see the sketch after this list).
  2. Besides, I see that the precision is set to 16-bit. Why is it necessary not to use 32-bit?
  3. In src/models/base_model.py, I find that warmup_epochs and max_epochs are rescaled by a factor of self.train_iters_per_epoch // self.config.num_of_mini_batch. Why is this rescaling necessary? If this factor does not equal 1, the max_epochs used by the learning rate scheduler no longer matches the max_epochs passed to the pl trainer, which does not seem reasonable to me (an illustration of the rescaling also follows below).
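To make question 1 concrete, here is a minimal sketch, not the repository's code: the toy linear encoder, SGD optimizer, and random data are placeholders, and only the batch size of 128 and the accumulation count of 16 come from the question. It contrasts gradient accumulation over 16 micro-batches with a single InfoNCE loss over a true batch of 2048:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # InfoNCE over one batch: z1[i]'s positive is z2[i]; every other z2 in the
    # same batch acts as a negative.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature               # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

encoder = torch.nn.Linear(32, 16)                    # toy encoder, stands in for the real model
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1)
views = [(torch.randn(128, 32), torch.randn(128, 32)) for _ in range(16)]

# Gradient accumulation: 16 separate losses, each computed over 128 samples
# (127 negatives apiece), with gradients summed before a single optimizer step.
optimizer.zero_grad()
for x1, x2 in views:
    loss = info_nce(encoder(x1), encoder(x2)) / 16   # divide so the accumulated sum is an average
    loss.backward()
optimizer.step()

# A "true" batch of 2048 would instead build one (2048, 2048) logits matrix,
# giving every sample 2047 negatives:
x1_full = torch.cat([v[0] for v in views])
x2_full = torch.cat([v[1] for v in views])
loss_full = info_nce(encoder(x1_full), encoder(x2_full))
```

Because InfoNCE couples the samples within a batch through the shared negatives, the two regimes are not equivalent, which is exactly the distinction question 1 draws.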
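And for question 3, a hypothetical illustration of what such a rescaling could correspond to if the scheduler were stepped once per optimizer step rather than once per epoch. The names train_iters_per_epoch and num_of_mini_batch are taken from the question; the concrete numbers and the warmup-plus-cosine shape are assumptions, not the repository's actual scheduler:

```python
import math

warmup_epochs, max_epochs = 10, 100   # assumed values, for illustration only
train_iters_per_epoch = 2048          # dataloader batches (of 128) per epoch
num_of_mini_batch = 16                # gradient-accumulation factor

# One optimizer step per accumulation window, so this many scheduler steps per epoch:
steps_per_epoch = train_iters_per_epoch // num_of_mini_batch

# The rescaling the question describes, read as converting epochs into step counts:
warmup_steps = warmup_epochs * steps_per_epoch
max_steps = max_epochs * steps_per_epoch

def lr_at(step, base_lr=1e-3):
    # Linear warmup followed by cosine decay, evaluated per optimizer step.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Whether the repository's scheduler is in fact stepped per optimizer step, which would make the rescaled values step counts rather than epoch counts, is the crux of the question.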