
Do you train ImageNet with batch size 64 too? It goes out of memory on a 1080 Ti. #17

Open
yxchng opened this issue Nov 12, 2018 · 2 comments

Comments


yxchng commented Nov 12, 2018

No description provided.

@weitingyuk

@yxchng What was your training result?


jianghaojun commented Aug 3, 2020

I set batch size = 256 and lr = 0.1, but my training result (top-1 acc: 77.64) is noticeably lower than the result reported in the paper (top-1 acc: 78.24). More details about the hyperparameters are listed below. The epoch settings are converted from the iteration counts given in the paper: with a batch size of 256 there are roughly 5k iterations per epoch on ImageNet, so the learning rate should be decayed at 200k/5k = 40, 400k/5k = 80, and 500k/5k = 100 epochs, and training should terminate at 530k/5k = 106 epochs (see the conversion sketch below). The paper states:

"The learning rate is divided by 10 at 200k, 400k, 500k iterations. We terminate training at 530k iterations."
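
A minimal sketch of that iteration-to-epoch conversion, assuming the standard ImageNet-1k training set of roughly 1.28M images (the exact per-epoch iteration count depends on the dataset size and whether the last partial batch is dropped):

```python
# Convert the paper's iteration-based schedule to epochs for batch size 256.
# Assumes the standard ImageNet-1k training set (~1,281,167 images).
NUM_TRAIN_IMAGES = 1_281_167
BATCH_SIZE = 256

iters_per_epoch = NUM_TRAIN_IMAGES // BATCH_SIZE  # ~5004 iterations per epoch

lr_decay_iters = [200_000, 400_000, 500_000]
total_iters = 530_000

lr_decay_epochs = [round(i / iters_per_epoch) for i in lr_decay_iters]  # [40, 80, 100]
total_epochs = round(total_iters / iters_per_epoch)                     # 106

print(lr_decay_epochs, total_epochs)
```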

Hyperparameter settings

args.epochs = 106
args.batch_size = 256
### data transform
args.autoaugment = False
args.colorjitter = False
args.change_light = True
### optimizer
args.optimizer = 'SGD'
args.lr = 0.1
args.momentum = 0.9
args.weigh_decay_apply_on_all = True  # TODO: weight decay apply on which params
args.weight_decay = 1e-4
args.nesterov = True
### criterion
args.labelsmooth = 0
### lr scheduler
args.scheduler = 'uneven_multistep'
args.lr_decay_rate = 0.1
args.lr_milestone = [40, 80, 100]
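
For reference, a minimal sketch (an assumption of mine, not the repository's actual training script) of how these settings could map onto a standard PyTorch SGD optimizer and multi-step LR schedule; the ResNet-50 model is just a placeholder and the training loop is omitted:

```python
import torch
import torchvision

# Placeholder model; the repo's actual architecture would go here.
model = torchvision.models.resnet50()

# Weight decay applied to all parameters, matching the args above.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,
    momentum=0.9,
    weight_decay=1e-4,
    nesterov=True,
)

# Divide the LR by 10 at epochs 40, 80, 100; stop at epoch 106.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 80, 100], gamma=0.1
)

for epoch in range(106):
    # train_one_epoch(model, optimizer, ...)  # training loop omitted
    scheduler.step()
```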
