
YOLOv3 takes a long time to train on custom data #1458

Closed
FlorianRuen opened this issue Aug 19, 2020 · 9 comments
Labels
question Further information is requested

Comments

@FlorianRuen

❔Question

Hello everyone,

I'm using the code from this repo to train a model on images (around 12k images, labelled with Labelbox in the correct format), across about 17 classes.

I'm training the model on an AWS EC2 instance (instance type g3s.xlarge, with a Tesla M60 GPU and almost 8 GiB of video memory), but the training takes a long time and it's very hard to find out why.

To explain: I'm trying to run 500 epochs, and one epoch takes around 25-30 minutes on this kind of instance. That seems very long to me (my model isn't big enough to justify this training time).

Hyperparameters were the defaults. I'm using batch size = 4 (anything above 4 seems to cause a CUDA out-of-memory error), and my test split is 20% of the 12k images.

What do you think about this? Is it normal or too long? If it's abnormally long, is there any way to find out why?

Don't hesitate to ask if I've left out any information that could help.

Kind regards,
Florian

@FlorianRuen FlorianRuen added the question Further information is requested label Aug 19, 2020
@glenn-jocher
Member

glenn-jocher commented Aug 19, 2020

@FlorianRuen Ultralytics has open-sourced YOLOv5 at https://github.com/ultralytics/yolov5, featuring faster, lighter and more accurate object detection. YOLOv5 is recommended for all new projects.



** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from [google/automl](https://github.com/google/automl) at batch size 8.
  • August 13, 2020: v3.0 release: nn.Hardswish() activations, data autodownload, native AMP.
  • July 23, 2020: v2.0 release: improved model definition, training and mAP.
  • June 22, 2020: PANet updates: new heads, reduced parameters, improved speed and mAP 364fcfd.
  • June 19, 2020: FP16 as new default for smaller checkpoints and faster inference d4c6674.
  • June 9, 2020: CSP updates: improved speed, size, and accuracy (credit to @WongKinYiu for CSP).
  • May 27, 2020: Public release. YOLOv5 models are SOTA among all known YOLO implementations.
  • April 1, 2020: Start development of future compound-scaled YOLOv3/YOLOv4-based PyTorch models.

Pretrained Checkpoints

| Model | AP val | AP test | AP 50 | Speed GPU | FPS GPU | params | FLOPS |
|-------|--------|---------|-------|-----------|---------|--------|-------|
| YOLOv5s | 37.0 | 37.0 | 56.2 | 2.4 ms | 416 | 7.5M | 13.2B |
| YOLOv5m | 44.3 | 44.3 | 63.2 | 3.4 ms | 294 | 21.8M | 39.4B |
| YOLOv5l | 47.7 | 47.7 | 66.5 | 4.4 ms | 227 | 47.8M | 88.1B |
| YOLOv5x | 49.2 | 49.2 | 67.7 | 6.9 ms | 145 | 89.0M | 166.4B |
| YOLOv5x + TTA | 50.8 | 50.8 | 68.9 | 25.5 ms | 39 | 89.0M | 354.3B |
| YOLOv3-SPP | 45.6 | 45.5 | 65.2 | 4.5 ms | 222 | 63.0M | 118.0B |

** APtest denotes COCO test-dev2017 server results, all other AP results in the table denote val2017 accuracy.
** All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.001
** SpeedGPU measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP n1-standard-16 instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.1
** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).
** Test Time Augmentation (TTA) runs at 3 image sizes. Reproduce by python test.py --data coco.yaml --img 832 --augment

For more information and to get started with YOLOv5 please visit https://github.com/ultralytics/yolov5. Thank you!

@FlorianRuen
Author

FlorianRuen commented Aug 20, 2020

Thanks for the link @glenn-jocher, I'm currently running a training job with YOLOv5 on the same dataset.
I'll wait an hour or two to see the training speed, then come back to tell you whether it's better or not.

Thanks for your help

@FlorianRuen
Author

@glenn-jocher To give a quick update on this topic: the training completes around 10 epochs in 1 hour and 10 minutes.

@glenn-jocher
Member

@FlorianRuen sure, sounds fine.

@FlorianRuen
Author

FlorianRuen commented Aug 20, 2020

@glenn-jocher Do you think the time taken is normal on this kind of machine? So far it has reached epoch 78 in 9 hours and 48 minutes, so if the time per epoch stays stable, it should take around 40 hours for 300 epochs.
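That projection can be checked with quick arithmetic (a sketch using only the figures above: 78 epochs completed in 9 h 48 min, 300 epochs targeted):

```shell
# Project total training time from observed progress; the figures are
# the ones reported above (78 epochs in 9 h 48 min, 300 epochs target).
awk 'BEGIN {
    elapsed_min = 9 * 60 + 48        # 588 minutes for 78 epochs
    per_epoch   = elapsed_min / 78   # ~7.5 minutes per epoch
    printf "%.1f hours for 300 epochs\n", per_epoch * 300 / 60
}'
# -> 37.7 hours for 300 epochs
```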

Here are the charts from TensorBoard (epoch 78 at 9 h 48 min) => https://ibb.co/rw43zm1

Maybe I need a bigger machine (perhaps with 16 GB of video memory) to get it done faster (2x faster if performance doubles?)

Thanks for your help

@glenn-jocher
Member

@FlorianRuen this is not a question for me, just compare against publicly available environments like Google Colab.

@FlorianRuen
Author

FlorianRuen commented Aug 21, 2020

@glenn-jocher I will keep searching, but every result I've found runs only 3 epochs on the COCO dataset with only 8 or 128 images, so epochs are very fast in that case. (I have around 700 images per epoch on my side, so comparing against the public results, 8 images in 9 seconds would be around 10 minutes per epoch.)

But if we use the results on your page, which say for training on the full COCO dataset:

Download COCO and run command below. Training times for YOLOv5s/m/l/x are 2/4/6/8 days on a single V100 (multi-GPU times faster). Use the largest --batch-size your GPU allows (batch sizes shown for 16 GB devices).

As COCO has 118k images for training and 5k for validation, my training is very slow on just 12k images (even though I use an 8 GB GPU instead of 16 GB).
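A back-of-envelope check of that comparison (a sketch: the 2-day figure is the quoted YOLOv5s training time on the ~118k-image COCO train set for 300 epochs, and ~9.6k assumes 12k images minus the 20% test split):

```shell
awk 'BEGIN {
    per_epoch_coco = 2 * 24 * 60 / 300   # ~9.6 min/epoch on ~118k images (V100)
    scale          = 9600 / 118000       # my train-set size relative to COCO
    printf "~%.1f min/epoch expected on a V100\n", per_epoch_coco * scale
}'
# -> ~0.8 min/epoch expected on a V100
```

A Tesla M60 is of course much slower than a V100, so some gap from that estimate is expected.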

@harshdhamecha

Hey @FlorianRuen, I am facing the same problem with YOLOv3. Did you find any solution yet?

Thanks

@glenn-jocher
Member

glenn-jocher commented Nov 8, 2022

👋 Hello! Thanks for asking about training speed issues. YOLOv5 🚀 can be trained on CPU (slowest), single-GPU, or multi-GPU (fastest). If you would like to increase your training speed some options are:

  • Increase --batch-size
  • Reduce --img-size
  • Reduce model size, i.e. from YOLOv5x -> YOLOv5l -> YOLOv5m -> YOLOv5s
  • Train with multi-GPU DDP at larger --batch-size
  • Train on cached data: python train.py --cache (RAM caching) or --cache disk (disk caching)
  • Train on faster GPUs, i.e.: P100 -> V100 -> A100
  • Train on free GPU backends with up to 16GB of CUDA memory: Google Colab or Kaggle
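A rough sketch of what those options look like on the command line (the dataset name `custom.yaml` and the flag values below are placeholders, not recommendations; check `python train.py --help` in the repo for the exact flags on your version):

```shell
# Larger batch and smaller images:
python train.py --data custom.yaml --batch-size 32 --img 416

# Smaller model checkpoint:
python train.py --data custom.yaml --weights yolov5s.pt

# Cache images in RAM (or use '--cache disk' for disk caching):
python train.py --data custom.yaml --cache

# Multi-GPU DDP training on two GPUs:
python -m torch.distributed.run --nproc_per_node 2 train.py \
    --data custom.yaml --batch-size 64 --device 0,1
```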

Good luck 🍀 and let us know if you have any other questions!
