TypeError when training with custom data #9384

Closed
2 tasks done
grantrosario opened this issue Sep 12, 2022 · 7 comments · Fixed by #9466
Labels
bug Something isn't working

Comments

@grantrosario

grantrosario commented Sep 12, 2022

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training

Bug

Since commit #9347, my Colab runs fail before training starts with the TypeError shown below. Conv2d now appears to expect a different set of arguments.

image

Environment

Google colab

YOLOv5 🚀 v6.2-109-g23701ea Python-3.7.13 torch-1.12.1+cu113 CUDA:0 (Tesla P100-PCIE-16GB, 16281MiB)

Minimal Reproducible Example

!git clone https://github.com/ultralytics/yolov5/

%cd yolov5/

!pip install -r requirements.txt
!pip install gdown -q
!pip install ipython-autotime
!pip install ipyplot

import torch
import utils
import gdown
import yaml
import glob
import ipyplot

from utils.plots import plot_results

display = utils.notebook_init()

%load_ext autotime
!python train.py --imgsz 416 --image-weights --device 0 --batch-size 16 --epochs 1 --data /content/data.yaml --cfg /content/data.yaml --weights /content/yolov5/models/yolov5.pt --name s400_1

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@grantrosario grantrosario added the bug Something isn't working label Sep 12, 2022
@github-actions
Contributor

github-actions bot commented Sep 12, 2022

👋 Hello @grantrosario, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email [email protected].

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@grantrosario thanks for the bug report!

I'm unable to reproduce the issue. I tried a related command with COCO128 training in Colab, and everything works correctly on current master.

However, if you use Conv() modules directly in a custom model YAML, you may need to update their arguments to match the new Conv() signature.
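To illustrate why existing callers might need updating, here is a minimal sketch of how a newly inserted parameter can shift positional arguments. It assumes the change added a parameter (called `d` here) ahead of `act`. The functions `conv_before` and `conv_after` are hypothetical stand-ins, not the actual yolov5 `Conv` class, and this thread does not confirm that this is the exact cause of the reported TypeError.

```python
# Hypothetical stand-ins, NOT the real yolov5 Conv class: they only mimic a
# constructor that gains a new parameter `d` ahead of the existing `act`.

def conv_before(c1, c2, k=1, s=1, p=None, g=1, act=True):
    """Signature before the change (assumed layout)."""
    return {"k": k, "s": s, "p": p, "g": g, "act": act}


def conv_after(c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
    """Signature after the change: `d` now occupies the slot `act` used to hold."""
    return {"k": k, "s": s, "p": p, "g": g, "d": d, "act": act}


# A caller that passes everything positionally, with act=False as the 7th value.
positional = (64, 128, 3, 1, None, 1, False)

before = conv_before(*positional)
after = conv_after(*positional)

print(before["act"])               # False: the caller's intent
print(after["d"], after["act"])    # False True: the value landed in `d` instead

# Passing the flag by keyword is robust to the inserted parameter.
fixed = conv_after(64, 128, 3, 1, None, 1, act=False)
print(fixed["d"], fixed["act"])    # 1 False
```

If a value meant for `act` ends up in a numeric slot like this, a downstream layer could plausibly reject it with a type error, so passing trailing options by keyword is the safer habit for custom modules.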

Screenshot 2022-09-16 at 00 33 03

@River-Cold

River-Cold commented Sep 18, 2022

Bug

I hit the same problem when training on COCO128 in Google Colab.
When I use the cfg "models/hub/yolov5s-ghost.yaml", it throws the error shown in the following picture:
image
Everything works correctly when I use other cfgs.

Minimal Reproducible Example

!git clone https://github.com/ultralytics/yolov5 # clone
%cd yolov5
%pip install -qr requirements.txt # install

import torch
import utils
display = utils.notebook_init() # checks

#Train YOLOv5s on COCO128 for 3 epochs
!python train.py --img 640 --batch 16 --epochs 3 --cfg /content/yolov5/models/hub/yolov5s-ghost.yaml --data coco128.yaml --weights yolov5s.pt

@glenn-jocher
Member

@River-Cold thanks for the code to reproduce. I see the same error. I'll add a TODO to resolve.

@glenn-jocher glenn-jocher added the TODO High priority items label Sep 18, 2022
glenn-jocher added a commit that referenced this issue Sep 18, 2022
Resolves #9384

Signed-off-by: Glenn Jocher <[email protected]>
@glenn-jocher
Member

glenn-jocher commented Sep 18, 2022

@grantrosario good news 😃! Your original issue may now be fixed ✅ in PR #9466. To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View updated notebooks
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@glenn-jocher glenn-jocher removed the TODO High priority items label Sep 18, 2022
@River-Cold

River-Cold commented Sep 19, 2022

@glenn-jocher I solved this problem but ran into another one: #9059
It happens when I use yolov5s-ghost.yaml to train on coco128.
From epoch 0 to 17, P, R and mAP are all 0.0, as shown in the following picture:
image
I am not sure whether this is caused by the network itself or by a bug.

@glenn-jocher
Member

@River-Cold train 300 epochs.
