Loss does not converge and accuracy is wrong when training vgg16/vgg19 on a 5-class flower dataset; also, how to specify a pretrained model. #811

LegendSun0 · 2024-09-29T09:29:42Z

If this is your first time, please read our contributor guidelines:
https://github.com/mindspore-lab/mindcv/blob/main/CONTRIBUTING.md

Describe the bug (Mandatory)
When training vgg16 or vgg19 on a 5-class flower dataset, on both GPU and NPU, the loss does not converge and the accuracy is wrong.

Hardware Environment (Ascend/GPU/CPU):

Please delete the backend not involved:
/device ascend/GPU

Software Environment (Mandatory):
-- MindSpore version (e.g., 2.2.11) :
-- Python version (e.g., Python 3.9.18) :
-- OS platform and distribution (e.g., Linux Ubuntu 22.04):
-- GCC/Compiler version (if compiled from source):
Execute Mode (Mandatory) (PyNative/Graph):

Please delete the mode not involved:
/mode pynative PYNATIVE_MODE(1)
/mode graph

To Reproduce (Mandatory)
Steps to reproduce the behavior:
Train using the YAML file.
Command: python train.py --config ./configs/vgg/vgg16_ascend.yaml

Expected behavior (Mandatory)
A clear and concise description of what you expected to happen.

Screenshots / Logs (Mandatory)
If applicable, add screenshots to help explain your problem.
YAML file contents:

# system
mode: 1
distribute: False
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: './imageNet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875

# model
model: 'vgg16'
num_classes: 5
pretrained: True
ckpt_path: ''
keep_checkpoint_max: 1
ckpt_save_dir: './ckpt3'
epoch_size: 20
dataset_sink_mode: True
amp_level: 'O0'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.01
min_lr: 0.0001
decay_epochs: 198
warmup_epochs: 2

# optimizer
opt: 'momentum'
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False
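For readers checking the scheduler settings above, here is a minimal sketch of a generic linear-warmup plus cosine-decay schedule evaluated once per epoch. The function `warmup_cosine_lr` is hypothetical and is not MindCV's implementation (which may, for example, update per step or shape the warmup differently); only the numeric values are taken from the config.

```python
import math

def warmup_cosine_lr(epoch, lr, min_lr, warmup_epochs, decay_epochs):
    """Generic per-epoch schedule (assumed form, not MindCV's code):
    linear warmup to `lr`, then cosine decay towards `min_lr`."""
    if epoch < warmup_epochs:
        return lr * (epoch + 1) / warmup_epochs
    # Fraction of the cosine decay phase completed so far, capped at 1.
    progress = min((epoch - warmup_epochs) / decay_epochs, 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Values from the YAML above; the run lasts epoch_size = 20 epochs.
EPOCH_SIZE = 20
schedule = [
    warmup_cosine_lr(e, lr=0.01, min_lr=0.0001, warmup_epochs=2, decay_epochs=198)
    for e in range(EPOCH_SIZE)
]

for epoch, value in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {value:.6f}")
```

Under this assumed form, the learning rate stays close to 0.01 for the whole run, because decay_epochs (198) is much larger than epoch_size (20).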

Training results:
Epoch  TrainLoss  Top_1_Accuracy  Top_5_Accuracy  TrainTime  EvalTime  TotalTime
1      1.659075   25.2044%        100.0000%       22.04      0.99      27.67
2      1.790772   19.0736%        100.0000%       6.21       0.84      10.10
3      1.747301   19.0736%        100.0000%       6.46       0.84      10.10
4      1.628069   19.0736%        100.0000%       6.18       0.78      9.68
5      1.661704   19.0736%        100.0000%       6.33       0.85      10.33
6      1.725484   19.0736%        100.0000%       6.19       0.85      10.06
7      1.674596   18.9373%        100.0000%       6.40       0.89      10.36
8      1.607921   19.0736%        100.0000%       6.25       0.75      10.25
9      1.670359   19.0736%        100.0000%       6.17       0.80      10.14
10     1.685464   19.0736%        100.0000%       6.22       0.87      10.75
11     1.688051   19.0736%        100.0000%       6.41       0.83      10.23
12     1.720397   19.0736%        100.0000%       6.22       0.78      10.54
13     1.750791   19.0736%        100.0000%       6.29       0.79      10.29
14     1.598438   19.0736%        100.0000%       6.18       0.83      9.85
15     1.609399   19.0736%        100.0000%       6.14       0.84      9.81
16     1.617299   19.0736%        100.0000%       6.17       0.95      10.13
17     1.744891   19.0736%        100.0000%       6.23       0.86      10.30
18     1.776682   19.0736%        100.0000%       6.18       0.83      9.81
19     1.670697   19.0736%        100.0000%       6.12       0.93      10.03
20     1.782085   19.0736%        100.0000%       6.36       0.83      10.14
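As a reading aid for the numbers above, the sketch below computes the cross-entropy of a 5-class classifier that predicts a uniform distribution, which is ln 5 ≈ 1.609 with or without label smoothing. The helper names are illustrative and not part of MindCV, and the smoothing convention (mixing the one-hot target with a uniform distribution) is an assumption.

```python
import math

def smoothed_targets(num_classes, true_class, smoothing):
    """Mix a one-hot target with a uniform distribution (assumed convention)."""
    uniform = smoothing / num_classes
    return [
        (1.0 - smoothing) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]

def cross_entropy(probs, targets):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p) for p, t in zip(probs, targets))

NUM_CLASSES = 5         # num_classes in the config
LABEL_SMOOTHING = 0.1   # label_smoothing in the config

targets = smoothed_targets(NUM_CLASSES, true_class=0, smoothing=LABEL_SMOOTHING)
uniform_probs = [1.0 / NUM_CLASSES] * NUM_CLASSES

chance_loss = cross_entropy(uniform_probs, targets)  # equals ln(5)
chance_top1 = 1.0 / NUM_CLASSES                      # random-guess Top-1 accuracy

print(f"chance-level loss:  {chance_loss:.4f}")
print(f"chance-level Top-1: {chance_top1:.0%}")
```

The reported losses (about 1.60 to 1.79) and Top-1 accuracy (about 19%) sit at this chance-level baseline, and Top-5 accuracy is 100% by construction when there are only five classes.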

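Additional context (Optional)
Add any other context about the problem here.
The loss does not converge and the accuracy is also wrong. Could you please take a look at what the problem is? Also, I have already downloaded the pretrained model: how do I point the training to it? Currently, with pretrained: True, the model is downloaded automatically to a fixed location, and I would like to know how to specify my own copy.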


LegendSun0 added the bug Something isn't working label Sep 29, 2024

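LegendSun0 changed the title ~~Loss does not converge and accuracy is wrong when training vgg16/vgg19 on a 5-class flower dataset.~~ Loss does not converge and accuracy is wrong when training vgg16/vgg19 on a 5-class flower dataset; also, how to specify a pretrained model. Sep 29, 2024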
