
Help me, binary segmentation acc error! #2628

Closed
xiaoaxiaoxiaocao opened this issue Feb 21, 2023 · 13 comments
xiaoaxiaoxiaocao commented Feb 21, 2023

I have a custom dataset (modeled after the DRIVE dataset) with two categories, foreground and background. The annotation image values are divided by 128, which is equivalent to '1 if value >= 128 else 0'. The training results are as follows, and I don't know how to improve them. Help me, thanks!
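(For reference, for 8-bit pixel values in [0, 255] the two formulations really are the same mapping; a minimal sketch verifying this, not part of the original report:

import numpy as np

values = np.arange(256, dtype=np.uint8)  # every possible 8-bit pixel value
# integer division by 128 and thresholding at 128 both map [0, 127] -> 0 and [128, 255] -> 1
assert np.array_equal(values // 128, (values >= 128).astype(np.uint8))
)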

2023-02-21 10:46:45,168 - mmseg - INFO - Iter [1700/20000]      lr: 9.240e-03, eta: 1:06:42, time: 0.211, data_time: 0.004, memory: 11397, decode.loss_ce: 0.0502, decode.acc_seg: 99.0194, aux.loss_ce: 0.0214, aux.acc_seg: 99.0194, loss: 0.0716
2023-02-21 10:46:55,700 - mmseg - INFO - Iter [1750/20000]      lr: 9.217e-03, eta: 1:06:27, time: 0.211, data_time: 0.004, memory: 11397, decode.loss_ce: 0.0315, decode.acc_seg: 99.5005, aux.loss_ce: 0.0128, aux.acc_seg: 99.5005, loss: 0.0444
2023-02-21 10:47:06,254 - mmseg - INFO - Iter [1800/20000]      lr: 9.195e-03, eta: 1:06:12, time: 0.211, data_time: 0.004, memory: 11397, decode.loss_ce: 0.0369, decode.acc_seg: 99.3797, aux.loss_ce: 0.0145, aux.acc_seg: 99.3797, loss: 0.0514
2023-02-21 10:47:16,802 - mmseg - INFO - Iter [1850/20000]      lr: 9.172e-03, eta: 1:05:58, time: 0.211, data_time: 0.004, memory: 11397, decode.loss_ce: 0.0535, decode.acc_seg: 99.0453, aux.loss_ce: 0.0209, aux.acc_seg: 99.0453, loss: 0.0744
2023-02-21 10:47:27,349 - mmseg - INFO - Iter [1900/20000]      lr: 9.150e-03, eta: 1:05:43, time: 0.211, data_time: 0.004, memory: 11397, decode.loss_ce: 0.0499, decode.acc_seg: 99.1083, aux.loss_ce: 0.0198, aux.acc_seg: 99.1083, loss: 0.0697
2023-02-21 10:47:37,877 - mmseg - INFO - Iter [1950/20000]      lr: 9.127e-03, eta: 1:05:29, time: 0.211, data_time: 0.004, memory: 11397, decode.loss_ce: 0.0223, decode.acc_seg: 99.6419, aux.loss_ce: 0.0096, aux.acc_seg: 99.6419, loss: 0.0319


+--------------+-------+-------+
|    Class     |  IoU  |  Acc  |
+--------------+-------+-------+
|  background  | 99.57 | 100.0 |
| Manipulation |  0.0  |  0.0  |
+--------------+-------+-------+
2023-02-21 10:52:53,215 - mmseg - INFO - Summary:
2023-02-21 10:52:53,215 - mmseg - INFO - 
+-------+-------+------+
|  aAcc |  mIoU | mAcc |
+-------+-------+------+
| 99.57 | 49.78 | 50.0 |
+-------+-------+------+
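(For reference, the summary follows directly from the per-class table: mIoU = (99.57 + 0.0) / 2 ≈ 49.78 and mAcc = (100.0 + 0.0) / 2 = 50.0. In other words, the foreground ('Manipulation') class gets no correct predictions at all, and only the background class contributes to the scores.)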

My config:

_base_ = [
    '../../../configs/_base_/models/fastfcn_r50-d32_jpu_psp.py', '../../../configs/_base_/datasets/manipulation.py',
    '../../../configs/_base_/default_runtime.py', '../../../configs/_base_/schedules/schedule_20k.py'
]

model = dict(
    decode_head=dict(num_classes=2,
                    out_channels=2,
                    loss_decode=dict(
                        type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)
                    ),
    auxiliary_head=dict(num_classes=2,
                        out_channels=2,
                        loss_decode=dict(
                            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)
                        ))

# dataset settings
dataset_type = 'ManipulationDataset' #change
data_root = '/home/featurize/data/manipulation'  

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (320, 320)

@MeowZheng (Collaborator)

Would you like to provide the full config of the dataset settings?

@xiaoaxiaoxiaocao (Author) commented Feb 23, 2023

Would you like to provide the full config of the dataset settings?

# dataset settings
dataset_type = 'ManipulationDataset' #change
data_root = '/home/featurize/data/manipulation'   

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(1280, 640), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2560, 640),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=6,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/training',
        ann_dir='annotations/training',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=test_pipeline))

optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0005)
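(For context, the config above assumes a dataset class registered under the name 'ManipulationDataset'. Below is a minimal sketch of what such a registration typically looks like in mmsegmentation 0.x, modeled on the DRIVE dataset class; the file suffixes and palette here are assumptions, not taken from this thread:

# Hypothetical ManipulationDataset registration (mmsegmentation 0.x style).
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset


@DATASETS.register_module()
class ManipulationDataset(CustomDataset):
    CLASSES = ('background', 'Manipulation')
    PALETTE = [[120, 120, 120], [6, 230, 230]]

    def __init__(self, **kwargs):
        super(ManipulationDataset, self).__init__(
            img_suffix='.jpg',        # assumption: image file extension
            seg_map_suffix='.png',    # assumption: annotation file extension
            reduce_zero_label=False,  # background (label 0) is a real class here
            **kwargs)
)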

@xiexinch (Collaborator)

Hi @xiaoaxiaoxiaocao,
It seems that your config has no problem. Could you provide the full config and some example images?

@xiaoaxiaoxiaocao (Author)

@xiexinch, the full config and some example images:
Link: https://pan.baidu.com/s/11MtqqT5LvgnoQhFhyrshEg (extraction code: 8cqv)
thanks for your help!

@csatsurnh (Collaborator)

Did you use the annotations in data_process for training? It seems that all the annotation images are pure black.

@xiaoaxiaoxiaocao (Author)

Yes, I used the data in data_process for training. I referred to the DRIVE dataset: the annotation image (data_ori) values are divided by 128, which is equivalent to '1 if value >= 128 else 0'.

@csatsurnh (Collaborator)

What do you mean by "divided by 128 is equivalent to '1 if value >= 128 else 0'"? Would you mind providing the complete preprocessing code you used?

@xiaoaxiaoxiaocao (Author)

preprocessing code:

for i, file in enumerate(files):
    img_path = os.path.join(root, file)
    img = cv2.imread(img_path)
    img_train_path = os.path.join(train_path, file)
    cv2.imwrite(img_train_path, img[:, :, 0] // 128)
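
(One way to sanity-check the result of this step is to inspect the label values actually stored in the processed masks; a minimal sketch, where train_path is the output directory from the loop above:

# Count the label values present in the processed annotation masks.
import os
import cv2
import numpy as np

for file in os.listdir(train_path):
    mask = cv2.imread(os.path.join(train_path, file), cv2.IMREAD_UNCHANGED)
    values, counts = np.unique(mask, return_counts=True)
    print(file, dict(zip(values.tolist(), counts.tolist())))
    # Expect both 0 (background) and 1 (foreground); if only 0 ever appears,
    # every pixel has been mapped to background.
)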

@xiaoaxiaoxiaocao (Author)

@xiexinch @csatsurnh If I do not do the above data preprocessing, the following error will be reported:

2023-03-05 10:38:24,533 - mmseg - INFO - workflow: [('train', 1)], max: 80000 iters
2023-03-05 10:38:24,533 - mmseg - INFO - Checkpoints will be saved to /home/xiaojie/demo/mmsegmentation/work_dirs/deeplabv3_r50-d8_512x512_80k_mani_test2 by HardDiskBackend.
Traceback (most recent call last):
  File "tools/train.py", line 242, in <module>
    main()
  File "tools/train.py", line 238, in main
    meta=meta)
  File "/home/xiaojie/demo/mmsegmentation/mmseg/apis/train.py", line 194, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 144, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 70, in train
    self.call_hook('after_train_iter')
  File "/home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 317, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 65, in after_train_iter
    runner.outputs['loss'].backward()
  File "/home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([2, 256, 40, 40], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(256, 2, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [0, 0, 0]
stride = [1, 1, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0x52ef590
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 2, 256, 40, 40,
strideA = 409600, 1600, 40, 1,
output: TensorDescriptor 0x52f9160
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 2, 2, 40, 40,
strideA = 3200, 1600, 40, 1,
weight: FilterDescriptor 0x7f2df402d8a0
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 4
dimA = 2, 256, 1, 1,
Pointer addresses:
input: 0x7f2dffc30000
output: 0x7f2ee0fd5800
weight: 0x7f2ee0f8b200

terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1230 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2fa724c7d2 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x239de (0x7f2fdfe319de in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x22d (0x7f2fdfe3357d in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x301898 (0x7f305c665898 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f2fa7235005 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #5: torch::autograd::SavedVariable::reset_data() + 0xa1 (0x7f2fe301eee1 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: + 0x2838bab (0x7f2fe292fbab in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: + 0x2eef0a2 (0x7f2fe2fe60a2 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::deleteNode(torch::autograd::Node*) + 0x7f (0x7f2fe2fe614f in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: + 0x2ed0e67 (0x7f2fe2fc7e67 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: c10::TensorImpl::release_resources() + 0x20 (0x7f2fa7234eb0 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #11: + 0x1edf69 (0x7f305c551f69 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: + 0x4e5818 (0x7f305c849818 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #13: THPVariable_subclass_dealloc(_object*) + 0x299 (0x7f305c849b19 in /home/xiaojie/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #14: python() [0x4ac8e7]
frame #15: python() [0x4acafd]
frame #16: python() [0x4d5894]
frame #17: python() [0x4bbc68]
frame #18: python() [0x4d05bb]
frame #19: python() [0x4d05d1]
frame #20: python() [0x4d05d1]
frame #21: python() [0x4a1947]

frame #25: python() [0x5449b9]
frame #27: __libc_start_main + 0xf3 (0x7f3074609083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #28: python() [0x54472e]

Aborted (core dumped)

@xiexinch (Collaborator)

preprocessing code:

for i, file in enumerate(files):
    img_path = os.path.join(root, file)
    img = cv2.imread(img_path)
    img_train_path = os.path.join(train_path, file)
    cv2.imwrite(img_train_path, img[:, :, 0] // 128)

Hi @xiaoaxiaoxiaocao,
It is possible that all pixels are labeled as background after this processing.
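
(A more defensive variant of that preprocessing would not rely on the source masks containing exactly 0 and 255; the sketch below, which is not the thread author's code, treats any non-zero pixel as foreground:

# Hedged sketch: treat any non-zero pixel as foreground instead of dividing by 128.
import os
import cv2
import numpy as np

for file in files:  # 'files', 'root' and 'train_path' as in the original loop
    img = cv2.imread(os.path.join(root, file), cv2.IMREAD_GRAYSCALE)
    mask = (img > 0).astype(np.uint8)  # 0 = background, 1 = foreground
    cv2.imwrite(os.path.join(train_path, file), mask)
)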

@xiaoaxiaoxiaocao (Author)

@xiexinch

Why? I referred to the preprocessing of the DRIVE dataset, and its preprocessing was done in this way.

@xiexinch (Collaborator)

@xiexinch

Why? I referred to the preprocessing of the DRIVE dataset, and its preprocessing was done in this way.

We don't know whether your data is the same as DRIVE's, so there is no guarantee that DRIVE's processing will also work on your dataset.

@xiexinch (Collaborator)

Closing the issue, as there has been no activity for a while.
We hope your issue has been resolved.
If not, please feel free to open a new one.
