Have you met memory leak problem when running model？ #9

Sucran · 2018-10-08T11:36:48Z

Hi， @Ugness
I met a RAM memory leak problem when running network.py and train.py， this issue confused me for a few days. I have run other pytorch repo which is OK.
I run the code in Ubuntu 14.04, Pytorch 0.4.1, CUDA 8.0, cudnn 6.0.

Ugness · 2018-10-08T14:36:13Z

Yes I also met memory leak problem.
In my case, my GPU's VRAM is 11 GB, and It spends about 9GB of VRAM for batch size 1. (in paper, batchsize 10 is recommended but I could not train the model with that option)
(for 3 * 3 convolution version)

May be 1 * 1 convolution version(branch adjusted) would spend less memory.

I think that memory leak comes from local Picanet's implementation. It makes H * W * C Tensor to H * W number of 14 * 14 * C size patches. If you have better idea to implement local Picanet, please comment here or make a pull request.

Since I am not the author of paper, this code is not the best implementation. I'm sorry for that.

Sucran · 2018-10-08T16:05:53Z

Yes, I noticed the batch size option, it is wired and strange. I have no better idea so far, but I hope for further discussion. This week, I will go through the caffe code from author and see the difference in implementation in pytorch and caffe, going deeper in local PiCANet and global PiCANet.

Ugness · 2018-10-08T23:39:52Z

Can you give me the link of caffe implementation? I didn't know that. Thanks.

Sucran · 2018-10-09T00:23:17Z

@Ugness https://github.com/nian-liu/PiCANet, with deeplab caffe version.

Ugness · 2018-10-09T00:46:50Z

Thanks a lot.

Sucran · 2018-10-09T03:07:22Z

@Ugness I change the PiCANet config from ‘GGLLL’ to ‘GGGGG’ and ‘LLLLL’, both of them have memory leak problem when running network.py, have you met this before? I also found an interesting implementation of authors caffe code, they seem implemented an attpooling function on their own proto cpp which support their global or local attention function like conv3d. Can you give me a hint on how you thinking about the conv3d processing?

Ugness · 2018-10-09T10:55:28Z

I think that would not work with 'GGGGG' or 'LLLLL'. I just tested with 'GGLLL' and other options may cause some tensor dimension error.

And I will check protocpp ASAP.
For my conv3d processing, it is not easy to describe with text only. :(
I will describe with text first, but if you need more information to help your understanding, I'll make some images ASAP.

Ugness · 2018-10-09T11:29:18Z

How Conv3d works?

Assumption

Lets say Image_batch's shape is (N x C x H x W).
and attention map( of each pixel position)'s shape is (h x w) (for global, h=H, w=W).
We have (H * W) number of attention maps.
Each attention map should be applied to each pixel's patch, every channel.
I think to make this process with for loop takes a lot of time
(I don't know how to use CUDA level for loop, and I heard that default for loop works on cpu),
so I tried to use pytorch's pre-implemented Convolution functions.

What's difference between convolution and 'PiCA' process?

Convolution applies same kernel to each patch(pixel location), but different kernels to each channel, sample(batch_size).
PiCA process applies same att_map to each channel, but different att_map to each patch(pixel location), sample(batch_size).

PiCA process with Conv3d (Main Idea of method)

On image side, my idea is send the dimension of batch and location to dim:1 (channel).

(1, NxHxW, C, 13, 13) -> for F.conv3d, each dimension means (batch, channel, depth, H, W)

On kernel side, my idea is each (1,1,7,7) kernel goes to (1,1,13,13) by using F.conv3d dilation option.
Then, F.conv3d will apply NxHxW number of kernels to NxHxW number of patchs.
It is possible by using groups option
Also, F.conv3d will across the depth dimension(C, dim:2) with same att_map.
Finally, the output is (1, NxHxW, C, 1, 1) attention applied feature map, so I can reshape it to (N, C, H, W)

I used same idea to local PiCANet.

Ugness · 2018-10-09T11:46:47Z

For conventional Pytorch's Conv3d
My use of Conv3d

Ugness · 2018-10-09T12:07:00Z

X_X
I thought Caffe is simillar to pytorch, but it wasn't.
I tried to read the code, but I can't. The only thing I can see is they used for loop.
If they used for loop for implementing PiCANet, for loop in python consumes a lot of time. without CUDA logic. And I don't know how to use CUDA for loop in python. T.T

Sucran · 2018-10-10T12:22:35Z

@Ugness I do not think they use loop for implementing PiCANet. They use im2col and col2im, which is torch.nn.Unfold and torch.nn.Fold in pytorch. I suppose Conv3d can be translated into a combination of several im2col + matric multiplication + col2im, but I still confused how to implement this, still working on it.
The memory leak problem we suffered seems caused by F.Conv3d, hoping next version would fix it.

Ugness · 2018-10-10T13:05:49Z

Thanks. I also try to convert conv3d operation to combination of matrix multiplication.

Ugness · 2018-10-14T07:58:04Z

@Sucran I think I can improve my model soon. There was no such function like torch.nn.Fold on pytorch 0.4.0 when I started this project. Now, I found the function that I need. Thanks.

Sucran · 2018-10-14T11:22:11Z

Oh, really？Amazing！ @Ugness You are such a genius.
Looking forward to your new version. Thanks for your work, again.

Ugness · 2018-10-14T12:38:11Z

Hi @Sucran I made a new logic!
You can check it on https://github.com/Ugness/PiCANet-Implementation/tree/Fold_Unfold
Now you can train PiCANet model with batch1, by using 3.5GB of VRAM.
I just started my training code, so I'll report the training result about next week!

Looks like it works!

Sucran · 2018-10-15T00:54:24Z

@Ugness Soooooo happy for it works! I check the branch of Fold_Unfold, the memory leak problem seems gone. The VRAM is also lower for increasing the batch size, but cannot be 10. I will check the channel setting of each layer by comparing the caffe version of the author, maybe there is something misunderstanding still exits.

Ugness · 2018-10-15T01:30:10Z

@Sucran Thanks a lot for your interest. It gave a lot of improvement. It seems like training speed is also improved.
About the version of code, Fold_Unfold version is branch of origin (33 conv) not the Adjusted(11 conv) one.
I am training this code with 3*3 conv, batch_size 4.
I am going to close this Issue after report the result. If you find some errors or need help, please open another issue. :)

Sucran · 2018-10-15T01:41:41Z

@Ugness Ok. Thanks for your work again. It is my pleasure.

Sucran · 2018-10-19T03:06:39Z

@Ugness Anything new?

Ugness · 2018-10-19T06:13:46Z

One of my model got about 88 on F-measure score with 200 samples of DUTS-TE which scored 87 with model in paper, So I am measuring score with all of DUTS-TE, on all of checkpoints. So it takes a little bit long time.

I ensure that new model(with bigger batch_size) performs much better.
I think I can update repo on Sunday or next Monday.

Ugness · 2018-10-21T06:02:41Z

I updated and merged branch.

Sucran · 2018-10-22T00:44:52Z

@Ugness So the result is the branch of origin (33 conv) not the Adjusted(11 conv) one? it seems to increase the performance of the author's version? The curve you plot is corresponding to training or validation?

Ugness · 2018-10-22T04:13:58Z

No, it's adjusted one. I used (1*1 conv).
Yes it seems making better performance. The curve is validation.

I think I need to check all of the code hardly. May be there is something wrong.

Sucran · 2018-10-27T14:04:00Z

@Ugness Hi, I try to reproduce your result, but I am confusing how to compute the metric result you reported. I had a trained weight model, but which code file contains the test part code?

Ugness · 2018-10-28T06:50:53Z

You can check the measuring code in pytorch/measure_test.py. It will report the result on tensorboard, and you can download csv from tensorboard.

Sucran · 2018-10-30T01:49:14Z

Hi @Ugness, do you check your test code for computing Max F_b and MAE, I think there are problems here.

The way of computing MAE which is different with MSE_Loss. It is torch.abs(y_pred - y).mean().
I do not familiar with scikit-learn, maybe the pr_curve computing is more efficient than handcraft one. but I got a different result, I ref the code of AceCoooool/DSS-pytorch, I think the problem can be here.
I using the trained model 36epo_383000step.ckpt and got a Max F_b as 0.877 for your code, but got 0.750 for AceCoooool's code.

Ugness · 2018-10-30T13:47:00Z

Ops. I found that MSE and MAE is not same. It's my mistake. I'll fix it.
I'll check how scipy measures F-beta score.
I used threshold to measure F-beta score, may be that was wrong.

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html#sklearn.metrics.precision_recall_curve

For example, if threshold=0.7 and predicted value=0.8, I made 0.8 to 1. As like as making PR-curve.
And I plot Max-F score on all threshold space.
If I did not use threshold, maybe it scored 0.75 as like you get.
Thank you for your comment. I also think 0.877 is strange. Because my attention map was different from author's one.

Sucran · 2018-10-31T09:17:25Z

@Ugness I do not think the scikit-learn API provided a correct way to compute max F-beta, but you can ref the paper "Salient Object Detection: A Survey" for Chapter 3.2. Usually, we have a fixed threshold which changes from 0 to 255, for binarizing the saliency map to compute precision and recall. F-beta is computed from the average precision and average recall of all images. Then we pick the maximum as max F-beta.

Ugness · 2018-11-20T01:29:52Z

And about your memory problem, how much VRAM and RAM do you have?
Where does the RAM problem occur? RAM or VRAM?

Sucran · 2018-11-20T16:38:53Z

@Ugness I think the F-score procedure code that you showed is correct. It is almost the same as I reported in 7 days ago, right? I just set the number of threshold as 100 and you set it as 256, which no cause too many differences, but the result still be 0.854 when you tested?
The most strange thing hit on my mind is the difference of MAE results. I always got a value of 0.65 but you got 0.54, we test it with the same code. Oh man, it is wired!

Ugness · 2018-11-21T04:41:05Z

I also think it is strange. And I have a few questions to compare our results.

Did you use all of DUTS-TE for testing?
How many images in your DUTS-TE folder? There was a few mismatching files which should be deleted in DUTS-TE.
Can you give me the threshold value which you used for the score 0.7803?
Did you explicitly round (convert to binary image) the mask images in DUTS-TE?
Did you use GPU for testing? (Did you use cuda mode with a single GPU?)
Thanks for sharing your results. It improves this project a lot.

Sucran · 2018-11-21T08:43:40Z

all DUTS-TE
I found this problem but these files do not cause a big difference.
Nope, I have no print this threshold. I will test it again.
Convert the mask images into range [0,1], this is done automatically.
Yes, single GPU for testing, cuda mode.

Ugness · 2018-11-21T09:22:32Z

Thank you for answering.
for 3., If you want, can you test with my threshold option? I already have it. It is 0.6627.

Sucran · 2018-11-23T06:52:38Z

@Ugness Sorry, the threshold I test is 0.8. I have no test your option yet, I need to wait for any available GPU in my lab.
I also have a problem, can you test the MAE result on all DUTS-TE without modifying the dataset? The difference of MAE result confused me.

Ugness · 2018-11-24T17:50:37Z

What do you mean by without modifying the dataset?
I am going to upload all the result (MAE, F-measure, threshold) on google drive.
I also upload the list of image file names in my DUTS-TE Dataset with it.

Sucran · 2018-11-28T02:33:53Z

@Ugness. I mean it should be 5019 images in DUTS-TE without deleting mismatching files, you should test on all 5019 images.

Ugness · 2018-11-28T09:14:01Z

While DUTS-TE-Mask has 2 more images than DUTS-TE-Image?
My DUTS-TE-Image folder has 5019 images. I deleted 2 images from DUTE-TE-Mask because there was 5021 images.

RaoHaobo · 2019-04-02T12:28:43Z

Hi, @Ugness , I intergrate your flie of measure.py and train.py ，but I don't change the file of network.py . now , I set the value of batch_size is 2, at the first falling of learning rate,my train loss can falling .but after that,althought my learning rate falling ,my train loss never falling. And , I test my model on PASCAL-S ,the best value of MAE is 0.1243.could you help me and sovle this problem?

Ugness · 2019-04-03T06:40:41Z

@RaoHaobo Can you give me some captures of your loss graph?? You may found it on Tensorboard.
And I think it is better to make new issue. Thanks.

RaoHaobo · 2019-04-03T08:32:38Z

I change the learning rate decrease by per 15000, but the case of train loss never falling is the same of your 7000. I results of train loss and learning rate as follows. Thanks!

Ugness · 2019-04-03T11:22:45Z

I think that graph looks fine.

But if you think that loss should be more less, I recommend you to increase lr decay rate and lr decay step. The hyper parameters on my code, I just followed the implementation on PiCANet paper with DUTS Dataset.
And about MAE, it may related to batch size. When I changed batchsize 1 to 10(may be 4 I do not remember correctly), the performance was incrementally increased.

I’ll let you know the specific value of score when I found the past results.
And I’ll also upload the loss graph of my experiment as well. Thanks.

RaoHaobo · 2019-04-04T07:31:20Z

I make the Ir decay rate from 0.1 to 0.5，the Ir decay steps is 7000.And my loss as follows

Why the train loss never falling after one opeoch？Do you meet the problem?

ghost · 2019-04-09T01:16:38Z

nice work and nice code!
When I run 'python train.py --dataset ./DUTS-TR', there occurs a error:(it seems something wrong with tensorboardX, but i have no idea what to do):

thanks for your reply~

RaoHaobo · 2019-04-09T01:22:50Z

@dylanqyuan version of your tensorboardX is too high.

ghost · 2019-04-09T01:31:15Z

@dylanqyuan version of your tensorboardX is too high.

It works! Thank you buddy!

Ugness · 2019-04-09T05:12:39Z

@RaoHaobo #16 (comment)
I’ve uploaded my graph on that link. And I also suggest you to follow that links 3 steps to check if model is trained or not.

My graph is also fluctuating as like as yours, and looks it is not decreasing.
And for your graph, I am concerned about the learning rate. I think it became too small to train the model effectively after 1 epoch. But I did not had such experiment about that, so it’s just my personal opinion.

If you want to check your models performance, I suggest you to follow the steps on the link.
If you are worrying about non-decreasing training loss, I suggest you (and also I) to have more experiments with learning rate and the other hyperparameters.
In detail,

make lr decay rate between 0.9~1 or decay step much larger.
Please carefully observe that if the model is trained enoughly on that lr.
Thank you for your interest and I hope you to have much higher performance than my experiments! :)

p.s. please comment at #17 if you want to talk about this issue more. To make easy to find!

RaoHaobo · 2019-05-05T05:29:33Z

@Ugness I test your the '36epo_383000step.ckpt' on PASCAL-S ,and the result is
，but your result is
,
why?
another problem:I add some my ideal on your code ,and I train my model is well:

but when I test my model on your this code ：measure_test.py ,the test result is

RaoHaobo · 2019-05-05T08:41:08Z

@Ugness the second problem have been solved ,the first isn't sovled

Ugness · 2019-05-07T09:11:50Z

Sorry. I forgot to mention that all of my experiment results are on DUTS dataset only. I updated my readme file.
If you got my result from the README.md, I trained and tested the model on ONLY DUTS Dataset. So the result on PASCAL-S dataset may differ.

RaoHaobo · 2019-05-10T03:53:44Z

@Ugness Ok，

this is your trained model ,and I use it to test on PASCAL-S and SOD ,the max_F is 0.8379.Could you test your model on other dataset?

RaoHaobo · 2019-05-13T12:33:34Z

@Ugness this code on your measure_test.py .

but the github.com/AceCoooool/DSS-pytorch solver.py is
,I think they have big different.

Ugness · 2019-05-13T16:05:51Z

I've made that .sum(dim=-1) because my code evaluates several images on parallel.
github.com/AceCoooool/DSS-pytorch solver.py calculates the prec / recall on a single image at once, and my code calculates all images at once.
The whole dimension of y_temp and mask is (batch, threshold_values, H, W).
If I execute .sum() like as below code, it would sum all values in y_temp. Although we should sum only on H and W axis.
And for the 1e-10, I made them for avoiding division by zero problem.
If you think my explanation is wrong, please give me your advice. Thanks.

RaoHaobo · 2019-05-14T00:53:10Z

@Ugness i mean that tp + 1e-10,the 1e-10 maybe take out ,I try to take out it ,but the max_F falling much.
I also use the Dss code to test your model on DUTS-TE,the result is bad.

Ugness · 2019-05-16T07:58:43Z

How much difference that follows from the error?
Is the difference significant?
Let me know it. Thanks.

RaoHaobo · 2019-05-16T08:03:59Z

when threhold equal to 1,the prec must be 0,but your result equal to 1

RaoHaobo · 2019-07-31T03:35:17Z

@Ugness
The writer.add_pr_curve() function in measure.py can't work , It never show in the tensorboard . I think it caused by the version tensorboard.

Ugness · 2019-08-01T09:59:50Z

https://github.com/tensorflow/tensorboard/releases
Would you try it with the tensorboard 1.8.0???

Ugness closed this as completed Oct 10, 2018

Ugness reopened this Oct 10, 2018

Ugness closed this as completed Jul 22, 2020

Have you met memory leak problem when running model？ #9

Have you met memory leak problem when running model？ #9

Comments

Sucran commented Oct 8, 2018 • edited Loading

Ugness commented Oct 8, 2018 • edited Loading

Sucran commented Oct 8, 2018

Ugness commented Oct 8, 2018

Sucran commented Oct 9, 2018

Ugness commented Oct 9, 2018

Sucran commented Oct 9, 2018

Ugness commented Oct 9, 2018 • edited Loading

Ugness commented Oct 9, 2018 • edited Loading

How Conv3d works?

Assumption

What's difference between convolution and 'PiCA' process?

PiCA process with Conv3d (Main Idea of method)

Ugness commented Oct 9, 2018

Ugness commented Oct 9, 2018

Sucran commented Oct 10, 2018

Ugness commented Oct 10, 2018

Ugness commented Oct 14, 2018

Sucran commented Oct 14, 2018

Ugness commented Oct 14, 2018 • edited Loading

Sucran commented Oct 15, 2018

Ugness commented Oct 15, 2018

Sucran commented Oct 15, 2018

Sucran commented Oct 19, 2018

Ugness commented Oct 19, 2018 • edited Loading

Ugness commented Oct 21, 2018

Sucran commented Oct 22, 2018

Ugness commented Oct 22, 2018 • edited Loading

Sucran commented Oct 27, 2018

Ugness commented Oct 28, 2018

Sucran commented Oct 30, 2018 • edited Loading

Ugness commented Oct 30, 2018 • edited Loading

Sucran commented Oct 31, 2018

Ugness commented Nov 20, 2018

Sucran commented Nov 20, 2018

Ugness commented Nov 21, 2018 • edited Loading

Sucran commented Nov 21, 2018

Ugness commented Nov 21, 2018

Sucran commented Nov 23, 2018

Ugness commented Nov 24, 2018

Sucran commented Nov 28, 2018

Ugness commented Nov 28, 2018 • edited Loading

RaoHaobo commented Apr 2, 2019

Ugness commented Apr 3, 2019

RaoHaobo commented Apr 3, 2019

Ugness commented Apr 3, 2019

RaoHaobo commented Apr 4, 2019

ghost commented Apr 9, 2019

RaoHaobo commented Apr 9, 2019

ghost commented Apr 9, 2019

Ugness commented Apr 9, 2019 • edited Loading

RaoHaobo commented May 5, 2019

RaoHaobo commented May 5, 2019

Ugness commented May 7, 2019 • edited Loading

RaoHaobo commented May 10, 2019

RaoHaobo commented May 13, 2019

Ugness commented May 13, 2019 • edited Loading

RaoHaobo commented May 14, 2019

Ugness commented May 16, 2019

RaoHaobo commented May 16, 2019

RaoHaobo commented Jul 31, 2019

Ugness commented Aug 1, 2019

Sucran commented Oct 8, 2018 •

edited

Loading

Ugness commented Oct 8, 2018 •

edited

Loading

Ugness commented Oct 9, 2018 •

edited

Loading

Ugness commented Oct 9, 2018 •

edited

Loading

Ugness commented Oct 14, 2018 •

edited

Loading

Ugness commented Oct 19, 2018 •

edited

Loading

Ugness commented Oct 22, 2018 •

edited

Loading

Sucran commented Oct 30, 2018 •

edited

Loading

Ugness commented Oct 30, 2018 •

edited

Loading

Ugness commented Nov 21, 2018 •

edited

Loading

Ugness commented Nov 28, 2018 •

edited

Loading

Ugness commented Apr 9, 2019 •

edited

Loading

Ugness commented May 7, 2019 •

edited

Loading

Ugness commented May 13, 2019 •

edited

Loading