A possible solution to OOM in Metattack. #128
Hi! I've encountered the same CUDA out-of-memory problem when using the following environment on Ubuntu 20.04.5 LTS:
I have already made the changes you suggested. Could you please help me solve this problem?
Hi! I'm sorry, but I don't know why the changes don't work in your environment.
I'm running the code on a GPU with 98GB memory. If no changes are made, I encounter CUDA out of memory just like you do, after generating two or three graphs against Metattack. (I guess there are gradients or something similar that are not freed from the GPU.) After making the changes I mentioned, the memory cost on the Cora dataset is about 3000-4000 MB, which is now acceptable to me.
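If it helps, here is a minimal sketch for checking the peak GPU memory around the attack, assuming a CUDA device; the attack call below is only a hypothetical placeholder for wherever you run Metattack:

import torch

if torch.cuda.is_available():
    # Reset the peak-memory counter before the attack, run it, then read the peak.
    torch.cuda.reset_peak_memory_stats()
    # ... run Metattack here, e.g. model.attack(...)  (hypothetical placeholder)
    peak_mb = torch.cuda.max_memory_allocated() / 1024 ** 2
    print(f"peak GPU memory: {peak_mb:.0f} MB")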
Hi, thanks for your reply!
to
When the process reaches around 45%, I encounter the OOM error; the error message looks like the one below:
Do you have any suggestions for me? Thanks!
Hi!
Hi, thanks for your suggestions! I tried to create an environment with the details below:
Under these settings, I can successfully run the datasets 'cora', 'cora_ml', 'citeseer', and 'polblogs', but the 'pubmed' dataset still hits the OOM problem. Have you ever encountered this problem? Thanks!
Hi!
I found an efficient implementation in GreatX; hope it helps you.
Hi @Leirunlin @nowyouseemejoe, thanks for your advice, I will try that!
Thank you all for the great discussion and suggestions! We are a bit short-handed right now, so you may want to open a pull request directly if you find any bugs. For the OOM issue, you may want to try
Hi,
Hi Enyan, thank you for your suggestion! I tried to modify the code in your way, but it still doesn't work on my device. May I ask how large your GPU memory is? Thanks!
Hi!
As mentioned in issues #90 and #127, OOM occurs when running Metattack with a higher version of PyTorch.
I checked the source code in mettack.py and found that the function get_adj_score() seems to be the cause.
DeepRobust/deeprobust/graph/global_attack/mettack.py, lines 125 to 139 (commit 1c0ef07)
I tried substituting lines 128 and 130 with explicit subtraction, and it works fine for me to avoid OOM, that is, using
adj_meta_grad = adj_meta_grad - adj_meta_grad.min()
and
adj_meta_grad = adj_meta_grad - torch.diag(torch.diag(adj_meta_grad, 0))
In fact, I found it is enough to replace only line 128.
I think something goes wrong when "-=" and ".min()" are used together.
It would be really helpful if anyone could offer an explanation for it.
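For anyone who wants to reproduce the difference quickly, here is a minimal, self-contained sketch of the pattern, assuming the original lines used the in-place form "-=" (as mentioned above) and using a random dense tensor as a stand-in for the real adjacency meta-gradient:

import torch

# Stand-in for the adjacency meta-gradient computed by Metattack;
# the size roughly matches Cora's 2708 x 2708 adjacency matrix.
device = "cuda" if torch.cuda.is_available() else "cpu"
adj_meta_grad = torch.randn(2708, 2708, device=device)

# In-place version (presumably lines 128 and 130), reported to trigger OOM:
# adj_meta_grad -= adj_meta_grad.min()
# adj_meta_grad -= torch.diag(torch.diag(adj_meta_grad, 0))

# Out-of-place replacement suggested in this issue:
adj_meta_grad = adj_meta_grad - adj_meta_grad.min()
adj_meta_grad = adj_meta_grad - torch.diag(torch.diag(adj_meta_grad, 0))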