RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [1, 512, 8, 8]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. #175

Open
thompsondd opened this issue Aug 9, 2023 · 3 comments

Comments

@thompsondd

I ran NAS-Bench-101 with the zero-cost predictors in NASLib and got the following error:
image

Has anyone run into this problem?

@thompsondd changed the title on Aug 9, 2023, dropping the trailing hint from the error message: "Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)."
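
For anyone following that hint, here is a minimal sketch of how anomaly detection can be used. `faulty_loss` and `run_with_anomaly_detection` are hypothetical names, not NASLib code; the toy function only reproduces the same class of error by editing a ReLU output in place.

```python
import torch


def faulty_loss() -> torch.Tensor:
    """Hypothetical stand-in for the failing forward pass (not NASLib code)."""
    x = torch.randn(1, 4, dtype=torch.double, requires_grad=True)
    y = torch.relu(x)   # autograd keeps y to compute the ReLU gradient later
    y.mul_(2.0)         # in-place edit of y, which invalidates the saved tensor
    return y.sum()


def run_with_anomaly_detection(loss_fn) -> str:
    """Run forward and backward under anomaly detection.

    Returns the error text, or an empty string if backward succeeds.
    """
    try:
        # The forward pass must also run inside the context so that autograd
        # can record where each operation was created.
        with torch.autograd.detect_anomaly():
            loss_fn().backward()
    except RuntimeError as err:
        return str(err)
    return ""


message = run_with_anomaly_detection(faulty_loss)
print(message)
```

With anomaly detection enabled, PyTorch additionally prints the traceback of the forward call that created the failing node, which should point at the line to inspect. It slows execution down, so it is best switched on only while debugging.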
@Neonkraft
Collaborator

Hi @thompsondd,

Could you please tell us which proxy you were using? Looks to me like removing an inplace relu operation somewhere in the Nasbench101 graph will fix the issue.
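
To illustrate what "is at version 1; expected version 0" means, here is a small sketch in plain PyTorch. It assumes the failure comes from a ReLU output being overwritten by a later in-place ReLU, which may or may not be what happens in the Nasbench101 graph. `relu_versions` is a hypothetical helper, and `_version` is an internal tensor attribute used here only for inspection.

```python
import torch
import torch.nn.functional as F


def relu_versions(inplace: bool):
    """Apply two ReLUs in a row and report y's version counter and whether backward works."""
    x = torch.randn(1, 4, dtype=torch.double, requires_grad=True)
    y = torch.relu(x)                # first ReLU: its output y is saved for backward
    before = y._version              # version counter right after creation
    z = F.relu(y, inplace=inplace)   # second ReLU, optionally overwriting y
    after = y._version               # bumped if y was modified in place
    try:
        z.sum().backward()
        backward_ok = True
    except RuntimeError:
        backward_ok = False
    return before, after, backward_ok


print("inplace=True :", relu_versions(inplace=True))
print("inplace=False:", relu_versions(inplace=False))
```

In the in-place case the first ReLU saved `y` at one version and finds it at a later one during backward, so autograd refuses to continue rather than compute a wrong gradient.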

Thanks!

@thompsondd
Author

Thank you for your reply, @Neonkraft.

I am using the Synflow proxy on NAS-Bench-101, and the architecture "(0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 4, 3, 3, 1)" raises this error (it is just one of several architectures that fail for me).

Following the source code, I removed the in-place flag from the ReLU operation in https://github.com/automl/NASLib/blob/8c45f19dc259956c3bd253071135c798ad3df8ce/naslib/search_spaces/nasbench101/base_ops.py#L18C3-L19C10, but nothing changed.
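
One possible reason a single edit has no effect is that another in-place activation remains elsewhere in the network; that is an assumption, not something confirmed in this thread. A rough way to rule it out is to clear the flag on every submodule at once. `disable_inplace` below is a hypothetical helper, and the toy `nn.Sequential` only stands in for a real cell. Whether `modules()` reaches every op of a parsed NASLib graph has not been verified here.

```python
import torch
import torch.nn as nn


def disable_inplace(model: nn.Module) -> int:
    """Set inplace=False on every submodule that has the flag enabled.

    Returns the number of submodules that were changed.
    """
    changed = 0
    for module in model.modules():
        if getattr(module, "inplace", False):
            module.inplace = False
            changed += 1
    return changed


# Toy stand-in: a plain ReLU followed by an in-place ReLU, which fails in
# backward because the second ReLU overwrites the first one's saved output.
toy = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.ReLU(inplace=True)).double()

num_changed = disable_inplace(toy)
print("modules changed:", num_changed)

out = toy(torch.randn(1, 4, dtype=torch.double))
out.sum().backward()  # succeeds once no module writes in place
```

This only covers modules that expose an `inplace` attribute. Functional calls such as `F.relu(x, inplace=True)` or statements like `x += y` inside a `forward` method would still have to be changed by hand.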

Could you please tell me what you would change in the code?

@abhash-er
Collaborator

Hi @thompsondd,

I have tried to reproduce your error and had no problem evaluating the zero-cost score for the architecture. Here's a snippet of code that I tried. You can correct me if it doesn't exactly match your case.

import logging
from naslib.predictors import ZeroCost
from naslib import utils
from naslib.utils import setup_logger, get_dataset_api
from naslib.search_spaces.nasbench101.conversions import convert_tuple_to_spec
from naslib.search_spaces import NasBench101SearchSpace


# Load the zero-cost config and set up logging
config = utils.get_config_from_args(config_type="zc")
logger = setup_logger(config.save + "/log.log")
logger.setLevel(logging.INFO)

utils.set_seed(config.seed)
utils.log_args(config)

# Build the search space and set the architecture reported in the issue
dataset_api = get_dataset_api("nasbench101", config.dataset)
graph = NasBench101SearchSpace(n_classes=10)
test_arch = (0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
             0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
             0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
             0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 4, 3, 3, 1)
spec = convert_tuple_to_spec(test_arch)
graph.set_spec(spec)

# Score the architecture with the Synflow zero-cost proxy
predictor = ZeroCost(method_type="synflow")
train_loader, _, test_loader, _, _ = utils.get_train_val_loaders(config)
graph.parse()
score = predictor.query(graph, train_loader)
print("Zero cost score:", score)
logger.info('Test experiment complete.')

I got a Synflow score of 125.99:

image

Is it possible that I missed something, or that this is a version-related problem? Could you try running the same snippet and tell us what you get?
