TypeError: optimizers must be either a single optimizer or a list of optimizers. #18
Comments
Hi @guerriep,

No activity, so I'm closing the issue. Feel free to reopen if you need further assistance.
I have encountered the same issue. I added some prints in apex; the relevant section is slightly different for me (lines 148-160):

```python
print("type(optimizers)", type(optimizers))
if isinstance(optimizers, torch.optim.Optimizer) or ('LARC' in sys.modules and isinstance(optimizers, LARC)):
    print("isinstance LARC True")
    optimizers = [optimizers]
elif optimizers is None:
    optimizers = []
elif isinstance(optimizers, list):
    optimizers_was_list = True
    check_optimizers(optimizers)
else:
    print("else hit")
    check_optimizers([optimizers])
    raise TypeError("optimizers must be either a single optimizer or a list of optimizers.")
```

I have also checked version numbers:
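A plausible reading of why the `else` branch fires (a sketch inferred from the code above, not confirmed by the apex authors): swav imports LARC via `from apex.parallel.LARC import LARC`, which registers the module in `sys.modules` under its dotted name only, so apex's `'LARC' in sys.modules` guard is never true and a LARC-wrapped optimizer falls through to the TypeError:

```python
import sys
from apex.parallel.LARC import LARC  # the import style used by main_swav.py

# The import above registers 'apex.parallel.LARC' in sys.modules, not 'LARC',
# so the guard in apex/amp/_initialize.py evaluates to False for LARC optimizers.
print('LARC' in sys.modules)                 # False
print('apex.parallel.LARC' in sys.modules)   # True
```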
+1, facing the same issue, following this thread.

NVIDIA/apex#978 is probably related.

+1, facing the same issue. Any idea how to solve it?

@mathildecaron31 Is it possible to reopen this issue, as it appears to be affecting a number of people and is unresolved? Would you also be able to share version numbers for the libraries so we can re-create your environment?
I tested this code with:
Here is how I installed apex:

Hope that helps.
I have been able to get it to run with these specific versions now. I'm still a bit curious as to why it does not work with newer versions of apex. For others trying to replicate, these are my steps using Anaconda and pip:
Thanks a lot! This worked for me. The specific version of apex seems to be an important dependency for the code to run. It would be helpful if this could be added to the README.
I have not fixed my bug yet; I'm getting AttributeError: module 'torch.distributed' has no attribute 'deprecated'. I'm out of ideas. Has anyone looked into this? Please help! Thank you.
To build on @John-P's work: for building apex, make sure you have gcc > 5 and < 8. For example, the NVIDIA Docker container nvidia/cuda:10.1-base has gcc 7.5 on Ubuntu 18.04, and I was able to build apex successfully with it.
The comment above made it work for me, though note that the last line can be compiled on a cluster without sudo, with only Ubuntu 18.04 and the NVIDIA drivers 😉
Hi, can I compile apex with CUDA 11.1? I'm working on an RTX 3090, which only supports CUDA versions 11 and above.
I found Mathilde's suggestion required a slight change to ensure the dependencies played nicely together: use pip to install the checked-out version of apex.
If anyone is getting an error on the `from torch._six import container_abcs` line (line 14 of apex's `_amp_state.py`), you can replace that line with `import collections.abc as container_abcs` and it should work, as sketched below.
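This is the replacement described above, applied in place (the file path follows the stock apex layout):

```python
# apex/amp/_amp_state.py, line 14
# from torch._six import container_abcs   # fails on newer PyTorch (torch._six was removed)
import collections.abc as container_abcs  # drop-in replacement
```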
If you encounter a `from torch._six import string_classes` error at line 2 of apex's `_initialize.py`, commenting out that line and replacing it with `string_classes = str` is sufficient; that is, replace `from torch._six import string_classes` with `string_classes = str`, as in the sketch below.
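The same style of patch as above (file path per the stock apex layout):

```python
# apex/amp/_initialize.py, line 2
# from torch._six import string_classes  # removed in newer PyTorch
string_classes = str                      # drop-in replacement
```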
Hello,
I'm trying to run main_swav.py with the following command:
```
python -m torch.distributed.launch --nproc_per_node=1 main_swav.py --images_path=<path to data directory> --train_annotations_path <path to data file> --epochs 400 --base_lr 0.6 --final_lr 0.0006 --warmup_epochs 0 --batch_size 32 --size_crops 224 96 --nmb_crops 2 6 --min_scale_crops 0.14 0.05 --max_scale_crops 1. 0.14 --use_fp16 true --freeze_prototypes_niters 5005 --queue_length 3840 --epoch_queue_starts 15
```
Some of those parameters have been added to accommodate our data. The only changes I have made to the code are minor changes to the dataset and additional/changed arguments. When I run this command I get the following error:
```
Traceback (most recent call last):
  File "main_swav.py", line 380, in <module>
    main()
  File "main_swav.py", line 189, in main
    model, optimizer = apex.amp.initialize(model, optimizer, opt_level="O1")
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 158, in _initialize
    raise TypeError("optimizers must be either a single optimizer or a list of optimizers.")
TypeError: optimizers must be either a single optimizer or a list of optimizers.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'main_swav.py', '--local_rank=0', '--images_path=/data/computer_vision_projects/rare_planes/classification_data/images/', '--train_annotations_path', '/data/computer_vision_projects/rare_planes/classification_data/annotations/instances_train_role_mislabel_category_id_033_chipped.json', '--epochs', '400', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '224', '96', '--nmb_crops', '2', '6', '--min_scale_crops', '0.14', '0.05', '--max_scale_crops', '1.', '0.14', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '3840', '--epoch_queue_starts', '15']' returned non-zero exit status 1.
make: *** [Makefile:69: train-rare-planes] Error 1
```
Immediately before the line that throws the error I placed a couple of print statements:

```python
print("type(OPTIMIZER)", type(optimizer))
print("OPTIMIZER", optimizer)
```

The output from those is:

```
type(OPTIMIZER) <class 'apex.parallel.LARC.LARC'>
OPTIMIZER SGD (
Parameter Group 0
    dampening: 0
    lr: 0.6
    momentum: 0.9
    nesterov: False
    weight_decay: 1e-06
)
```
Here are some version numbers I'm using:

```
Python 3.6.9 :: Anaconda, Inc.
PyTorch == 1.5.0a0+8f84ded
torchvision == 0.6.0a0
CUDA == 10.2
apex == 0.1
```
Any ideas why I would be seeing this error? Thanks in advance!
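The output above shows that the optimizer reaching `amp.initialize` is a `LARC` wrapper rather than a plain `torch.optim.Optimizer`, which is exactly the case the failing type check mishandles. One workaround (a sketch only, not an official fix; the placeholder `model` and the LARC arguments are assumptions, not taken from this thread) is to let amp see the inner SGD optimizer first and wrap it in LARC afterwards:

```python
import torch
import apex
from apex.parallel.LARC import LARC

# Placeholder model; in main_swav.py this would be the ResNet backbone.
model = torch.nn.Linear(10, 10).cuda()

# Sketch: initialize amp with the plain SGD optimizer, then wrap in LARC,
# so apex's isinstance(optimizers, torch.optim.Optimizer) check passes.
# The SGD hyperparameters mirror the settings printed above.
base_optimizer = torch.optim.SGD(
    model.parameters(), lr=0.6, momentum=0.9, weight_decay=1e-06
)
model, base_optimizer = apex.amp.initialize(model, base_optimizer, opt_level="O1")

# LARC arguments here are illustrative.
optimizer = LARC(optimizer=base_optimizer, trust_coefficient=0.001, clip=False)
```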