Steps: 100%|██████████████████████████████████████████████████████████| 315/315 [5:58:58<00:00, 68.13s/it, lr=2e-5, step_loss=0.000223
The config attributes {'slice_compression_vae': True, 'use_tiling': True, 'mid_block_attention_type': '3d', 'mini_batch_encoder': 8, 'mini_batch_decoder': 2} were passed to AutoencoderKL, but are not expected and will be ignored. Please verify your config.json configuration file.
{'latents_std', 'latents_mean'} was not found in config. Values will be initialized to default values.
Loading pipeline components...: 0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/EasyAnimate/scripts/train_960.py", line 2437, in <module>
main()
File "/home/EasyAnimate/scripts/train_960.py", line 2426, in main
pipeline = EasyAnimatePipeline.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py", line 881, in from_pretrained
loaded_sub_model = load_sub_model(
File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_loading_utils.py", line 703, in load_sub_model
loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 632, in from_pretrained
model = cls.from_config(config, **unused_kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/configuration_utils.py", line 260, in from_config
model = cls(**init_dict)
File "/usr/local/lib/python3.10/dist-packages/diffusers/configuration_utils.py", line 658, in inner_init
init(self, *args, **init_kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 91, in __init__
self.encoder = Encoder(
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoders/vae.py", line 103, in __init__
down_block = get_down_block(
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/unets/unet_2d_blocks.py", line 249, in get_down_block
raise ValueError(f"{down_block_type} does not exist.")
ValueError: SpatialDownBlock3D does not exist.
Steps: 100%|██████████████████████████████████████████████████████████| 315/315 [5:59:01<00:00, 68.38s/it, lr=2e-5, step_loss=0.000223]
[2024-08-01 20:26:40,156] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 4376) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1066, in launch_command
multi_gpu_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 711, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/train_960.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-08-01_20:26:40
host : dpm4
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 4376)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
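The first traceback shows diffusers' own `AutoencoderKL` building its encoder from the saved VAE config and failing in `get_down_block`, because `SpatialDownBlock3D` is not a block type it knows. A quick way to confirm this kind of mismatch before loading is to inspect the `down_block_types` entry of the VAE `config.json`. The following is a minimal sketch: the helper names are hypothetical, and `ASSUMED_STOCK_DOWN_BLOCKS` is an illustrative subset, not the full list that diffusers accepts.

```python
import json
from pathlib import Path

# Assumption: an illustrative subset of block names that stock diffusers
# resolves for a 2D VAE encoder. It is not an exhaustive list.
ASSUMED_STOCK_DOWN_BLOCKS = frozenset({"DownEncoderBlock2D", "AttnDownEncoderBlock2D"})


def unsupported_down_blocks(config, known=ASSUMED_STOCK_DOWN_BLOCKS):
    """Return the entries of config['down_block_types'] that are not in `known`."""
    return [name for name in config.get("down_block_types", []) if name not in known]


def check_vae_config(path, known=ASSUMED_STOCK_DOWN_BLOCKS):
    """Load a VAE config.json from `path` (hypothetical helper) and list unknown blocks."""
    config = json.loads(Path(path).read_text())
    return unsupported_down_blocks(config, known)


if __name__ == "__main__":
    # Hypothetical config fragment that mirrors the error in the log above.
    sample = {"down_block_types": ["SpatialDownBlock3D", "DownEncoderBlock2D"]}
    print(unsupported_down_blocks(sample))
```

A non-empty result suggests the config belongs to a custom VAE class and should be loaded through that class rather than through the stock `AutoencoderKL`.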
This failure happens in the final save step, after training has finished. The pipeline used there needs to be replaced; this is a bug in the code and will be fixed in the next version.
We generally rely on the checkpoints written at the checkpoint steps to save the model.
Originally posted by @radna0 in #85 (comment)
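Since the crash is in the final pipeline save and the model is normally saved at checkpoint steps, the trained weights may still be recoverable from the most recent checkpoint. The sketch below picks the newest checkpoint directory in an output folder. It assumes the `checkpoint-<step>` directory naming used by the diffusers example training scripts, which this thread does not confirm for `train_960.py`, and `latest_checkpoint` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumption: checkpoints are saved as directories named "checkpoint-<global_step>".
CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint directory with the highest step number, or None."""
    candidates = []
    for child in Path(output_dir).iterdir():
        match = CHECKPOINT_PATTERN.match(child.name)
        if match and child.is_dir():
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare by numeric step so that checkpoint-300 ranks above checkpoint-50.
    return max(candidates, key=lambda item: item[0])[1]
```

Sorting by the parsed step number rather than by name avoids the lexicographic trap where `checkpoint-50` would sort after `checkpoint-300`.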