Multi-GPU training error #42

KaiGod0730 · 2024-03-18T09:52:20Z

Thank you for your work!
Training on a single GPU works fine, but training on multiple GPUs fails with the following error:
Traceback (most recent call last):
  File "train_svd.py", line 1264, in <module>
    main()
  File "train_svd.py", line 1045, in main
    added_time_ids = _get_add_time_ids(
  File "train_svd.py", line 949, in _get_add_time_ids
    passed_add_embed_dim = unet.config.addition_time_embed_dim *
  File "/.pt2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'config'

The command I used:
accelerate launch train_svd.py \
  --pretrained_model_name_or_path=stable-video-diffusion-img2vid-xt-1-1 \
  --per_gpu_batch_size=1 --gradient_accumulation_steps=1 \
  --max_train_steps=100 \
  --width=512 \
  --height=320 \
  --checkpointing_steps=50 --checkpoints_total_limit=1 \
  --learning_rate=1e-5 --lr_warmup_steps=0 \
  --seed=123 \
  --mixed_precision="fp16" \
  --validation_steps=20 \
  --num_workers=0

howardgriffin · 2024-05-28T09:22:56Z

Same error here. How can this be solved?

LTT-O · 2024-05-30T12:33:41Z

Add .module to unet.config.addition_time_embed_dim, i.e. use unet.module.config.addition_time_embed_dim.
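
Some background on why this works: in multi-GPU runs the model is wrapped in DistributedDataParallel, which keeps the original model in its .module attribute and does not forward custom attributes such as config. The sketch below is only an illustration of that pattern. It uses small stand-in classes so it runs without PyTorch; FakeConfig, FakeUNet, FakeDDP, and the unwrap helper are hypothetical names, and 256 is a placeholder value, not the real setting.

```python
class FakeConfig:
    """Stand-in for a model config (hypothetical; 256 is a placeholder)."""
    addition_time_embed_dim = 256


class FakeUNet:
    """Stand-in for the real UNet; it only carries a config attribute."""
    config = FakeConfig()


class FakeDDP:
    """Mimics how DistributedDataParallel stores the wrapped model in .module."""

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Return the wrapped model if there is a .module attribute, else the model itself."""
    return model.module if hasattr(model, "module") else model


single_gpu_unet = FakeUNet()           # single-GPU case: no wrapper
multi_gpu_unet = FakeDDP(FakeUNet())   # multi-GPU case: wrapped model

# The same access works in both cases once the model is unwrapped.
embed_dim_single = unwrap(single_gpu_unet).config.addition_time_embed_dim
embed_dim_multi = unwrap(multi_gpu_unet).config.addition_time_embed_dim
```

Unwrapping through a helper like this keeps the script working on both single and multiple GPUs, whereas hard-coding unet.module would fail when the model is not wrapped. If train_svd.py holds an Accelerator instance (an assumption; the variable name is not confirmed in this thread), the unwrap_model method of the accelerate library serves the same purpose.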
