-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QKVFlashAttention unexpected parameters error, running in Google Colab #3
Comments
i also meet the same error, i guess this code base using a previously version of flash_attn.
|
Related minor issue, in th_evaluator.py, inception-2015-12-05.pt tries to download automatically but fails, and it doesn't seem like you can pass the path on the command line. Also is it supposed to automatically calculate stats a reference batch? (I'm probably trying to run the sample out of order?) class FIDAndIS: |
class FIDAndIS: |
The version of v1.0.2 has no device parameter. But v0.2.8 has device parameter. |
I use pip install flash-attn==0.2.8 solved it. |
After this procedure, I start training the model with these parameters and then an error came. Anyone know what it means? I'm a rookie for pytorch. (py3_8_16) bld@bld:~/consistency_models/scripts$ mpiexec -n 1 python cm_train.py --training_mode consistency_training --target_ema_mode adaptive --start_ema 0.95 --scale_mode progressive --start_scales 2 --end_scales 150 --total_training_steps 100000 --loss_norm lpips --lr_anneal_steps 0 --teacher_model_path /home/bld/pre_train_model/edm_bedroom256_ema.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --teacher_dropout 0.1 --ema_rate 0.9999,0.99994,0.9999432189950708 --global_batch_size 1 --image_size 256 --lr 0.00005 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --schedule_sampler uniform --use_fp16 True --weight_decay 0.0 --weight_schedule uniform --data_dir /home/bld/lsun/lsun_train_output_dir
Logging to /tmp/openai-2023-04-19-10-51-33-746807
creating model and diffusion...
creating data loader...
loading the teacher model from /home/bld/pre_train_model/edm_bedroom256_ema.pt
creating the target model
training...
/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Traceback (most recent call last):
File "cm_train.py", line 171, in <module>
main()
File "cm_train.py", line 121, in main
CMTrainLoop(
File "/home/bld/consistency_models/cm/train_util.py", line 367, in run_loop
self.run_step(batch, cond)
File "/home/bld/consistency_models/cm/train_util.py", line 389, in run_step
self.forward_backward(batch, cond)
File "/home/bld/consistency_models/cm/train_util.py", line 501, in forward_backward
losses = compute_losses()
File "/home/bld/consistency_models/cm/karras_diffusion.py", line 191, in consistency_losses
distiller = denoise_fn(x_t, t)
File "/home/bld/consistency_models/cm/karras_diffusion.py", line 125, in denoise_fn
return self.denoise(model, x, t, **model_kwargs)[1]
File "/home/bld/consistency_models/cm/karras_diffusion.py", line 347, in denoise
model_output = model(c_in * x_t, rescaled_t, **model_kwargs)
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/bld/consistency_models/cm/unet.py", line 765, in forward
h = module(h, emb)
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/bld/consistency_models/cm/unet.py", line 77, in forward
x = layer(x)
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/bld/consistency_models/cm/unet.py", line 308, in forward
return checkpoint(
File "/home/bld/consistency_models/cm/nn.py", line 155, in checkpoint
return func(*inputs)
File "/home/bld/consistency_models/cm/unet.py", line 325, in _forward
h = checkpoint(self.attention, (qkv,), (), self.use_attention_checkpoint)
File "/home/bld/consistency_models/cm/nn.py", line 155, in checkpoint
return func(*inputs)
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/bld/consistency_models/cm/unet.py", line 368, in forward
qkv, _ = self.inner_attn(
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/flash_attn/flash_attention.py", line 47, in forward
output = flash_attn_unpadded_qkvpacked_func(
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/flash_attn/flash_attn_interface.py", line 266, in flash_attn_unpadded_qkvpacked_func
return FlashAttnQKVPackedFunc.apply(qkv, cu_seqlens, max_seqlen, dropout_p, softmax_scale,
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/flash_attn/flash_attn_interface.py", line 58, in forward
out, softmax_lse, S_dmask = _flash_attn_forward(
File "/home/bld/anaconda3/envs/py3_8_16/lib/python3.8/site-packages/flash_attn/flash_attn_interface.py", line 21, in _flash_attn_forward
softmax_lse, *rest = flash_attn_cuda.fwd(
RuntimeError: Expected q.stride(-1) == 1 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[48068,1],0]
Exit code: 1
|
I got the same problem as well |
Had you solved this problem? I don't even know what the error message means. |
Solution:
|
I tried that, but it didn't solve the problem, were there any other changes you made? |
Since I'm running on V100, I also had to disable flash-attention (apparently it only works on A100)
|
@boxwayne @aarontan-git @asanakoy For the qkv = self.rearrange(
qkv, "b (three h d) s -> b s three h d", three=3, h=self.num_heads
)
# print(qkv.shape, qkv.stride())
qkv, _ = self.inner_attn(
qkv.contiguous(),
key_padding_mask=key_padding_mask,
need_weights=need_weights,
causal=self.causal,
) The print result is qkv=qkv.contiguous() Let me know if that solves the issue. I tested on my side and it works. |
I tried your fix, and got the following warning message when trying to run an imagenet consistency training:
|
@aarontan-git You could just leave this warning there if you don't care about it. If you want to fix this, you should check the codes and see which part involves the gradient stride change. And do the tensor storage stride modification to avoid the warning. |
I tried to generate samples in Colab and everything works except that I had to change this line of code in /cm/unet.py, clearing out factory_kwargs.
Not sure if this is a bug or I did something wrong. This is how I ran it: https://github.com/JonathanFly/consistency_models_colab_notebook/blob/main/Consistency_Models_Make_Samples.ipynb
The text was updated successfully, but these errors were encountered: