
RuntimeError: input must be a CUDA tensor #566

Closed
QiuLL opened this issue Jul 2, 2024 · 9 comments
Labels: bug (Something isn't working), stale

QiuLL commented Jul 2, 2024

python inference71.py configs/opensora-v1-2/inference/sample.py --num-frames 4s --resolution 720p --aspect-ratio 9:16 --prompt "a beautiful waterfall"
/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/utils/hub.py:142: FutureWarning: Using the environment variable HUGGINGFACE_CO_RESOLVE_ENDPOINT is deprecated and will be removed in Transformers v5. Use HF_ENDPOINT instead.
warnings.warn(
/data/miniconda/envs/opensora/lib/python3.10/site-packages/colossalai/pipeline/schedule/_utils.py:19: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten)
/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/utils/_pytree.py:300: UserWarning: <class 'collections.OrderedDict'> is already registered as pytree node. Overwriting the previous registration.
warnings.warn(
/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(

Traceback (most recent call last):
File "/data/qll/Open-Sora/inference71.py", line 303, in
main()
File "/data/qll/Open-Sora/inference71.py", line 265, in main
samples = scheduler.sample(
File "/data/qll/Open-Sora/opensora/schedulers/rf/__init__.py", line 52, in sample
model_args = text_encoder.encode(prompts)
File "/data/qll/Open-Sora/opensora/models/text_encoder/t5.py", line 192, in encode
caption_embs, emb_masks = self.t5.get_text_embeddings(text)
File "/data/qll/Open-Sora/opensora/models/text_encoder/t5.py", line 129, in get_text_embeddings
text_encoder_embs = self.model(
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1975, in forward
encoder_outputs = self.encoder(
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1110, in forward
layer_outputs = layer_module(
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 694, in forward
self_attention_outputs = self.layer[0](
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 600, in forward
normed_hidden_states = self.layer_norm(hidden_states)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 416, in forward
return fused_rms_norm_affine(
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 215, in fused_rms_norm_affine
return FusedRMSNormAffineFunction.apply(*args)
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 75, in forward
output, invvar = fused_layer_norm_cuda.rms_forward_affine(
RuntimeError: input must be a CUDA tensor

How can I solve this?

JThh (Collaborator) commented Jul 8, 2024

Can you set `CUDA_VISIBLE_DEVICES=0` before the inference command? If that does not help, print the device of the input text tensors and of the T5 encoder model to check whether they are already on the GPU.
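A minimal sketch of the device check described above, using a stand-in `nn.Linear` rather than Open-Sora's actual T5 encoder (in the real code you would print the devices of the encoder's parameters and the tokenized prompt tensors instead):

```python
import torch
import torch.nn as nn

# Pick CUDA when available; apex's fused kernels require CUDA tensors.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 4).to(device)   # stand-in for the T5 encoder
x = torch.randn(2, 4).to(device)     # stand-in for the tokenized prompts

# If either device prints "cpu" on a GPU machine, a .to(device) call was
# skipped somewhere, which would trigger "input must be a CUDA tensor"
# inside apex's fused layer norm.
print(next(model.parameters()).device, x.device)
```

If both print `cuda:0` but the error persists, the problem is more likely an apex build mismatch than device placement.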

@JThh JThh added the bug Something isn't working label Jul 8, 2024
@JThh JThh self-assigned this Jul 8, 2024
FrankLeeeee (Contributor) commented Jul 10, 2024

Hi, did you build apex from source?
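For reference, a from-source apex build typically follows NVIDIA's apex README (the commands below are taken from that README for pip >= 23.1 and are not Open-Sora-specific; the local CUDA toolkit version must match the one your PyTorch wheel was built against):

```shell
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" ./
```

Without `--cuda_ext`, `fused_layer_norm_cuda` is not compiled, so the fused RMSNorm path used in the traceback above would not be available at all.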


This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Jul 18, 2024
jacobswan1 commented
I encountered the same situation. I built apex from source as instructed in installation.md. Any clues about this? Did you resolve it, @QiuLL?

@github-actions github-actions bot removed the stale label Sep 15, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Sep 22, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Sep 30, 2024
TalRemez commented

Same problem. Any idea how to solve this?

5-Jeremy commented

I still get this error even after disabling the fused layernorm kernel (though apex is still installed).
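For anyone experimenting with bypassing the fused kernel: a plain-PyTorch equivalent of an affine RMSNorm (a sketch of the standard RMSNorm formula, not apex's actual implementation) works on CPU and CUDA tensors alike, unlike apex's CUDA-only `fused_rms_norm_affine`:

```python
import torch

def rms_norm_affine(x: torch.Tensor, weight: torch.Tensor,
                    eps: float = 1e-6) -> torch.Tensor:
    # Scale each vector by the reciprocal of its root-mean-square,
    # then apply the learned per-feature affine weight.
    inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return weight * (x * inv_rms)

x = torch.randn(2, 8)       # works on CPU; no CUDA tensor required
w = torch.ones(8)
out = rms_norm_affine(x, w)
print(out.shape)            # torch.Size([2, 8])
```

Swapping this in for the apex call would sidestep the device requirement entirely, at the cost of the fused kernel's speed.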

PeiqinSun commented

Hi guys, is there any solution yet?

7 participants