Finetune with deepspeed: type mismatch #35

Open
YeZiyi1998 opened this issue Jun 7, 2024 · 3 comments
Comments

@YeZiyi1998

I encountered an issue while finetuning the officially released code with DeepSpeed. Here is the detailed error message:

File "/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py", line 57, in forward
output = input.matmul(weight.t())
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16

It appears that the matmul operation expects both input tensors to have the same dtype; in my case, one tensor is float32 and the other is BFloat16.
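For reference, a minimal standalone sketch (shapes are made up; no DeepSpeed required) that triggers the same RuntimeError:

    import torch

    # Made-up shapes; only the dtype mismatch matters here.
    x = torch.randn(4, 8, dtype=torch.float32)    # activations left in float32
    w = torch.randn(16, 8, dtype=torch.bfloat16)  # weights cast to bfloat16

    out = x.matmul(w.t())  # RuntimeError: expected mat1 and mat2 to have the same dtype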

I am not sure if this is a bug in the DeepSpeed library or an issue with my usage. I would appreciate any assistance in resolving this issue.

@lihaoling

same question

@JensenDong

same + 1

@yiyepiaoling0715

I encountered the same problem, and here's how I solved it: modify lines 425 and 428 in the modeling_deepseek.py file and remove the torch.float32 casts, as in the following code:

        logits = F.linear(
            hidden_states, self.weight, None
        )
        if self.scoring_func == "softmax":
            scores = logits.softmax(dim=-1)
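
A hypothetical alternative sketch (not from this thread, just a common workaround): instead of deleting the casts, align the activation dtype with whatever dtype the gate weight actually has, so the two matmul operands always match:

        # Hypothetical variant: cast activations to the weight's dtype
        # so both matmul inputs agree regardless of training precision.
        logits = F.linear(
            hidden_states.to(self.weight.dtype), self.weight, None
        )
        if self.scoring_func == "softmax":
            scores = logits.softmax(dim=-1)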

