
Following the GPTQ docs to 4-bit quantize the model fails with an error; how can a 14B checkpoint be quantized? #618

Closed
tc-yue opened this issue Nov 10, 2023 · 5 comments


tc-yue commented Nov 10, 2023

Following the GPTQ docs, I tried 4-bit quantization of the 14B chat model and hit an error.
Code:
model_4bit = AutoGPTQForCausalLM.from_pretrained(
filename, quantize_config,trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(filename, trust_remote_code=True)
examples = [tokenizer("auto")]
model_4bit.quantize(examples)

Error:
2023-11-10 18:20:24,426 - auto_gptq.modeling.base - INFO - Start quantizing layer 1/40
File /opt/conda/lib/python3.8/site-packages/flash_attn/layers/rotary.py:50, in ApplyRotaryEmb.forward(ctx, x, cos, sin, inplace)
     48 out = torch.empty_like(x) if not inplace else x
     49 o1, o2 = out[..., :rotary_dim].chunk(2, dim=-1) if not inplace else (x1, x2)
---> 50 rotary_emb.apply_rotary(x1, x2, rearrange(cos[:seqlen], 's d -> s 1 d'),
     51     rearrange(sin[:seqlen], 's d -> s 1 d'), o1, o2, False)
     52 if not inplace and rotary_dim < headdim:
     53     out[..., rotary_dim:].copy_(x[..., rotary_dim:])

RuntimeError: cos must be on CUDA

Versions:
auto-gptq 0.4.2+cu117
transformers 4.33.1
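Both errors in this thread ("cos must be on CUDA" and "Expected all tensors to be on the same device") typically indicate a device mismatch: the calibration examples produced by the tokenizer stay on CPU while the model's layers and rotary-embedding buffers sit on the GPU. As a minimal sketch of one possible workaround, a helper could move every tensor in a tokenizer batch onto the model's device before calling `quantize`. Note that `move_to_device` is a hypothetical helper written for illustration, not part of the AutoGPTQ API:

```python
def move_to_device(batch, device):
    """Recursively move anything with a .to(device) method onto `device`.

    Hypothetical helper (not an AutoGPTQ API): tokenizer batches can be
    dicts of tensors, lists of batches, or bare tensors, so we walk the
    structure and call .to(device) wherever it exists.
    """
    if hasattr(batch, "to"):
        return batch.to(device)          # tensors / BatchEncoding objects
    if isinstance(batch, dict):
        return {k: move_to_device(v, device) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(v, device) for v in batch)
    return batch                          # ints, strings, etc. stay as-is
```

With such a helper, one might prepare the calibration data as `examples = [move_to_device(tokenizer("auto", return_tensors="pt"), "cuda:0")]` before `model_4bit.quantize(examples)`; using `return_tensors="pt"` so the batch holds tensors rather than Python lists is part of the assumption here.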

@lonngxiang

My error is:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

@WingsLong

My error is: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Has anyone solved this? Is there a good workaround?

@Dujianhua1008

> My error is: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

See AutoGPTQ/AutoGPTQ#370

@Dujianhua1008

> My error is: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

> Has anyone solved this? Is there a good workaround?

See AutoGPTQ/AutoGPTQ#370

@jklj077
Contributor

jklj077 commented Dec 26, 2023

We have added quantization instructions to the README; please refer to them.

@jklj077 jklj077 closed this as completed Dec 26, 2023