Following the GPTQ docs, 4-bit quantization of the model fails with an error; how do I quantize a 14B checkpoint? #618
Comments
My error output:

Did you manage to solve this? Is there any good workaround?

We have added quantization instructions to the README, for reference.
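The README instructions referenced above are not quoted in this thread. For context, a typical auto-gptq quantize-and-save flow looks roughly like the sketch below; model_path, quantized_path, and the calibration text are placeholders, and the BaseQuantizeConfig values are common example settings rather than the README's exact recipe.

from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_path = "Qwen/Qwen-14B-Chat"      # placeholder: checkpoint to quantize
quantized_path = "qwen-14b-chat-int4"  # placeholder: output directory

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)  # common GPTQ settings
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_pretrained(
    model_path, quantize_config, trust_remote_code=True)

# Calibration examples: a list of tokenized texts.
examples = [tokenizer("auto-gptq is an easy-to-use quantization library.")]
model.quantize(examples)
model.save_quantized(quantized_path, use_safetensors=True)

save_quantized writes the packed 4-bit weights together with the quantize config, so the result can later be reloaded with AutoGPTQForCausalLM.from_quantized.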
Following the GPTQ documentation, I tried to 4-bit quantize the 14B chat model and got an error.
Code:
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

# quantize_config was not shown in the report; a 4-bit config is assumed
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model_4bit = AutoGPTQForCausalLM.from_pretrained(
    filename, quantize_config, trust_remote_code=True)  # filename: path to the 14B checkpoint
tokenizer = AutoTokenizer.from_pretrained(filename, trust_remote_code=True)
examples = [tokenizer("auto")]  # a single, very short calibration example
model_4bit.quantize(examples)
Error:
2023-11-10 18:20:24,426 - auto_gptq.modeling.base - INFO - Start quantizing layer 1/40
File /opt/conda/lib/python3.8/site-packages/flash_attn/layers/rotary.py:50, in ApplyRotaryEmb.forward(ctx, x, cos, sin, inplace)
     48 out = torch.empty_like(x) if not inplace else x
     49 o1, o2 = out[..., :rotary_dim].chunk(2, dim=-1) if not inplace else (x1, x2)
---> 50 rotary_emb.apply_rotary(x1, x2, rearrange(cos[:seqlen], 's d -> s 1 d'),
     51                         rearrange(sin[:seqlen], 's d -> s 1 d'), o1, o2, False)
     52 if not inplace and rotary_dim < headdim:
     53     out[..., rotary_dim:].copy_(x[..., rotary_dim:])
RuntimeError: cos must be on CUDA
Versions:
auto-gptq 0.4.2+cu117
transformers 4.33.1
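The failing frame is flash-attn's fused rotary kernel, which requires the cos/sin tensors to be on the GPU. One workaround that may help is to make the calibration inputs CUDA tensors so the quantization forward passes run on the GPU. A minimal, untested sketch, assuming a CUDA device is available; calibration_texts is a placeholder:

import torch

device = torch.device("cuda")
calibration_texts = ["auto"]  # placeholder calibration data
examples = [
    {k: v.to(device) for k, v in tokenizer(text, return_tensors="pt").items()}
    for text in calibration_texts
]
model_4bit.quantize(examples)

If the error persists, another commonly mentioned workaround is to uninstall flash-attn before quantizing: Qwen's modeling code only takes the fused rotary path when flash_attn is importable, and otherwise falls back to a pure-PyTorch implementation.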