Following the GPTQ docs, 4-bit quantization of the model fails with an error; how do I quantize a 14B checkpoint? #618
Comments
My error output:

Did you manage to solve this? Is there any good workaround?

We have added quantization instructions to the README, for reference.
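The README instructions referenced above are not quoted in this thread. For context, a typical auto-gptq quantize-and-save flow looks roughly like the sketch below; model_path, quantized_path, and the calibration text are placeholders, and the BaseQuantizeConfig values are common example settings rather than the README's exact recipe.

from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_path = "Qwen/Qwen-14B-Chat"      # placeholder: checkpoint to quantize
quantized_path = "qwen-14b-chat-int4"  # placeholder: output directory

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)  # common GPTQ settings
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_pretrained(
    model_path, quantize_config, trust_remote_code=True)

# Calibration examples: a list of tokenized texts.
examples = [tokenizer("auto-gptq is an easy-to-use quantization library.")]
model.quantize(examples)
model.save_quantized(quantized_path, use_safetensors=True)

save_quantized writes the packed 4-bit weights together with the quantize config, so the result can later be reloaded with AutoGPTQForCausalLM.from_quantized.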
Following the GPTQ documentation, I tried to 4-bit quantize the 14B chat model and got an error.
Code:
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

# quantize_config was not shown in the report; a 4-bit config is assumed
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model_4bit = AutoGPTQForCausalLM.from_pretrained(
    filename, quantize_config, trust_remote_code=True)  # filename: path to the 14B checkpoint
tokenizer = AutoTokenizer.from_pretrained(filename, trust_remote_code=True)
examples = [tokenizer("auto")]  # a single, very short calibration example
model_4bit.quantize(examples)
Error:
2023-11-10 18:20:24,426 - auto_gptq.modeling.base - INFO - Start quantizing layer 1/40
File /opt/conda/lib/python3.8/site-packages/flash_attn/layers/rotary.py:50, in ApplyRotaryEmb.forward(ctx, x, cos, sin, inplace)
     48 out = torch.empty_like(x) if not inplace else x
     49 o1, o2 = out[..., :rotary_dim].chunk(2, dim=-1) if not inplace else (x1, x2)
---> 50 rotary_emb.apply_rotary(x1, x2, rearrange(cos[:seqlen], 's d -> s 1 d'),
     51                         rearrange(sin[:seqlen], 's d -> s 1 d'), o1, o2, False)
     52 if not inplace and rotary_dim < headdim:
     53     out[..., rotary_dim:].copy_(x[..., rotary_dim:])
RuntimeError: cos must be on CUDA
Versions:
auto-gptq 0.4.2+cu117
transformers 4.33.1
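The failing frame is flash-attn's fused rotary kernel, which requires the cos/sin tensors to be on the GPU. One workaround that may help is to make the calibration inputs CUDA tensors so the quantization forward passes run on the GPU. A minimal, untested sketch, assuming a CUDA device is available; calibration_texts is a placeholder:

import torch

device = torch.device("cuda")
calibration_texts = ["auto"]  # placeholder calibration data
examples = [
    {k: v.to(device) for k, v in tokenizer(text, return_tensors="pt").items()}
    for text in calibration_texts
]
model_4bit.quantize(examples)

If the error persists, another commonly mentioned workaround is to uninstall flash-attn before quantizing: Qwen's modeling code only takes the fused rotary path when flash_attn is importable, and otherwise falls back to a pure-PyTorch implementation.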