Fix quantization issue with transformers >= 4.36.0 #264

younesbelkada · 2023-12-14T18:14:07Z

Fixes #260

For some models, mainly those that rely on the Hub's remote-code feature (such as the Qwen architecture), some target modules do not properly handle arguments such as past_key_value. I still need to dig into why this happens only on transformers 4.36.0, but this seems to work fine as a quick hotfix.

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'Qwen/Qwen-7B-Chat'
quant_path = 'qwen-7b-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
# NOTE: pass safetensors=True to load safetensors
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')
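
One way to picture the kind of hotfix described above is to drop any keyword argument that a layer's forward method does not declare before calling it. The following is only a minimal sketch under that assumption, not the code from this PR's diff: `sanitize_kwargs`, `StrictBlock`, and `FlexibleBlock` are hypothetical names, and the blocks are plain Python stand-ins rather than real model layers.

```python
import inspect


def sanitize_kwargs(inputs_kwargs, module):
    """Keep only the keyword arguments that module.forward accepts.

    Hypothetical helper for illustration; not taken from the PR diff.
    """
    params = inspect.signature(module.forward).parameters
    # A forward that declares **kwargs accepts anything, so pass everything through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(inputs_kwargs)
    return {k: v for k, v in inputs_kwargs.items() if k in params}


class StrictBlock:
    """Stand-in for a remote-code layer whose forward lacks past_key_value."""

    def forward(self, hidden_states, attention_mask=None):
        return hidden_states


class FlexibleBlock:
    """Stand-in for a layer that tolerates extra keyword arguments."""

    def forward(self, hidden_states, **kwargs):
        return hidden_states


# Keyword arguments as a caller might collect them from the first layer's inputs.
layer_kwargs = {"attention_mask": None, "past_key_value": None, "use_cache": False}

strict_kwargs = sanitize_kwargs(layer_kwargs, StrictBlock())
flexible_kwargs = sanitize_kwargs(layer_kwargs, FlexibleBlock())

# Without filtering, StrictBlock().forward(x, **layer_kwargs) would raise TypeError.
output = StrictBlock().forward([1.0, 2.0], **strict_kwargs)
```

Filtering by signature leaves well-behaved layers untouched while shielding layers with narrower forward signatures from arguments they were never written to receive.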

cc @casper-hansen

dongkuang · 2023-12-15T04:49:20Z

I now get a new error: Token indices sequence length is longer than the specified maximum sequence length for this model (57053 > 32768). Running this sequence through the model will result in indexing errors

casper-hansen · 2023-12-15T09:29:29Z

I now get a new error: Token indices sequence length is longer than the specified maximum sequence length for this model (57053 > 32768). Running this sequence through the model will result in indexing errors

This is a warning, not an error. This is currently the intended usage of the tokenizer.

dongkuang · 2023-12-16T04:10:27Z

OK! Thank you! Successfully processed.

casper-hansen · 2023-12-16T11:22:25Z

This is a great fix @younesbelkada and very clean code! LGTM

fix quantization issue

5b9be04

younesbelkada requested a review from casper-hansen December 14, 2023 18:14

younesbelkada added 2 commits December 14, 2023 19:18

use local variable instead

bf7ca01

more fixes

28358fa

casper-hansen merged commit 2350a4d into main Dec 16, 2023

casper-hansen deleted the fix-issue-transformers-2 branch December 23, 2023 14:04
