
Quantization with lora weights #467

Open
xinyual opened this issue Dec 6, 2023 · 5 comments

Comments

@xinyual commented Dec 6, 2023

I have a Mistral model with LoRA weights. Is there any way I can quantize the whole model together with the LoRA weights?
I tried the following steps but ran into problems:

model = MistralGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model = PeftModel.from_pretrained(
    model,
    lora_weights,
    adapter_name="dsl1",
)
print("start quantize")
model.quantize(examples)

When I then load the quantized model and use do_sample to generate, like:

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        do_sample=True,
        temperature=0.01,
    )

it raises: RuntimeError: probability tensor contains either inf, nan or element < 0
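For reference, one alternative flow that is sometimes used, and which is not attempted anywhere in this thread, is to merge the LoRA adapter into the base model with PEFT's merge_and_unload() before quantizing. A minimal sketch, assuming base_model and lora_weights as defined above; "mistral-merged-fp16" is a placeholder output path:

# Assumption, not from this thread: merge the LoRA adapter into the base
# weights first, then quantize the merged checkpoint instead of quantizing
# through the PeftModel wrapper.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, lora_weights).merge_and_unload()
merged.save_pretrained("mistral-merged-fp16")  # placeholder output directory

# The merged checkpoint could then be quantized with auto-gptq as in the snippet above:
# model = MistralGPTQForCausalLM.from_pretrained("mistral-merged-fp16", quantize_config)
# model.quantize(examples)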

@fxmarty (Collaborator) commented Dec 7, 2023

Hi, likely related: #295 & huggingface/transformers#27179

@fxmarty (Collaborator) commented Dec 7, 2023

Could you provide a reproduction?

@xinyual (Author) commented Dec 13, 2023

Sorry for the late reply.

# Imports assumed from the libraries used below (not shown in the original snippet):
from transformers import AutoTokenizer
from auto_gptq import BaseQuantizeConfig
from auto_gptq.modeling import MistralGPTQForCausalLM
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
examples = [
    tokenizer(prompt)
]
quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize the model to 4-bit
    group_size=128,  # 128 is the recommended value
    desc_act=False,  # False significantly speeds up inference, at a small cost in perplexity
)
model = MistralGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model = PeftModel.from_pretrained(
    model,
    lora_weights,
)
model.quantize(examples)
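The quantized model is presumably saved between these two snippets so it can be reloaded with from_quantized below; the assumed intermediate step (not shown in the issue) would look like:

# Assumed step, not shown in the issue: persist the quantized weights so that
# from_quantized(quantized_model_dir) can load them in the next snippet.
model.save_quantized(quantized_model_dir)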

Then:

model = MistralGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:1")
with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        do_sample=True,
        top_k=top_k,
        top_p=top_p,
        max_length=2500 + 100,
        temperature=0.01,
    )
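As a side note, the RuntimeError reported above is raised by torch.multinomial when the sampling probabilities contain NaN or inf, so one hypothetical sanity check (not part of the issue) is to inspect the raw logits of the quantized model before any sampling, assuming the GPTQ wrapper forwards calls to the underlying model as usual:

# Hypothetical diagnostic: check whether the quantized model already produces
# NaN/inf logits before sampling; model and input_ids are the same as above.
import torch

with torch.no_grad():
    logits = model(input_ids).logits
print("any nan:", torch.isnan(logits).any().item())
print("any inf:", torch.isinf(logits).any().item())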

@fxmarty (Collaborator) commented Dec 13, 2023

Thank you! What is your base_model? Is there an already quantized model on the HF Hub that we could use to reproduce this?

@xinyual (Author) commented Dec 14, 2023

It's mistralai/Mistral-7B-Instruct-v0.1 from the Hugging Face Hub.
