
how to extract int8 weights from quantized model #1817

Open
chensterliu opened this issue May 25, 2024 · 8 comments

@chensterliu

When loading the quantized model (SmoothQuant) with

from neural_compressor.utils.pytorch import load
qmodel = load(qmodel_path, model_fp)

I got
RecursiveScriptModule(original_name=QuantizationDispatchModule)
I'd like to extract the quantized int8 weight matrices, together with the corresponding quantization parameters (scales, zero_points). What should I do?

@srinarayan-srikanthan

Hi @chensterliu, can you provide more details on the model you quantized, the quantization strategy, and the versions of neural_compressor and intel_extension_for_pytorch?

@srinarayan-srikanthan srinarayan-srikanthan self-assigned this May 28, 2024
@srinarayan-srikanthan srinarayan-srikanthan added the aitce AI TCE to handle it firstly label May 28, 2024
@chensterliu
Author

Hello, I used

neural_compressor             2.5.1
intel-extension-for-pytorch   2.3.0 

for SmoothQuant. All I did was run the script neural-compressor/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/run_clm_no_trainer.py with the following arguments:

    python -u run_clm_no_trainer.py \
        --model "facebook/opt-125m" \
        --dataset "lambada" \
        --approach "static" \
        --output_dir "quan_out" \
        --quantize \
        --batch_size 16 \
        --ipex --int8_bf16_mixed --sq --alpha 0.5

This successfully produced the quan_out dir with two files inside: best_configure.json and best_model.pt.

My question is: how do I get the quantized int8 weight matrices from those files? The method in my first post doesn't work, since the loaded qmodel is a RecursiveScriptModule. It seems to be a compiled artifact that can run inference, but the weights can't be retrieved via state_dict(). I would appreciate any method to obtain those quantized integers, similar to named_parameters() on a normal torch.nn model.
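For context, this is roughly what I tried after loading (variable names as in my first post); the loaded object exposes a graph, but no usable parameters in my case:

from neural_compressor.utils.pytorch import load

qmodel = load(qmodel_path, model_fp)      # qmodel_path is "quan_out" from the run above
print(type(qmodel))                       # torch.jit._script.RecursiveScriptModule
print(list(qmodel.named_parameters()))    # empty: no per-layer int8 weights here
print(list(qmodel.state_dict().keys()))   # nothing useful in my case either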

@srinarayan-srikanthan

Hi @chensterliu, I am able to run the command that you used to quantize, and I am able to load the model using
from neural_compressor.utils.pytorch import load
qmodel = load("./saved_results")

The command I used to quantize:
python run_clm_no_trainer.py --dataset "lambada" --model facebook/opt-125m --quantize --batch_size 16 --sq --alpha 0.5 --ipex --output_dir "./saved_results" --int8_bf16_mixed

If you are still facing issues, can you try loading the model directly using this: https://github.com/intel/neural-compressor/blob/29fdecbbb44ceb8d19c12809af90dc23063becfc/neural_compressor/utils/pytorch.py#L274C1-L281C57
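If I read that code correctly, for the IPEX case it essentially boils down to a torch.jit.load of the saved TorchScript archive, so you could also try loading it directly; a minimal sketch, assuming best_model.pt is the file in your output dir:

import torch

qmodel = torch.jit.load("./saved_results/best_model.pt")
qmodel.eval()
print(qmodel.graph)   # shows the traced graph containing the quantized ops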

@chensterliu
Author

Hi @srinarayan-srikanthan, loading the qmodel is fine. My problem is that the loaded qmodel doesn't expose any weight information. Please see the attached screenshot: do you also get this RecursiveScriptModule? How do you get the int8 weights from the qmodel?
[Screenshot_2024-06-06_15-07-34: the loaded qmodel printed as a RecursiveScriptModule]

@srinarayan-srikanthan

The torch.jit model is packed for inference, so you cannot simply unpack it and inspect its weights.
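To illustrate why on a toy example (not your exact model): freezing a scripted module inlines its parameters into the graph as constants, so they no longer show up in named_parameters():

import torch

m = torch.jit.script(torch.nn.Linear(4, 4)).eval()
frozen = torch.jit.freeze(m)
print(len(list(m.named_parameters())))       # 2: weight and bias are still parameters
print(len(list(frozen.named_parameters())))  # 0: they are folded into the graph as constants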

@chensterliu
Copy link
Author

My goal is to extract those quantized int8 weights. Do you have a workaround to achieve this, or is it technically not possible?

@srinarayan-srikanthan

srinarayan-srikanthan commented Jun 26, 2024

Yes, can you try this workaround:

import torch

# Function to extract the constants folded into the frozen model's graph
# (after freezing, the weights live here rather than in state_dict())
def extract_constants(frozen_model):
    constants = {}
    for node in frozen_model.graph.nodes():
        if node.output().type().isSubtypeOf(torch._C.TensorType.get()):
            constant_name = node.output().debugName()
            constant_value = node.output().toIValue()
            constants[constant_name] = constant_value
    return constants

# Extract and print constants
constants = extract_constants(qmodel)  # pass your loaded quantized model here
print("Frozen Model Constants:")
for name, value in constants.items():
    print(f"{name}: {value}")

@chensterliu
Author

Thank you, the code works. The only subtle thing is that the printed names are indices, which makes it difficult to trace back which tensor belongs to which layer.
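A possible follow-up I may try: since each graph value knows its uses, one can print which ops consume each constant, which at least hints at the layer it belongs to (a sketch, untested on this exact model):

# For every constant node, list the kinds of ops that consume its output,
# as a rough way to associate a constant with a layer/op.
for node in qmodel.graph.nodes():
    if node.kind() == "prim::Constant":
        out = node.output()
        users = [use.user.kind() for use in out.uses()]
        if users:
            print(out.debugName(), "->", users)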
