
Accept device and dtype in OnnxConversion, Add OnnxBnb4Quantization, llama2 e2e qlora example #703

Merged: 8 commits merged into main from jambayk/llama-qlora-quant on Nov 8, 2023

Conversation

@jambayk (Contributor) commented on Nov 7, 2023

Describe your changes

OnnxConversion

  • The code has been refactored so that the methods _convert_model_on_device and _convert_distributed_model_on_device have similar signatures and behavior
  • Added docstrings for these methods
  • The refactoring also simplifies the logic for composite models
  • New config parameters make the pass more flexible (see the sketch after this list):
    • use_device: the device to run conversion on, e.g., a GPU when there is enough GPU memory to speed up conversion or when the model is float16.
    • torch_dtype: the torch data type to cast the model to before conversion, e.g., when the model is originally float16 but we want to convert on CPU and run a later pass to cast the ONNX model back to float16.
  • The pass is more aware of models loaded via hf_config:
    • Checks and updates torch_dtype
    • Checks whether the model is quantized with bitsandbytes
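Roughly how the new parameters might look in a pass config, sketched as an annotated Python dict. Only use_device and torch_dtype come from this PR; the surrounding keys and example values are illustrative assumptions:

```python
# Hypothetical pass-config sketch; only the "use_device" and "torch_dtype"
# parameter names are from this PR, the other keys and values are assumed.
conversion_pass = {
    "type": "OnnxConversion",
    "config": {
        "target_opset": 14,        # assumed value, for illustration only
        "use_device": "cuda",      # new: run conversion on GPU when memory allows
        "torch_dtype": "float32",  # new: cast e.g. a float16 model before CPU export
    },
}
```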

OnnxBnb4Quantization

  • New quantization pass that quantizes a model to NF4/FP4 using the MatMulBnb4 quantizer and contrib op (a hedged config sketch follows this list)
  • It can pick up settings from both the pass config and model attributes such as quantized_modules and quantization_config
  • With this pass, we usually don't want to quantize all MatMul nodes; to retain accuracy, only the modules that were originally quantized in the source model should be quantized
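A minimal configuration sketch for the new pass, with the same caveat: quant_type is an assumed parameter name inferred from the nf4/fp4 description, not confirmed by this PR:

```python
# Hypothetical sketch of configuring the new pass. "quant_type" is an assumed
# parameter name; per the description, the pass can alternatively pick up
# quantized_modules / quantization_config from the input model's attributes.
bnb4_pass = {
    "type": "OnnxBnb4Quantization",
    "config": {
        "quant_type": "nf4",  # "nf4" or "fp4", per the pass description
    },
}
```

When the source model carries quantized_modules, restricting quantization to those modules rather than every MatMul node is what preserves accuracy.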

llama2

  • Added an end-to-end QLoRA + ORT optimization workflow (see the sketch below)
  • Updated the README

[Image: qlora-e2e workflow]
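One plausible ordering of the passes described above, again as an annotated Python sketch; the QLoRA and OrtTransformersOptimization pass names and all config values are assumptions about the example, not confirmed by this PR:

```python
# Hypothetical end-to-end pass chain for the llama2 QLoRA example.
# "QLoRA" and "OrtTransformersOptimization" are assumed pass names.
passes = {
    "qlora": {"type": "QLoRA"},                           # fine-tune with QLoRA
    "conversion": {
        "type": "OnnxConversion",
        "config": {"torch_dtype": "float32"},             # export in float32 on CPU
    },
    "optimize": {"type": "OrtTransformersOptimization"},  # ORT graph optimizations
    "requantize": {
        "type": "OnnxBnb4Quantization",
        "config": {"quant_type": "nf4"},                  # re-quantize the QLoRA modules
    },
}
```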

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Format your code by running `pre-commit run --all-files`
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
    The OnnxConversion pass has new parameters use_device and torch_dtype for more flexible conversion.
    New OnnxBnb4Quantization pass to quantize an ONNX model using FP4/NF4 data types.

(Optional) Issue link

Resolved review threads on: olive/model/__init__.py, examples/llama2/llama2_qlora.json, olive/passes/onnx/bnb_quantization.py, olive/passes/onnx/quantization.py
@jambayk merged commit e752dc0 into main on Nov 8, 2023 (31 checks passed)
@jambayk deleted the jambayk/llama-qlora-quant branch on November 8, 2023 05:36