`ExtractAdapters`: Extract lora adapters and use them as model inputs or external initializers #1064

jambayk · 2024-04-09T16:46:32Z

Describe your changes

Introduces a new pass, ExtractAdapters, which extracts the LoRA adapter weights (float or statically quantized) and saves them in a separate file. The model graph is also modified in one of the following ways:
1. Adapter weights are set as external tensors pointing to a non-existent file. The ONNX model is therefore invalid on its own and cannot be loaded directly. To create an inference session from this model, the adapter weights must be added to a session options object using add_initializer or add_external_initializers.
2. Adapter weights are converted into model inputs. The ONNX model is valid. During inference, the adapter weights must be provided as part of the inputs. We call them constant inputs here since these weights don't change between runs when using the same set of adapters.
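The two consumption modes above can be sketched roughly as follows. This is a minimal illustration and not the code added in this PR: the helper names `merge_constant_inputs` and `session_with_external_initializers` are hypothetical, and the adapter weights are assumed to be available as a dict mapping tensor names to NumPy arrays. Only the ONNX Runtime calls (`SessionOptions.add_external_initializers`, `OrtValue.ortvalue_from_numpy`, `InferenceSession`) are existing APIs.

```python
from typing import Any, Dict, List, Tuple


def merge_constant_inputs(run_inputs: Dict[str, Any], adapter_weights: Dict[str, Any]) -> Dict[str, Any]:
    """Mode 2 (hypothetical helper): adapter weights are ordinary model inputs.

    The feed passed to session.run is the per-run inputs plus the adapter weights.
    """
    overlap = set(run_inputs) & set(adapter_weights)
    if overlap:
        raise ValueError(f"Adapter weight names collide with regular inputs: {sorted(overlap)}")
    return {**run_inputs, **adapter_weights}


def session_with_external_initializers(model_path: str, adapter_weights: Dict[str, Any]) -> Tuple[Any, List[Any]]:
    """Mode 1 (hypothetical helper): supply the missing initializers via SessionOptions.

    adapter_weights is assumed to map initializer names to NumPy arrays.
    """
    import onnxruntime as ort  # imported lazily so the helper above has no ORT dependency

    names = list(adapter_weights)
    values = [ort.OrtValue.ortvalue_from_numpy(adapter_weights[name]) for name in names]

    options = ort.SessionOptions()
    options.add_external_initializers(names, values)
    session = ort.InferenceSession(model_path, sess_options=options, providers=["CPUExecutionProvider"])

    # Returning the OrtValues keeps them referenced alongside the session.
    # This is a conservative choice made for this sketch.
    return session, values
```

With mode 2, switching adapters only means passing a different weights dict to `merge_constant_inputs` before each `session.run` call. With mode 1, a new session is created per adapter set.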
olive.scripts.export_adapters is provided as a standalone script to directly export pre-existing adapter weights into the same format as the ExtractAdapters pass.
OnnxDAG utils are added to perform ONNX graph manipulations. The methods are generic and can be used in other new passes.
ONNXModelHandler has two corresponding new attributes that must be names of files in the model_path directory:
1. external_initializers_file_name
2. constant_inputs_file_name
Both olive.common.ort_inference.get_ort_inference_session and OrtInferenceSession can handle the external initializers and constant inputs.
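As an illustration of the "must be names of files in the model_path directory" constraint on the two handler attributes, a loader could resolve them along these lines. This is a sketch under stated assumptions: `resolve_sidecar_file` is a hypothetical helper, and whether Olive performs this exact validation is not confirmed by this PR.

```python
from pathlib import Path
from typing import Optional


def resolve_sidecar_file(model_path: str, file_name: Optional[str]) -> Optional[Path]:
    """Resolve a weights file that lives next to the model (hypothetical helper).

    file_name stands for external_initializers_file_name or
    constant_inputs_file_name; None means the attribute is not set.
    """
    if file_name is None:
        return None
    # Only a bare file name is accepted, not a relative or absolute path.
    if Path(file_name).name != file_name:
        raise ValueError(f"Expected a file name inside the model directory, got: {file_name!r}")
    model_dir = Path(model_path)
    if not model_dir.is_dir():
        raise ValueError(f"model_path must be a directory when a weights file is set: {model_path!r}")
    candidate = model_dir / file_name
    if not candidate.is_file():
        raise FileNotFoundError(f"{candidate} does not exist")
    return candidate
```

Restricting the attribute to a bare file name keeps the model directory self-contained, so it can be moved or cached as a unit.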

Checklist before requesting a review

Add unit tests for this change.
Make sure all tests can pass.
Update documents if necessary.
Lint and apply fixes to your code by running lintrunner -a
Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
Is this PR including examples changes? If yes, please remember to update example documentation in a follow-up PR.

(Optional) Issue link

olive/passes/onnx/conversion.py

olive/common/ort_inference.py

olive/model/handler/onnx.py

olive/passes/onnx/common.py

olive/passes/onnx/extract_adapters.py

examples/llama2/requirements-qlora.txt

olive/passes/onnx/onnx_dag.py

guotuofeng · 2024-04-10T03:23:29Z

The external_initializers_name description needs to be updated.

olive/common/ort_inference.py

olive/model/handler/onnx.py

olive/passes/onnx/onnx_dag.py

guotuofeng · 2024-04-10T05:23:18Z

LGTM. I will let the other reviewers add their comments.

trajepl · 2024-04-10T08:16:12Z

olive/passes/onnx/extract_adapters.py

+    @classmethod
+    def _default_config(cls, accelerator_spec: AcceleratorSpec) -> Dict[str, PassConfigParam]:
+        return {
+            "make_inputs": PassConfigParam(


Do we need to extend it with an external data config?

It seems the external-data-related config would always be enabled, right?

Yes. As it stands, the model is always saved with external data, so we don't expose user config options for it.

trajepl · 2024-04-10T08:20:17Z

olive/passes/onnx/extract_adapters.py

+        # raw_data is required for set_external_data
+        if not new_initializer.HasField("raw_data"):
+            new_initializer.raw_data = b""
+        set_external_data(new_initializer, location="dummy-location.bin")


Will this external data be stored under the path where the user runs Olive?
It seems to be a dummy placeholder. Where will we remove it?

Let us assume the passes flow as follows:

model M -> Finetuning A -> conversion -> extract external lora weights A1

model M -> Finetuning B -> conversion -> extract external lora weights B1
Will dummy-location.bin be overwritten?

This is actually just a placeholder. Since the raw_data field is cleared, nothing is saved at this location during model save. https://github.com/onnx/onnx/blob/main/onnx/external_data_helper.py#L304

This is why the PR description says the ONNX model produced when extracting as initializers is invalid: it points to a non-existent file.
This is okay since we only intend to use this model through an ORT inference session with the missing initializers already added to the session options.

devang-ml · 2024-04-10T18:49:53Z

examples/llama2/README.md

+python -m olive.scripts.export_adapters --adapter_path Mikael110/llama-2-7b-guanaco-qlora --dtype float16 --pack_weights --output_path models/guanaco_fp16_packed.npz
+```
+
+Snippet below shows an example runs of the generated fine-tuned model using two different adapters.


In a follow-up PR, we want to move this snippet into a working Jupyter notebook.

olive/passes/onnx/onnx_dag.py

devang-ml · 2024-04-10T18:59:29Z

olive/scripts/export_adapters.py

@@ -0,0 +1,100 @@
+import argparse


We need to cover this script in Olive docs.

jambayk added 5 commits April 9, 2024 08:04

extract adapters pass added

675edc8

constant inputs, float works

a775d23

quant works too, name updated

Update systems and tests:

a59652a

- isolated env system update - external initializers test

extract adapters script, dag docs and methods

3881d52

doc

llama2 extract adapters example

7ca9564

nit

jambayk commented Apr 9, 2024

olive/passes/onnx/conversion.py

rename onnx_dag util, add pass ut and docs

84a7874

jambayk force-pushed the jambayk/adapters branch from 22bf0b2 to 84a7874 on April 10, 2024 00:28