Quantization-aware training framework for DIANA
Clone this repo:
```
git clone https://github.com/dianaKUL/diana-training-fmw.git
cd diana-training-fmw
git submodule init
git submodule update
```
With a Python virtual env and virtualenvwrapper:

```
mkvirtualenv quantlib -p /usr/bin/python3.10
workon quantlib
pip install -e .
```
See the examples
The following steps are executed to quantize a PyTorch model:

```python
from dianaquantlib.utils.BaseModules import DianaModule

model = MyModel()

def representative_dataset():
    for _, data in dataloader:
        yield data

# create a fake quantized model from a regular PyTorch model
fq_model = DianaModule(
    DianaModule.from_trainedfp_model(model),
    representative_dataset,
)

# enable weights and activation quantization (fake quantized)
fq_model.set_quantized(activations=True)

# rewrite operations to match the hardware (still fake quantized)
fq_model.map_to_hw()

# convert all decimal values to integer values (true quantized)
fq_model.integrize_layers()

# export to ONNX
fq_model.export_model('export')
```
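`MyModel` and `dataloader` above are placeholders. As a point of reference, here is a minimal sketch of what they might look like, assuming a small image classifier built only from supported layers (see the supported modules list further below) and a calibration `DataLoader` whose batches yield `(label, image)` pairs. All names, shapes, and data here are illustrative, not part of the dianaquantlib API:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy model built from supported layers only
# (nn.Conv2d, nn.BatchNorm2d, nn.ReLU, nn.AdaptiveAvgPool2d, nn.Flatten, nn.Linear).
class MyModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Placeholder calibration data; in practice, use a representative subset of
# the training or validation set. Each batch is a (label, image) pair, so
# `for _, data in dataloader` above picks up the image tensor.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))
dataloader = DataLoader(TensorDataset(labels, images), batch_size=8)
```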
- `DianaModule` prepares the model for quantization:
    - Applies canonicalization: replaces functionals such as `F.relu` with `nn.ReLU`
    - Wraps modules like `nn.Conv2d` and `nn.ReLU` in `DIANAConv2d` and `DIANAReLU` modules, respectively
- `fq_model.set_quantized` enables the quantization wrappers:
    - The fake quantized model is forwarded with samples from the `representative_dataset` function; during this forward pass, layer-wise statistics are recorded
    - Based on these statistics, quantization scales are estimated for each layer
    - Quantization is enabled
    - After this step, fine-tuning can be done if desired, since quantization can introduce a loss in accuracy (see the fine-tuning sketch after this list)
- `fq_model.map_to_hw`:
    - Batchnorm layers are folded into their preceding convolution layers, in case they are mapped to the digital core (the arithmetic is sketched after this list)
    - Quantization scales are re-calculated with the same procedure as in the previous step
    - `DIANAReLU` and `DIANAIdentity` modules are replaced with re-quantizer modules (conceptually this is part of the integrize step; it is unclear why it is done here)
    - After this step, additional fine-tuning can be done if desired, since folding batchnorm layers and re-calculating the scales can introduce a loss in accuracy
- `fq_model.integrize_layers`:
    - Rescales weights and biases to true integer values
- `fq_model.export_model`:
    - Exports the quantized model to an ONNX file (`.onnx`)
    - Dumps test input/intermediate/output tensor maps (`.npy` files)
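Since the fake quantized model is still a PyTorch model, the optional fine-tuning after `set_quantized` (or after `map_to_hw`) can be done with an ordinary training loop. Below is a minimal sketch, assuming `fq_model` exposes the usual `nn.Module` interface (`parameters()`, `train()`, forward call) and that `train_loader`, like the calibration loader, yields `(label, image)` batches; both are assumptions for illustration, not documented dianaquantlib behaviour:

```python
import torch
from torch import nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(fq_model.parameters(), lr=1e-4, momentum=0.9)

fq_model.train()
for epoch in range(3):                      # a few epochs, as a starting point
    for labels, images in train_loader:
        optimizer.zero_grad()
        outputs = fq_model(images)
        loss = criterion(outputs, labels)
        loss.backward()                     # gradients flow through the fake quantized ops
        optimizer.step()
fq_model.eval()
```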
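For reference, the batchnorm folding performed in `map_to_hw` merges the frozen `nn.BatchNorm2d` statistics into the preceding convolution's weights and bias: with per-channel scale `s = gamma / sqrt(var + eps)`, the folded weights are `s * W` and the folded bias is `s * (b - mean) + beta`. The sketch below shows this generic arithmetic, not dianaquantlib's actual implementation:

```python
import torch
from torch import nn

def fold_batchnorm(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a conv whose output matches bn(conv(x)) for frozen BN statistics."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight.data / std                            # one scale per output channel
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused

# Quick numerical check on random data (eval mode, so BN uses running statistics)
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
bn.running_mean.uniform_(-1.0, 1.0)
bn.running_var.uniform_(0.5, 2.0)
bn.eval()
x = torch.randn(1, 3, 16, 16)
assert torch.allclose(bn(conv(x)), fold_batchnorm(conv, bn)(x), atol=1e-5)
```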
Tested models:
- ResNet8
- ResNet20
- MobileNetV1
- MobileNetV2
- DeepAutoEncoder (DAE)
- DSCNN
Supported modules:
- `nn.AdaptiveAvgPool2d`
- `nn.BatchNorm2d`
- `nn.Conv2d`
- `nn.Dropout`
- `nn.Dropout2d`
- `nn.Flatten`
- `nn.Linear`
- `nn.ReLU`
NOTE: Although all convolution hyperparameters are supported, the accelerator cores on DIANA only support a limited set. See HTVM's documentation for more info.
Supported functions and methods:
- `add`
- `flatten`
- `F.relu`
- `reshape`
- `view`
Quantization scheme:
- symmetric per-axis int8 weight quantization with a power-of-two scale
- symmetric per-axis int8 activation quantization with a power-of-two scale
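In other words, each slice of a weight (or activation) tensor is mapped to int8 with a scale constrained to a power of two, which lets the hardware implement rescaling as a bit shift. The snippet below is an illustrative sketch of such a scheme and of how a scale can be derived from recorded statistics; it is not the code dianaquantlib uses internally:

```python
import torch

def quantize_pow2_per_axis(x: torch.Tensor, axis: int = 0, n_bits: int = 8):
    """Symmetric per-axis quantization with a power-of-two scale (illustrative)."""
    dims = tuple(d for d in range(x.dim()) if d != axis)
    max_abs = x.abs().amax(dim=dims, keepdim=True)          # statistic per slice along `axis`
    q_max = 2 ** (n_bits - 1) - 1                           # 127 for int8
    # smallest power-of-two scale that still covers the observed range
    scale = torch.exp2(torch.ceil(torch.log2(max_abs / q_max)))
    q = torch.clamp(torch.round(x / scale), -q_max - 1, q_max).to(torch.int8)
    return q, scale

w = torch.randn(16, 3, 3, 3)              # e.g. conv weights, axis 0 = output channels
q, scale = quantize_pow2_per_axis(w, axis=0)
w_hat = q.float() * scale                 # dequantized approximation of w
```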
TODO
WARNING: Support for the analog core is unfinished.
Currently, dianaquantlib assumes a quantized op runs on either the digital or the analog core, and therefore applies the quantization scheme of one of those two cores. In the future, more flexible quantization schemes could be used for ops that run on the CPU.