Releases: pulp-platform/nemo
v0.0.8
The main feature included in this version is the addition of Statistics-Aware Weight Binning (SAWB, https://arxiv.org/pdf/1807.06964.pdf) as a technique for weight quantization during training.
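For reference, SAWB picks the weight clipping value analytically from the first and second moments of the weight distribution instead of learning it by backpropagation. The sketch below illustrates the idea in plain PyTorch; the coefficients c1 and c2 are precision-dependent constants from the paper, and the values used here are illustrative placeholders, not the ones NEMO uses internally.

```python
import torch

def sawb_clipping_value(w: torch.Tensor, c1: float = 3.2, c2: float = 2.1) -> torch.Tensor:
    """Statistics-Aware Weight Binning (SAWB): choose the weight clipping value
    alpha* analytically from the statistics of the weight tensor.
    c1 and c2 depend on the target bit-width; the defaults here are only
    placeholders for illustration."""
    e_w2 = (w ** 2).mean()    # second moment E[w^2]
    e_absw = w.abs().mean()   # first absolute moment E[|w|]
    return c1 * e_w2.sqrt() - c2 * e_absw
```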
v0.0.7
This release includes minor fixes on top of release v0.0.6, in order to be more conservative and guarantee that a 32-bit accumulator is always sufficient.
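For context, whether a 32-bit accumulator suffices can be bounded with a simple worst-case count of the bits produced by a dot product. The helper below is a back-of-the-envelope check under that assumption, not NEMO's internal logic.

```python
import math

def accumulator_bits(w_bits: int, x_bits: int, n_macs: int) -> int:
    """Worst-case accumulator width for a dot product of n_macs terms with
    w_bits-wide weights and x_bits-wide activations: each product needs up to
    w_bits + x_bits bits, and summing n_macs of them adds ceil(log2(n_macs))
    bits of headroom."""
    return w_bits + x_bits + math.ceil(math.log2(n_macs))

# e.g. a 3x3 convolution over 512 input channels with 8-bit weights/activations
assert accumulator_bits(8, 8, 3 * 3 * 512) <= 32   # 29 bits, fits in int32
```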
v0.0.6
This release fixes several problems related to BatchNormalization, targeting numerical inconsistencies between the QD and ID stages and substantially improving the stability of FQ training. In particular:
- BN parameters are no longer made as big as possible by default; they are now calibrated according to the BN output range. The BN additive parameter is no longer requantized.
- BatchNormalization in the QD/ID stages can be calibrated using statistics collected on the validation set (or with a reasonable default).
- BatchNormalization freezing now also disables gradients, unless explicitly requested not to do so.
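A minimal sketch of what "freezing BatchNormalization and disabling gradients" amounts to in plain PyTorch is shown below; freeze_bn is an illustrative helper, not part of NEMO's API.

```python
import torch.nn as nn

def freeze_bn(model: nn.Module, disable_grad: bool = True) -> None:
    """Put every BatchNorm layer in eval mode (freezing its running statistics)
    and, unless disable_grad=False, also stop gradient updates to its affine
    parameters. A later model.train() call re-enables the running-statistics
    update, so this would have to be reapplied afterwards."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.eval()
            if disable_grad and m.affine:
                m.weight.requires_grad_(False)
                m.bias.requires_grad_(False)
```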
v0.0.5
v0.0.5 introduces new equalization strategies and switches from floor-based to round-based quantization for weights. Note that this breaks compatibility with networks that were fine-tuned with quantization in previous versions; compatibility can be restored by subtracting eps/2 from all weights.
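The eps/2 correction follows from the identity round(w/eps) = floor((w + eps/2)/eps) (up to round-half-to-even ties). The snippet below is a small illustration of this relationship, not NEMO code.

```python
import torch

def quantize_floor(w: torch.Tensor, eps: float) -> torch.Tensor:
    # pre-v0.0.5 behaviour: floor-based weight quantization
    return torch.floor(w / eps) * eps

def quantize_round(w: torch.Tensor, eps: float) -> torch.Tensor:
    # v0.0.5 behaviour: round-based weight quantization
    return torch.round(w / eps) * eps

# Subtracting eps/2 from weights fine-tuned under the old floor scheme makes
# the new round scheme reproduce the old quantization levels.
w, eps = torch.tensor([0.013, -0.270, 0.401]), 2.0 ** -6
assert torch.equal(quantize_round(w - eps / 2, eps), quantize_floor(w, eps))
```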
v0.0.4
This release focuses on numerical equivalence between the FQ, QD, and ID stages.
- FQ and QD stages: fix weight hardening so that it correctly represents the fact that clipping parameters (alpha in particular) will also have to be quantized.
- QD stage: fix the way quantized activations are computed; they now behave exactly like integer activations, except that the numerical format is not converted to int64, as this conversion would incur a very significant performance hit.
- FQ and QD stages: add support for input biasing and bias removal.
- FQ stage: introduce alpha and beta quantization in both inference and training. For beta, this is mostly cosmetic; for alpha, it is a substantial difference, because this parameter will have to be represented as an integer when deploying to the QD and ID stages.
- QD stage: prune useless BN parameters after Linear weight hardening (see the sketch after this list). BatchNormalization parameter quantization is easier when the parameters do not have outliers, as quantization is performed per tensor. Such outliers are common when the previous Linear layer's weights are small along the same channel, a condition that often results in them being always 0 after hardening. To ease BN quantization, we introduce a transformation that prunes BN parameters when they follow a channel with zeroed weights (i.e., one that is effectively unused). This is active by default when switching to the QD stage.
- various other minor fixes
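The BN pruning described above can be pictured with the following plain-PyTorch sketch; prune_dead_bn_channels is an illustrative helper, not NEMO's actual transformation.

```python
import torch
import torch.nn as nn

def prune_dead_bn_channels(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> None:
    """Zero out the BN parameters of channels whose (hardened) convolution
    weights are entirely zero: such channels never carry a signal, yet their
    gamma/beta can be large outliers that hurt per-tensor BN quantization."""
    with torch.no_grad():
        # one flag per output channel: True if all weights of that channel are 0
        dead = conv.weight.abs().flatten(1).sum(dim=1) == 0
        bn.weight[dead] = 0.0        # multiplicative parameter (gamma)
        bn.bias[dead] = 0.0          # additive parameter (beta)
        bn.running_mean[dead] = 0.0
        bn.running_var[dead] = 1.0
```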
v0.0.3
- Automatically derive the DFQ equalization dict; moreover, also consider the activation alpha in DFQ equalization. This is useful when alpha is an "algorithmic" clipping derived, e.g., from a ReLU6, as opposed to a clipping imposed only post-calibration, as in the case of a ReLU.
- Also consider ConstantPad2d in bias removal
- Add convenience net.qd_stage and net.id_stage functions: these represent a more user-friendly way to switch between the FQ -> QD -> ID internal representations of the network (a usage sketch follows this list).
- Support BatchNorm1d in the QD and ID stages (via PACT_*BatchNorm2d): the PACT_*BatchNorm2d classes have not (yet) been renamed to reflect the fact that they are actually PACT_*BatchNormNd's now!
- Fix shape of reshape_before/after for PACT_Linear layers
- Add dropout_to_identity transform and a related flag in quantize_pact: this is useful when deploying networks that include Dropout layers, which are not useful at inference time and cause issues in the DeployGraph. Note that Dropout removal has to be done in the FullPrecision stage, i.e., before creating the deploy graph: this commit adds a flag to quantize_pact to do exactly that (it should probably not be active when doing retraining).
- Bump version to 0.0.3
- Fix mnist_test to make it easier to debug
- Work around a regression in mnist_test for equalize_dfq
- Other minor fixes
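A minimal usage sketch of the new stage-switching convenience calls is given below. The toy network and the eps_in keyword are assumptions made for illustration; quantize_pact, qd_stage and id_stage are the entry points named in the notes above, but the exact signatures may differ from what is shown.

```python
import torch
import torch.nn as nn
import nemo

# toy full-precision network, for illustration only
net = nn.Sequential(
    nn.Conv2d(1, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 26 * 26, 10),
)

# full-precision -> FQ (fake-quantized) representation
net = nemo.transform.quantize_pact(net, dummy_input=torch.randn(1, 1, 28, 28))

# FQ -> QD -> ID, using the convenience functions added in this release
# (eps_in is assumed to describe the input quantum, e.g. 8-bit images)
net.qd_stage(eps_in=1.0 / 255)
net.id_stage()
```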
v0.0.2
v0.0.1
First release of NEMO. Includes:
- support for PyTorch 1.3 and 1.4
- per-layer quantization using a PACT-like strategy
- support for fake-quantized fine-tuning
- support for deployment to an integer version
- example of post-training quantization