
Merge branch 'main' into leslie/add_x86_inductor_quantization
Svetlana Karslioglu authored Sep 19, 2023
2 parents 27ad930 + 29db287 commit 49e3ae0
Showing 1 changed file with 14 additions and 16 deletions.
30 changes: 14 additions & 16 deletions prototype_source/pt2e_quant_ptq_static.rst
@@ -10,25 +10,23 @@
this flow is expected to have significantly higher model coverage
(`88% on 14K models <https://github.com/pytorch/pytorch/issues/93667#issuecomment-1601171596>`_),
better programmability, and a simplified UX.

-Exportable by `torch._export.export` is a prerequisite to use the flow, you can
+Being exportable by `torch.export.export` is a prerequisite for using this flow; you can
find the constructs that are supported in `Export DB <https://pytorch.org/docs/main/generated/exportdb/index.html>`_.
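
As a quick self-check, you can try exporting the model before starting quantization; a minimal sketch (the toy module and input shape are illustrative, not from the tutorial):

.. code-block:: python

    import torch

    class M(torch.nn.Module):
        def forward(self, x):
            return torch.nn.functional.relu(x)

    example_inputs = (torch.randn(1, 3, 224, 224),)
    # torch.export.export raises if the model contains constructs that are
    # not exportable; see Export DB for the list of supported constructs.
    exported_program = torch.export.export(M(), example_inputs)
    print(exported_program.graph_module.graph)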

The high-level architecture of quantization 2.0 with quantizer could look like
this:

::

-    float_model(Python)                        Input
+    float_model(Python)                        Example Input
            \                                      /
             \                                    /
    —-------------------------------------------------------
-   |                       Export                         |
+   |                       export                         |
    —-------------------------------------------------------
                               |
-                      FX Graph in ATen      XNNPACKQuantizer,
-                              |              or X86InductorQuantizer,
-                              |              or <Other Backend Quantizer>
-                              |                  /
+                      FX Graph in ATen      Backend Specific Quantizer
+                              |                  /
    —--------------------------------------------------------
    |                      prepare_pt2e                      |
    —--------------------------------------------------------
@@ -39,13 +37,13 @@ this:
    |                      convert_pt2e                      |
    —--------------------------------------------------------
                               |
-                   Reference Quantized Model
+                       Quantized Model
                               |
    —--------------------------------------------------------
    |                        Lowering                        |
    —--------------------------------------------------------
                               |
-           Executorch, or Inductor, or <Other Backends>
+           Executorch, Inductor or <Other Backends>


The PyTorch 2.0 export quantization API looks like this:
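
(The full snippet is collapsed in this view. As orientation, one end-to-end pass through the boxes in the diagram could look roughly like the sketch below; the toy model, the input shape, and the single-batch calibration are illustrative, and ``capture_pre_autograd_graph`` is assumed to be the export entry point used by the prototype at this point in time.)

.. code-block:: python

    import torch
    from torch._export import capture_pre_autograd_graph
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    float_model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()
    ).eval()
    example_inputs = (torch.randn(1, 3, 32, 32),)

    # export: capture an FX graph in ATen operators
    exported_model = capture_pre_autograd_graph(float_model, example_inputs)

    # prepare_pt2e: insert observers according to a backend-specific quantizer
    quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)

    # calibrate: run representative data through the observed model
    prepared_model(*example_inputs)

    # convert_pt2e: produce the quantized model that is ready for lowering
    quantized_model = convert_pt2e(prepared_model)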
@@ -377,15 +375,15 @@
The following code snippets describe how to quantize the model:
get_symmetric_quantization_config,
)
quantizer = XNNPACKQuantizer()
-quantizer.set_globa(get_symmetric_quantization_config())
+quantizer.set_global(get_symmetric_quantization_config())

``Quantizer`` is backend specific, and each ``Quantizer`` will provide its
own way to allow users to configure their model. Just as an example, here are
the different configuration APIs supported by ``XNNPACKQuantizer``:

.. code-block:: python

-    quantizer.set_global(qconfig_opt)  # qconfig_opt is an optional qconfig, either a valid qconfig or None
+    quantizer.set_global(qconfig_opt)  # qconfig_opt is an optional quantization config
         .set_object_type(torch.nn.Conv2d, qconfig_opt)  # can be a module type
         .set_object_type(torch.nn.functional.linear, qconfig_opt)  # or torch functional op
         .set_module_name("foo.bar", qconfig_opt)
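
For instance, a concrete configuration could combine these APIs as below (a sketch; the submodule name ``"foo.bar"`` and the per-type choices are only illustrative, and ``None`` is used to skip quantization for a match, per the comment above):

.. code-block:: python

    import torch
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    quantizer = XNNPACKQuantizer()
    # default config for the whole model
    quantizer.set_global(get_symmetric_quantization_config())
    # per-channel weight quantization for all Conv2d modules
    quantizer.set_object_type(
        torch.nn.Conv2d, get_symmetric_quantization_config(is_per_channel=True)
    )
    # leave one named submodule in floating point
    quantizer.set_module_name("foo.bar", None)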
@@ -441,8 +439,7 @@
we offer in the long term might change based on feedback from PyTorch users.

* Q/DQ Representation (default)

-Previous documentation for `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-
-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_ all quantized operators are represented as ``dequantize -> fp32_op -> qauntize``.
+As in the previous documentation for `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_, all quantized operators are represented as ``dequantize -> fp32_op -> quantize``.

.. code-block:: python
@@ -457,9 +454,10 @@
        out_fp32, out_scale, out_zero_point, out_quant_min, out_quant_max, torch.int8)
    return out_i8
-* Reference Quantized Model Representation (WIP, expected to be ready at end of August): we have special representation for selected ops (for example, quantized linear), other ops are represented as (``dq -> float32_op -> q``), and ``q/dq`` are decomposed into more primitive operators.
+* Reference Quantized Model Representation (available in the nightly build)

-  You can get this representation by using ``convert_pt2e(..., use_reference_representation=True)``.
+  We will have a special representation for selected ops, for example, quantized linear. Other ops are represented as ``dq -> float32_op -> q`` and ``q/dq`` are decomposed into more primitive operators.
+  You can get this representation by using ``convert_pt2e(..., use_reference_representation=True)``.
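
A minimal sketch of requesting this representation, assuming ``prepared_model`` was produced by ``prepare_pt2e`` and calibrated as in the flow above (the collapsed snippet below presumably shows the resulting representation):

.. code-block:: python

    from torch.ao.quantization.quantize_pt2e import convert_pt2e

    # `prepared_model` is assumed to have been calibrated already
    reference_quantized_model = convert_pt2e(
        prepared_model, use_reference_representation=True
    )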

.. code-block:: python
@@ -515,7 +513,7 @@
Now we can compare the size and model accuracy with the baseline model.
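
A simple way to do the size comparison is to serialize both models and measure the byte counts; a sketch, assuming ``float_model`` and ``quantized_model`` from the sketches above (the accuracy evaluation loop is dataset-specific and omitted here):

.. code-block:: python

    import io

    import torch

    def model_size_mb(model):
        # serialize the model's state_dict into memory and measure its size
        buffer = io.BytesIO()
        torch.save(model.state_dict(), buffer)
        return buffer.getbuffer().nbytes / 1e6

    print(f"float model:     {model_size_mb(float_model):.2f} MB")
    print(f"quantized model: {model_size_mb(quantized_model):.2f} MB")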
If you want to get better accuracy or performance, try configuring
``quantizer`` in different ways, and each ``quantizer`` will have its own way
of configuration, so please consult the documentation for the
-quantization you are using to learn more about how you can have more control
+quantizer you are using to learn more about how you can have more control
over how to quantize a model.

Save and Load Quantized Model
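
(The body of this section is collapsed in this view. One plausible approach, assuming the ``torch.export.save`` and ``torch.export.load`` APIs are available in your build, and not necessarily the tutorial's own snippet, is:)

.. code-block:: python

    import torch

    # save: re-export the quantized model and serialize the ExportedProgram
    quantized_ep = torch.export.export(quantized_model, example_inputs)
    torch.export.save(quantized_ep, "quantized_model.pt2")

    # load: restore the program and get a runnable module back
    loaded_ep = torch.export.load("quantized_model.pt2")
    loaded_model = loaded_ep.module()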
