
Merge branch 'main' into leslie/add_x86_inductor_quantization
Svetlana Karslioglu authored Sep 19, 2023
2 parents 27ad930 + 29db287 commit 49e3ae0
Showing 1 changed file with 14 additions and 16 deletions.
30 changes: 14 additions & 16 deletions prototype_source/pt2e_quant_ptq_static.rst
@@ -10,25 +10,23 @@
this flow is expected to have significantly higher model coverage
(`88% on 14K models <https://github.com/pytorch/pytorch/issues/93667#issuecomment-1601171596>`_),
better programmability, and a simplified UX.

-Exportable by `torch._export.export` is a prerequisite to use the flow, you can
+Being exportable by `torch.export.export` is a prerequisite for using this flow; you can
find the constructs that are supported in `Export DB <https://pytorch.org/docs/main/generated/exportdb/index.html>`_.
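
As a quick self-check, you can try exporting the model before starting quantization; a minimal sketch (the toy module and input shape are illustrative, not from the tutorial):

.. code-block:: python

    import torch

    class M(torch.nn.Module):
        def forward(self, x):
            return torch.nn.functional.relu(x)

    example_inputs = (torch.randn(1, 3, 224, 224),)
    # torch.export.export raises if the model contains constructs that are
    # not exportable; see Export DB for the list of supported constructs.
    exported_program = torch.export.export(M(), example_inputs)
    print(exported_program.graph_module.graph)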

The high-level architecture of quantization 2.0 with quantizer could look like
this:

::

-    float_model(Python)                        Input
+    float_model(Python)                        Example Input
            \                                      /
             \                                    /
    —-------------------------------------------------------
-   |                       Export                         |
+   |                       export                         |
    —-------------------------------------------------------
                               |
-                      FX Graph in ATen      XNNPACKQuantizer,
-                              |              or X86InductorQuantizer,
-                              |              or <Other Backend Quantizer>
-                              |                  /
+                      FX Graph in ATen      Backend Specific Quantizer
+                              |                  /
    —--------------------------------------------------------
    |                      prepare_pt2e                      |
    —--------------------------------------------------------
@@ -39,13 +37,13 @@ this:
    |                      convert_pt2e                      |
    —--------------------------------------------------------
                               |
-                   Reference Quantized Model
+                       Quantized Model
                               |
    —--------------------------------------------------------
    |                        Lowering                        |
    —--------------------------------------------------------
                               |
-           Executorch, or Inductor, or <Other Backends>
+           Executorch, Inductor or <Other Backends>


The PyTorch 2.0 export quantization API looks like this:
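
(The full snippet is collapsed in this view. As orientation, one end-to-end pass through the boxes in the diagram could look roughly like the sketch below; the toy model, the input shape, and the single-batch calibration are illustrative, and ``capture_pre_autograd_graph`` is assumed to be the export entry point used by the prototype at this point in time.)

.. code-block:: python

    import torch
    from torch._export import capture_pre_autograd_graph
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    float_model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()
    ).eval()
    example_inputs = (torch.randn(1, 3, 32, 32),)

    # export: capture an FX graph in ATen operators
    exported_model = capture_pre_autograd_graph(float_model, example_inputs)

    # prepare_pt2e: insert observers according to a backend-specific quantizer
    quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)

    # calibrate: run representative data through the observed model
    prepared_model(*example_inputs)

    # convert_pt2e: produce the quantized model that is ready for lowering
    quantized_model = convert_pt2e(prepared_model)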
@@ -377,15 +375,15 @@
The following code snippets describe how to quantize the model:
get_symmetric_quantization_config,
)
quantizer = XNNPACKQuantizer()
-quantizer.set_globa(get_symmetric_quantization_config())
+quantizer.set_global(get_symmetric_quantization_config())

``Quantizer`` is backend specific, and each ``Quantizer`` will provide its
own way to allow users to configure their model. Just as an example, here are
the different configuration APIs supported by ``XNNPACKQuantizer``:

.. code-block:: python

-    quantizer.set_global(qconfig_opt)  # qconfig_opt is an optional qconfig, either a valid qconfig or None
+    quantizer.set_global(qconfig_opt)  # qconfig_opt is an optional quantization config
         .set_object_type(torch.nn.Conv2d, qconfig_opt)  # can be a module type
         .set_object_type(torch.nn.functional.linear, qconfig_opt)  # or torch functional op
         .set_module_name("foo.bar", qconfig_opt)
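
For instance, a concrete configuration could combine these APIs as below (a sketch; the submodule name ``"foo.bar"`` and the per-type choices are only illustrative, and ``None`` is used to skip quantization for a match, per the comment above):

.. code-block:: python

    import torch
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    quantizer = XNNPACKQuantizer()
    # default config for the whole model
    quantizer.set_global(get_symmetric_quantization_config())
    # per-channel weight quantization for all Conv2d modules
    quantizer.set_object_type(
        torch.nn.Conv2d, get_symmetric_quantization_config(is_per_channel=True)
    )
    # leave one named submodule in floating point
    quantizer.set_module_name("foo.bar", None)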
@@ -441,8 +439,7 @@
we offer in the long term might change based on feedback from PyTorch users.

* Q/DQ Representation (default)

-Previous documentation for `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-
-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_ all quantized operators are represented as ``dequantize -> fp32_op -> qauntize``.
+As in the previous documentation for `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_, all quantized operators are represented as ``dequantize -> fp32_op -> quantize``.

.. code-block:: python
@@ -457,9 +454,10 @@
        out_fp32, out_scale, out_zero_point, out_quant_min, out_quant_max, torch.int8)
    return out_i8
-* Reference Quantized Model Representation (WIP, expected to be ready at end of August): we have special representation for selected ops (for example, quantized linear), other ops are represented as (``dq -> float32_op -> q``), and ``q/dq`` are decomposed into more primitive operators.
+* Reference Quantized Model Representation (available in the nightly build)

-  You can get this representation by using ``convert_pt2e(..., use_reference_representation=True)``.
+  We will have a special representation for selected ops, for example, quantized linear. Other ops are represented as ``dq -> float32_op -> q`` and ``q/dq`` are decomposed into more primitive operators.
+  You can get this representation by using ``convert_pt2e(..., use_reference_representation=True)``.
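
A minimal sketch of requesting this representation, assuming ``prepared_model`` was produced by ``prepare_pt2e`` and calibrated as in the flow above (the collapsed snippet below presumably shows the resulting representation):

.. code-block:: python

    from torch.ao.quantization.quantize_pt2e import convert_pt2e

    # `prepared_model` is assumed to have been calibrated already
    reference_quantized_model = convert_pt2e(
        prepared_model, use_reference_representation=True
    )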

.. code-block:: python
@@ -515,7 +513,7 @@
Now we can compare the size and model accuracy with the baseline model.
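
A simple way to do the size comparison is to serialize both models and measure the byte counts; a sketch, assuming ``float_model`` and ``quantized_model`` from the sketches above (the accuracy evaluation loop is dataset-specific and omitted here):

.. code-block:: python

    import io

    import torch

    def model_size_mb(model):
        # serialize the model's state_dict into memory and measure its size
        buffer = io.BytesIO()
        torch.save(model.state_dict(), buffer)
        return buffer.getbuffer().nbytes / 1e6

    print(f"float model:     {model_size_mb(float_model):.2f} MB")
    print(f"quantized model: {model_size_mb(quantized_model):.2f} MB")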
If you want to get better accuracy or performance, try configuring
``quantizer`` in different ways, and each ``quantizer`` will have its own way
of configuration, so please consult the documentation for the
-quantization you are using to learn more about how you can have more control
+quantizer you are using to learn more about how you can have more control
over how to quantize a model.

Save and Load Quantized Model
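
(The body of this section is collapsed in this view. One plausible approach, assuming the ``torch.export.save`` and ``torch.export.load`` APIs are available in your build, and not necessarily the tutorial's own snippet, is:)

.. code-block:: python

    import torch

    # save: re-export the quantized model and serialize the ExportedProgram
    quantized_ep = torch.export.export(quantized_model, example_inputs)
    torch.export.save(quantized_ep, "quantized_model.pt2")

    # load: restore the program and get a runnable module back
    loaded_ep = torch.export.load("quantized_model.pt2")
    loaded_model = loaded_ep.module()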
