fix(api): combine names for ONNX fp16 optimization
ssube committed Mar 27, 2023
1 parent 73e9cf8 commit c2f8fb1
Showing 3 changed files with 18 additions and 14 deletions.
2 changes: 1 addition & 1 deletion api/onnx_web/convert/__main__.py
@@ -478,7 +478,7 @@ def main() -> int:
     logger.info("CLI arguments: %s", args)
 
     ctx = ConversionContext.from_environ()
-    ctx.half = args.half or "onnx-internal-fp16" in ctx.optimizations
+    ctx.half = args.half or "onnx-fp16" in ctx.optimizations
     ctx.opset = args.opset
     ctx.token = args.token
     logger.info("converting models in %s using %s", ctx.model_path, ctx.training_device)
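Note: the change above makes the combined `onnx-fp16` optimization flag enable half-precision conversion (`ctx.half`). As a rough illustration of what an "internal" fp16 conversion does, the sketch below casts a model's nodes to float16 while keeping 32-bit inputs and outputs, using the `onnxconverter-common` helper. This is illustrative only, not necessarily the exact code path onnx-web uses, and the model path is hypothetical.

```python
# Sketch: convert internal model nodes to float16 while keeping fp32 inputs/outputs.
# Assumptions: onnxconverter-common is installed; paths are hypothetical examples.
import onnx
from onnxconverter_common import float16

model = onnx.load("model.onnx")  # hypothetical path
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save(model_fp16, "model.fp16.onnx")
# Note: models larger than 2GB need ONNX external data handling when saving.
```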
7 changes: 1 addition & 6 deletions docs/server-admin.md
@@ -102,19 +102,14 @@ Others:
 - `onnx-deterministic-compute`
   - enable ONNX deterministic compute
 - `onnx-fp16`
-  - force 16-bit floating point values when running pipelines
-  - use with https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/stable_diffusion#optimize-onnx-pipeline
-    and the `--float16` flag
+  - convert model nodes to 16-bit floating point values internally while leaving 32-bit inputs
 - `onnx-graph-*`
   - `onnx-graph-disable`
     - disable all ONNX graph optimizations
   - `onnx-graph-basic`
     - enable basic ONNX graph optimizations
   - `onnx-graph-all`
     - enable all ONNX graph optimizations
-- `onnx-internal-fp16`
-  - convert internal model nodes to 16-bit floating point values
-  - does not reduce disk space as much as `onnx-fp16` or `torch-fp16`, but does not incur as many extra conversions
 - `onnx-low-memory`
   - disable ONNX features that allocate more memory than is strictly required or keep memory after use
 - `torch-*`
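Note: for context on the `onnx-graph-*` and `onnx-low-memory` flags listed above, the sketch below shows the ONNX Runtime session options they roughly correspond to. This is an illustrative mapping based on the documented intent of the flags, not the exact wiring inside onnx-web; the model path is hypothetical.

```python
# Sketch: ONNX Runtime session options that roughly match the documented flags.
# Assumption: this mirrors the flags' intent, not the onnx-web implementation.
import onnxruntime as ort

sess_options = ort.SessionOptions()

# onnx-graph-disable / onnx-graph-basic / onnx-graph-all map to optimization levels:
# ORT_DISABLE_ALL, ORT_ENABLE_BASIC, ORT_ENABLE_ALL
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# onnx-low-memory: avoid allocating or keeping more memory than strictly required
sess_options.enable_cpu_mem_arena = False
sess_options.enable_mem_pattern = False
sess_options.enable_mem_reuse = False

session = ort.InferenceSession("model.onnx", sess_options)  # hypothetical model path
```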
23 changes: 16 additions & 7 deletions docs/user-guide.md
@@ -725,20 +725,29 @@ Some common VAE models include:
 ### Optimizing models for lower memory usage
 
 Running Stable Diffusion with ONNX acceleration uses more memory by default than some other methods, but there are a
-number of optimizations that you can apply to reduce the memory usage.
-
-At least 12GB of VRAM is recommended for running all of the models in the extras file, but `onnx-web` should work on
-most 8GB cards and may work on some 6GB cards. 4GB is not supported yet, but [it should be
-possible](https://github.com/ssube/onnx-web/issues/241#issuecomment-1475341043).
+number of [server optimizations](server-admin.md#pipeline-optimizations) that you can apply to reduce the memory usage:
 
 - `diffusers-attention-slicing`
 - `onnx-fp16`
-- `onnx-internal-fp16`
 - `onnx-graph-all`
 - `onnx-low-memory`
 - `torch-fp16`
 
-TODO: memory at different optimization levels
+At least 12GB of VRAM is recommended for running all of the models in the extras file, but `onnx-web` should work on
+most 8GB cards and may work on some 6GB cards. 4GB is not supported yet, but [it should be
+possible](https://github.com/ssube/onnx-web/issues/241#issuecomment-1475341043).
+
+Based on somewhat limited testing, the model size and memory usage for each optimization level is approximately:
+
+| Optimizations                | Disk Size | Memory Usage - 1 @ 512x512 | Supported Platforms |
+| ---------------------------- | --------- | -------------------------- | ------------------- |
+| none                         | 4.0G      | 11.5G                      | all                 |
+| `onnx-fp16`                  | 2.2G      | 9.9G                       | all                 |
+| ORT script                   | 4.0G      | 6.6G                       | CUDA only           |
+| ORT script with `--float16`  | 2.1G      | 5.8G                       | CUDA only           |
+| `torch-fp16`                 | 2.0G      | 5.9G                       | CUDA only           |
+
+- https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/stable_diffusion#cuda-optimizations-for-stable-diffusion
 
 ### Permanently blending additional networks
 
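Note: the `torch-fp16` row in the table above corresponds to converting with the model weights loaded in half precision, which is why it roughly halves the disk size and is listed as CUDA only. The sketch below shows that idea with diffusers; it is illustrative only, not the conversion code in onnx-web, and the model ID is just an example.

```python
# Sketch: load pipeline weights in float16 before conversion (torch-fp16 style).
# Assumptions: not the onnx-web conversion code; the model ID is an example.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # fp16 weights are generally only practical on CUDA devices
```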
