fix(api): combine names for ONNX fp16 optimization
ssube committed Mar 27, 2023
1 parent 73e9cf8 commit c2f8fb1
Showing 3 changed files with 18 additions and 14 deletions.
2 changes: 1 addition & 1 deletion api/onnx_web/convert/__main__.py
@@ -478,7 +478,7 @@ def main() -> int:
     logger.info("CLI arguments: %s", args)
 
     ctx = ConversionContext.from_environ()
-    ctx.half = args.half or "onnx-internal-fp16" in ctx.optimizations
+    ctx.half = args.half or "onnx-fp16" in ctx.optimizations
     ctx.opset = args.opset
     ctx.token = args.token
     logger.info("converting models in %s using %s", ctx.model_path, ctx.training_device)
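Note: the change above makes the combined `onnx-fp16` optimization flag enable half-precision conversion (`ctx.half`). As a rough illustration of what an "internal" fp16 conversion does, the sketch below casts a model's nodes to float16 while keeping 32-bit inputs and outputs, using the `onnxconverter-common` helper. This is illustrative only, not necessarily the exact code path onnx-web uses, and the model path is hypothetical.

```python
# Sketch: convert internal model nodes to float16 while keeping fp32 inputs/outputs.
# Assumptions: onnxconverter-common is installed; paths are hypothetical examples.
import onnx
from onnxconverter_common import float16

model = onnx.load("model.onnx")  # hypothetical path
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save(model_fp16, "model.fp16.onnx")
# Note: models larger than 2GB need ONNX external data handling when saving.
```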
7 changes: 1 addition & 6 deletions docs/server-admin.md
@@ -102,19 +102,14 @@ Others:
 - `onnx-deterministic-compute`
   - enable ONNX deterministic compute
 - `onnx-fp16`
-  - force 16-bit floating point values when running pipelines
-  - use with https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/stable_diffusion#optimize-onnx-pipeline
-    and the `--float16` flag
+  - convert model nodes to 16-bit floating point values internally while leaving 32-bit inputs
 - `onnx-graph-*`
   - `onnx-graph-disable`
     - disable all ONNX graph optimizations
   - `onnx-graph-basic`
     - enable basic ONNX graph optimizations
   - `onnx-graph-all`
     - enable all ONNX graph optimizations
-- `onnx-internal-fp16`
-  - convert internal model nodes to 16-bit floating point values
-  - does not reduce disk space as much as `onnx-fp16` or `torch-fp16`, but does not incur as many extra conversions
 - `onnx-low-memory`
   - disable ONNX features that allocate more memory than is strictly required or keep memory after use
 - `torch-*`
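Note: for context on the `onnx-graph-*` and `onnx-low-memory` flags listed above, the sketch below shows the ONNX Runtime session options they roughly correspond to. This is an illustrative mapping based on the documented intent of the flags, not the exact wiring inside onnx-web; the model path is hypothetical.

```python
# Sketch: ONNX Runtime session options that roughly match the documented flags.
# Assumption: this mirrors the flags' intent, not the onnx-web implementation.
import onnxruntime as ort

sess_options = ort.SessionOptions()

# onnx-graph-disable / onnx-graph-basic / onnx-graph-all map to optimization levels:
# ORT_DISABLE_ALL, ORT_ENABLE_BASIC, ORT_ENABLE_ALL
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# onnx-low-memory: avoid allocating or keeping more memory than strictly required
sess_options.enable_cpu_mem_arena = False
sess_options.enable_mem_pattern = False
sess_options.enable_mem_reuse = False

session = ort.InferenceSession("model.onnx", sess_options)  # hypothetical model path
```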
23 changes: 16 additions & 7 deletions docs/user-guide.md
@@ -725,20 +725,29 @@ Some common VAE models include:
 ### Optimizing models for lower memory usage
 
 Running Stable Diffusion with ONNX acceleration uses more memory by default than some other methods, but there are a
-number of optimizations that you can apply to reduce the memory usage.
-
-At least 12GB of VRAM is recommended for running all of the models in the extras file, but `onnx-web` should work on
-most 8GB cards and may work on some 6GB cards. 4GB is not supported yet, but [it should be
-possible](https://github.com/ssube/onnx-web/issues/241#issuecomment-1475341043).
+number of [server optimizations](server-admin.md#pipeline-optimizations) that you can apply to reduce the memory usage:
 
 - `diffusers-attention-slicing`
 - `onnx-fp16`
-- `onnx-internal-fp16`
 - `onnx-graph-all`
 - `onnx-low-memory`
 - `torch-fp16`
 
-TODO: memory at different optimization levels
+At least 12GB of VRAM is recommended for running all of the models in the extras file, but `onnx-web` should work on
+most 8GB cards and may work on some 6GB cards. 4GB is not supported yet, but [it should be
+possible](https://github.com/ssube/onnx-web/issues/241#issuecomment-1475341043).
+
+Based on somewhat limited testing, the model size and memory usage for each optimization level is approximately:
+
+| Optimizations                | Disk Size | Memory Usage - 1 @ 512x512 | Supported Platforms |
+| ---------------------------- | --------- | -------------------------- | ------------------- |
+| none                         | 4.0G      | 11.5G                      | all                 |
+| `onnx-fp16`                  | 2.2G      | 9.9G                       | all                 |
+| ORT script                   | 4.0G      | 6.6G                       | CUDA only           |
+| ORT script with `--float16`  | 2.1G      | 5.8G                       | CUDA only           |
+| `torch-fp16`                 | 2.0G      | 5.9G                       | CUDA only           |
+
+- https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/stable_diffusion#cuda-optimizations-for-stable-diffusion
 
 ### Permanently blending additional networks
 
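Note: the `torch-fp16` row in the table above corresponds to converting with the model weights loaded in half precision, which is why it roughly halves the disk size and is listed as CUDA only. The sketch below shows that idea with diffusers; it is illustrative only, not the conversion code in onnx-web, and the model ID is just an example.

```python
# Sketch: load pipeline weights in float16 before conversion (torch-fp16 style).
# Assumptions: not the onnx-web conversion code; the model ID is an example.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # fp16 weights are generally only practical on CUDA devices
```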
