
add support for attention slicing and other CUDA optimizations #155

Closed
ssube opened this issue Feb 16, 2023 · 2 comments · Fixed by #177
Labels: status/fixed (issues that have been fixed and released), type/feature (new features)

ssube commented Feb 16, 2023

microsoft/onnxruntime#11118

ssube added the status/new and type/feature labels on Feb 16, 2023
ssube added this to the v0.8 milestone on Feb 16, 2023
ssube added the status/progress label and removed the status/new label on Feb 18, 2023
ssube commented Feb 18, 2023

I've written the code to enable the optimizations, but only when they have been set in the environment. There should eventually be code to calculate the correct optimizations for the current platform, but it looks like some of them do not apply to the ONNX pipelines:

[2023-02-18 18:01:02,270] DEBUG: onnx_web.diffusion.load: enabling attention slicing on SD pipeline                                                                                 
[2023-02-18 18:01:02,270] DEBUG: onnx_web.diffusion.load: enabling VAE slicing on SD pipeline
[2023-02-18 18:01:02,270] WARNING: onnx_web.diffusion.load: error while enabling VAE slicing: 'OnnxStableDiffusionPipeline' object has no attribute 'enable_vae_slicing'            
[2023-02-18 18:01:02,270] DEBUG: onnx_web.diffusion.load: enabling model CPU offload on SD pipeline
[2023-02-18 18:01:02,270] WARNING: onnx_web.diffusion.load: error while enabling model CPU offload: 'OnnxStableDiffusionPipeline' object has no attribute 'enable_model_cpu_offload'
[2023-02-18 18:01:02,270] DEBUG: onnx_web.server.model_cache: cache limit set to 0, not caching model: diffusion
[2023-02-18 18:01:02,270] DEBUG: onnx_web.server.model_cache: cache limit set to 0, not caching model: scheduler 
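For reference, a minimal sketch of how those optimizations can be guarded so that unsupported ones produce a warning instead of an error (the function name and the optimization flag names here are illustrative assumptions, not the exact onnx-web code):

```python
from logging import getLogger

logger = getLogger(__name__)


def enable_pipeline_optimizations(pipeline, optimizations):
    # try each requested optimization; the ONNX pipelines do not implement all
    # of the diffusers methods, so failures are logged and skipped
    if "diffusers-attention-slicing" in optimizations:
        logger.debug("enabling attention slicing on SD pipeline")
        try:
            pipeline.enable_attention_slicing()
        except AttributeError as e:
            logger.warning("error while enabling attention slicing: %s", e)

    if "diffusers-vae-slicing" in optimizations:
        logger.debug("enabling VAE slicing on SD pipeline")
        try:
            pipeline.enable_vae_slicing()
        except AttributeError as e:
            logger.warning("error while enabling VAE slicing: %s", e)

    if "diffusers-cpu-offload" in optimizations:
        logger.debug("enabling model CPU offload on SD pipeline")
        try:
            pipeline.enable_model_cpu_offload()
        except AttributeError as e:
            logger.warning("error while enabling model CPU offload: %s", e)
```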

ssube self-assigned this on Feb 18, 2023
ssube commented Feb 18, 2023

The CUDA and ONNX optimizations are all available behind the ONNX_WEB_OPTIMIZATIONS variable, but they need to be manually enabled until I can figure out which ones are available and appropriate for each platform.
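Roughly, the gating looks like this (assuming ONNX_WEB_OPTIMIZATIONS is a comma-separated list of optimization names; the exact format and names are assumptions for illustration):

```python
from os import environ


def get_optimizations():
    # assumed format: ONNX_WEB_OPTIMIZATIONS="diffusers-attention-slicing,diffusers-cpu-offload"
    raw = environ.get("ONNX_WEB_OPTIMIZATIONS", "")
    return set(name.strip() for name in raw.split(",") if name.strip())


# hypothetical usage with the helper sketched in the previous comment:
# enable_pipeline_optimizations(pipeline, get_optimizations())
```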

ssube added the status/fixed label and removed the status/progress label on Feb 18, 2023
ssube mentioned this issue on Mar 5, 2023