[Performance] LDM optimization patches #15824

Merged Jun 8, 2024 (3 commits)


drhead
Contributor

@drhead drhead commented May 17, 2024

Description

Change 1: Timestep Embedding Patch

  • Fixes a blocking op in the timestep embedding. It was creating a tensor on CPU and then moving it to GPU, which would force a sync every step.
  • Combined with the other performance PRs (mine and HCL's), Torch's dispatch queue should be completely unblocked (until extensions with similar problems mess it up). This will allow near constant 100% GPU usage.
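
A minimal sketch of the idea behind Change 1, not the actual diff from this PR: the function names, the `max_period` default, and the even-`dim` assumption below are illustrative. The first version builds the frequency table on the CPU and copies it to the device on every call; the second passes `device=` to `torch.arange` so the tensor is allocated where it is used.

```python
import math
import torch


def timestep_embedding_cpu_then_move(timesteps, dim, max_period=10000):
    """Sinusoidal embedding; frequencies are built on CPU, then copied over."""
    half = dim // 2  # assumes an even dim to keep the sketch short
    freqs = torch.exp(
        -math.log(max_period) * torch.arange(half, dtype=torch.float32) / half
    ).to(device=timesteps.device)  # host-to-device copy on every call
    args = timesteps[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)


def timestep_embedding_on_device(timesteps, dim, max_period=10000):
    """Same math, but the frequencies are created directly on the target device."""
    half = dim // 2  # assumes an even dim to keep the sketch short
    freqs = torch.exp(
        -math.log(max_period)
        * torch.arange(half, dtype=torch.float32, device=timesteps.device)
        / half
    )
    args = timesteps[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    t = torch.tensor([0, 250, 999], device=device)
    emb = timestep_embedding_on_device(t, 320)
    print(emb.shape, emb.device)
```

Both versions should return the same values up to floating-point tolerance; the difference is only in where the intermediate tensor is allocated.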

Change 2: SpatialTransformer.forward einops removal

  • Changes the function to use native torch reshape/view/permute ops and removes the .contiguous() call.
  • Prevents 32 calls to aten::copy_ and void at::native::elementwise_kernel<128, 4, at::nati... per forward pass (SD 1.5). The speedup seems to be around 6-8 ms per forward pass, though my profiler's timings are a little inconsistent (512x512, batch 4, overclocked 3090).
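
A rough illustration of Change 2, assuming the einops patterns being replaced are of the form 'b c h w -> b (h w) c' and back; the helper names `to_tokens` and `to_image` are hypothetical and this is not the code from the PR. Native `reshape` and `permute` return views when the layout allows it, so no copy kernel is launched unless `.contiguous()` is called afterwards.

```python
import torch


def to_tokens(x):
    """(b, c, h, w) -> (b, h*w, c) using native ops; returns a view, no copy."""
    b, c, h, w = x.shape
    return x.reshape(b, c, h * w).permute(0, 2, 1)


def to_image(tokens, h, w):
    """(b, h*w, c) -> (b, c, h, w) using native ops."""
    b, n, c = tokens.shape
    return tokens.permute(0, 2, 1).reshape(b, c, h, w)


if __name__ == "__main__":
    x = torch.randn(2, 4, 3, 5)
    tokens = to_tokens(x)
    # The view shares storage with x; forcing contiguity allocates and copies.
    print(tokens.data_ptr() == x.data_ptr())
    print(tokens.contiguous().data_ptr() == x.data_ptr())
    print(torch.equal(to_image(tokens, 3, 5), x))
```

Whether the layers that follow accept a non-contiguous input depends on the model; that would need checking case by case.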


@drhead
Contributor Author

drhead commented May 17, 2024

I think #18620 might need to be merged before tests will pass on this.

@w-e-w
Collaborator

w-e-w commented May 17, 2024

so we need to wait 2769 new posts to merge this 🙃

@drhead
Contributor Author

drhead commented May 17, 2024

Upon further review I think it would be sufficient for #15820 to be merged first lol

@drhead drhead changed the title from "Patch timestep embedding to create tensor on-device" to "LDM optimization patches" May 17, 2024
@drhead
Contributor Author

drhead commented May 17, 2024

Added another patch, and it passes tests now.

@drhead drhead changed the title from "LDM optimization patches" to "[Performance] LDM optimization patches" May 21, 2024
@AUTOMATIC1111 AUTOMATIC1111 merged commit 93b53dc into AUTOMATIC1111:dev Jun 8, 2024
3 checks passed
@lawchingman lawchingman mentioned this pull request Oct 5, 2024