[IPEX] Slice SDPA into smaller chunks #14353
Merged
Description
Slice scaled_dot_product_attention into smaller chunks so that the SDPA of each chunk never requests an allocation larger than the given limit.

This was initially designed to work around the 4GB single-block allocation limit of Intel compute-runtime (RuntimeError: Current platform can NOT allocate memory block with size larger than 4GB! Tried to allocate 8.00 GiB). I then found that setting a smaller limit also reduces the VRAM footprint during SDPA calculation. The current limit (VRAM // 8) was tuned on Intel Arc A770 16G and A750 8G so that it does not sacrifice performance.

With this change, the A770 16G can generate 512x512 images at batch size 32 and the A750 8G at batch size 16.
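As a rough illustration of the slicing idea, here is a minimal sketch that only plans the batch ranges; it is not the code from this PR. It assumes the dominant allocation per batch item is one q_len x kv_len attention-score matrix, and the helper name plan_sdpa_chunks and all of its parameters are hypothetical.

```python
def plan_sdpa_chunks(batch, q_len, kv_len, element_size, limit_bytes):
    """Return (start, end) batch ranges whose score matrices fit in limit_bytes.

    Hypothetical helper: assumes each batch item needs one
    q_len x kv_len score matrix of element_size-byte values.
    """
    bytes_per_item = q_len * kv_len * element_size
    # Never go below one item per chunk, even if a single item exceeds the limit.
    chunk = max(1, limit_bytes // bytes_per_item)
    return [(start, min(start + chunk, batch)) for start in range(0, batch, chunk)]


# Example: 16 GiB of VRAM with the limit set to VRAM // 8 (2 GiB), fp16 scores.
vram_bytes = 16 * 1024**3
limit = vram_bytes // 8
ranges = plan_sdpa_chunks(batch=320, q_len=4096, kv_len=4096,
                          element_size=2, limit_bytes=limit)
print(ranges)
```

Each range would then be passed to scaled_dot_product_attention separately and the outputs concatenated along the batch dimension. When a single batch item already exceeds the limit, slicing the batch alone is not enough, and a real implementation would also need to split along the query length.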
Test results:
Common settings: --use-ipex --opt-sdp-attention, txt2img, DPM++ 2M Karras, 20 steps, 512x512 resolution, batch count = 5
RuntimeError: Current platform can NOT allocate memory block with size larger than 4GB!
RuntimeError: Allocation is out of device memory on current platform.
A770 16G (connected to two monitors, which take up ~1.1GB of VRAM)
A750 8G (no monitors connected)
Screenshots/videos:
Checklist:
--test-server --use-ipex --opt-sdp-attention