Add XeTLA FA backward implementation to benchmark #2367

Merged (4 commits), Oct 14, 2024

@ESI-SYD ESI-SYD commented Sep 27, 2024

Fix
@ESI-SYD ESI-SYD marked this pull request as draft September 27, 2024 02:23
@ESI-SYD ESI-SYD marked this pull request as ready for review September 29, 2024 01:08
@etiotto etiotto requested a review from ZzEeKkAa October 7, 2024 14:40

@whitneywhtsang whitneywhtsang left a comment

Can we run it on CI and check if the result can be compared with torch?

ESI-SYD commented Oct 8, 2024

> Can we run it on CI and check if the result can be compared with torch?

The CI tutorial does not seem ready for the backward pass yet (including torch's implementation).

- set(XETLA_KERNEL_FLAGS ${XETLA_KERNEL_FLAGS} -fsycl)
+ set(XETLA_KERNEL_FLAGS ${XETLA_KERNEL_FLAGS}
+   -fsycl
+   -fsycl-device-code-split=per_kernel

What does it change?

@ESI-SYD ESI-SYD Oct 9, 2024

This change adds the -fsycl-device-code-split=per_kernel flag, which splits the SYCL device code per kernel and resolves the RuntimeError below.

I see no performance regression for XeTLA in my local environment.

RuntimeError: The program was built for 1 devices
Build program log for 'Intel(R) Data Center GPU Max 1100':
 -11 (PI_ERROR_BUILD_PROGRAM_FAILURE)
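
For readers unfamiliar with how such a flag list is usually consumed, here is a minimal sketch of one way to wire it into a build. The project name, the target name `xetla_kernel_example`, and the source file `kernel_example.cpp` are hypothetical and do not come from this PR. Passing the flags at link time as well as compile time is an assumption of the sketch, based on DPC++ performing device-code splitting during the device link step.

```cmake
# Hypothetical sketch: project, target, and source names are illustrative only.
cmake_minimum_required(VERSION 3.20)
project(xetla_flags_sketch LANGUAGES CXX)

# Append the SYCL flags to whatever XETLA_KERNEL_FLAGS already holds.
# per_kernel asks the DPC++ compiler to emit a separate device image
# for each kernel instead of one combined image for the whole module.
set(XETLA_KERNEL_FLAGS ${XETLA_KERNEL_FLAGS}
  -fsycl
  -fsycl-device-code-split=per_kernel
)

add_library(xetla_kernel_example SHARED kernel_example.cpp)

# Assumption: the same flags are given to both the compile and link steps,
# since the split is applied when the device code is linked.
target_compile_options(xetla_kernel_example PRIVATE ${XETLA_KERNEL_FLAGS})
target_link_options(xetla_kernel_example PRIVATE ${XETLA_KERNEL_FLAGS})
```

With per-kernel splitting, a build failure in one kernel's device image is less likely to take down unrelated kernels in the same module, which may be why it helps with the error above. The trade-off is typically more device images to build and load.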

@ESI-SYD ESI-SYD merged commit fe45283 into main Oct 14, 2024
5 checks passed
@ESI-SYD ESI-SYD deleted the yudong/fa_bwd_xetla branch October 14, 2024 01:50
Development

Successfully merging this pull request may close these issues.

Adding XeTLA FA implementation for all variants mentioned in the FA paper
5 participants