
Improve performance of tt.load and tt.store for FP8 when converting block ptr to regular ptrs #2374

Open
etiotto opened this issue Sep 27, 2024 · 1 comment · May be fixed by #2502

etiotto commented Sep 27, 2024

We would like to remove the RewriteTensorPointer pass, which rewrites block pointers into regular pointers (except when it determines that load/store operations on block pointers can be converted to 2D block reads/writes). The idea is to avoid losing semantic information too early, and instead to handle a block pointer that cannot be used to generate 2D block reads/writes while lowering the load/store operation itself.
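
To make the tradeoff concrete, here is a minimal sketch (illustrative kernels, not code from this repository) of the same 2D tile copy written with a block pointer and in the regular-pointer form that RewriteTensorPointer effectively produces:

```python
import triton
import triton.language as tl


@triton.jit
def copy_tile_block_ptr(src_ptr, dst_ptr, M, N, stride_m, stride_n,
                        BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Block pointers keep shape/stride/offset information in the IR, so the
    # backend can lower tt.load/tt.store to 2D block reads/writes when possible.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    src = tl.make_block_ptr(base=src_ptr, shape=(M, N),
                            strides=(stride_m, stride_n),
                            offsets=(pid_m * BLOCK_M, pid_n * BLOCK_N),
                            block_shape=(BLOCK_M, BLOCK_N), order=(1, 0))
    dst = tl.make_block_ptr(base=dst_ptr, shape=(M, N),
                            strides=(stride_m, stride_n),
                            offsets=(pid_m * BLOCK_M, pid_n * BLOCK_N),
                            block_shape=(BLOCK_M, BLOCK_N), order=(1, 0))
    tile = tl.load(src, boundary_check=(0, 1))
    tl.store(dst, tile, boundary_check=(0, 1))


@triton.jit
def copy_tile_regular_ptr(src_ptr, dst_ptr, M, N, stride_m, stride_n,
                          BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # The rewritten form: explicit offsets and masks. The structured (blocked)
    # view of the access is gone, which is the semantic loss described above.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offsets = offs_m[:, None] * stride_m + offs_n[None, :] * stride_n
    mask = (offs_m[:, None] < M) & (offs_n[None, :] < N)
    tile = tl.load(src_ptr + offsets, mask=mask, other=0.0)
    tl.store(dst_ptr + offsets, tile, mask=mask)
```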

For this scheme to work, we first need to improve the lowering code for tt.load and tt.store operations that use a block pointer whose element type is not (currently) supported by the 2D block read instructions available on the target GPU (e.g. FP8).
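
As a concrete example of the case in question, the block-pointer kernel sketched above already hits this path once the tensors are FP8. The device string and the torch.float8_e4m3fn dtype below are assumptions for illustration; the point is only that the element type of the block pointer is FP8, so this particular load/store pair cannot become 2D block reads/writes and must fall back to regular accesses during lowering:

```python
import torch
import triton

M, N = 1024, 1024
BLOCK_M, BLOCK_N = 64, 64

# FP8 tensors (converted from FP16); device and dtype assumed for illustration.
src = torch.randn(M, N, device="xpu", dtype=torch.float16).to(torch.float8_e4m3fn)
dst = torch.empty_like(src)

grid = (triton.cdiv(M, BLOCK_M), triton.cdiv(N, BLOCK_N))
copy_tile_block_ptr[grid](src, dst, M, N, src.stride(0), src.stride(1),
                          BLOCK_M=BLOCK_M, BLOCK_N=BLOCK_N)
```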

See #2359 (comment) for more context.


etiotto commented Oct 9, 2024

The first step is to improve axis analysis and add support for blocked pointers to it (#2451).
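
As a purely conceptual sketch of what that involves (hypothetical Python stand-ins, not the backend's actual axis-analysis code), these are the per-dimension facts the analysis tracks, and why a block pointer can supply part of them directly instead of having them re-derived from flat pointer arithmetic:

```python
from dataclasses import dataclass


@dataclass
class AxisInfo:
    contiguity: tuple[int, ...]    # longest contiguous run along each dimension
    divisibility: tuple[int, ...]  # largest power-of-two divisor of the offsets
    constancy: tuple[int, ...]     # longest run of identical values per dimension


def block_ptr_contiguity(block_shape, strides):
    # A block pointer records shape and strides explicitly, so the unit-stride
    # dimension is known to be contiguous across the whole block; the other
    # dimensions contribute runs of length 1.
    return tuple(dim if stride == 1 else 1
                 for dim, stride in zip(block_shape, strides))


# Example: a 32x64 tile of a row-major matrix -> inner dimension fully contiguous.
print(block_ptr_contiguity((32, 64), (64, 1)))  # (1, 64)
```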

etiotto linked a pull request Oct 16, 2024 that will close this issue