[Unity] Support TIR kernel for PagedKVCache #16374

Merged


MasterJH5574
Contributor

This PR adds support for TIR kernels in PagedKVCache.

Right now we do not have sufficient TIR kernels for multi-level sequences in PagedKVCache. Therefore, `Fork` in PagedKVCache is disabled when the required function does not exist.

This PR also adds a "reduced" creator of PagedKVCache, in which some auxiliary functions, such as the begin/end forward functions of prefill/decode, default to None.
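
To illustrate the two points above, here is a minimal Python sketch of the pattern: optional hooks that default to None, and a fork operation that is rejected when the kernel it needs is missing. This is not the code in this PR and not TVM's API. `ToyPagedKVCache`, `create_reduced`, and all field names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ToyPagedKVCache:
    """Hypothetical stand-in for a paged KV cache (not TVM's class)."""

    # Required attention kernels (assumed names, for illustration only).
    attention_prefill: Callable[[], str]
    attention_decode: Callable[[], str]
    # Auxiliary hooks that a "reduced" creator leaves as None.
    begin_forward: Optional[Callable[[], None]] = None
    end_forward: Optional[Callable[[], None]] = None
    # Kernel needed for multi-level (forked) sequences; None disables fork.
    attention_multi_level: Optional[Callable[[], str]] = None
    # Maps a child sequence id to its parent sequence id.
    parents: Dict[int, int] = field(default_factory=dict)

    def forward(self, is_prefill: bool) -> str:
        # Call the optional hooks only when they were provided.
        if self.begin_forward is not None:
            self.begin_forward()
        kernel = self.attention_prefill if is_prefill else self.attention_decode
        result = kernel()
        if self.end_forward is not None:
            self.end_forward()
        return result

    def fork_sequence(self, parent_id: int, child_id: int) -> None:
        # Fork is disabled when the multi-level kernel does not exist.
        if self.attention_multi_level is None:
            raise RuntimeError("fork is disabled: no multi-level attention kernel")
        self.parents[child_id] = parent_id


def create_reduced(prefill: Callable[[], str], decode: Callable[[], str]) -> ToyPagedKVCache:
    """Hypothetical "reduced" creator: every auxiliary function stays None."""
    return ToyPagedKVCache(attention_prefill=prefill, attention_decode=decode)
```

A cache built with `create_reduced` still runs prefill and decode, but calling `fork_sequence` on it raises an error instead of silently producing wrong results.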

CUDA tests are added to ensure correctness.

Co-authored-by: Hongyi Jin [email protected]
Co-authored-by: Bohan Hou [email protected]

MasterJH5574 force-pushed the unity-dev/2024-01-09-paged-kv-cache-tir branch 3 times, most recently from 1603a90 to efbb7dd on January 11, 2024 22:10
MasterJH5574 force-pushed the unity-dev/2024-01-09-paged-kv-cache-tir branch from efbb7dd to b96d082 on January 12, 2024 00:05
tqchen merged commit 7798e93 into apache:unity on Jan 12, 2024
14 checks passed
2 participants