
✨[Feature] Output buffer optimization in runtime module #3275

Open · keehyuna opened this issue Nov 4, 2024 · 0 comments · May be fixed by #3276
Labels: feature request (New feature or request) · Assignee: keehyuna

keehyuna (Collaborator) commented Nov 4, 2024

Is your feature request related to a problem? Please describe.

Output buffer optimization in runtime module

Describe the solution you'd like

  • Assuming the input shape does not change frequently, the output buffer for a run can be created during the previous forward() call
  • Latency is hidden by allocating the tensor for the next output buffer ahead of time
  • CUDA execution and the CPU work of preparing the next output buffer can potentially overlap, as in the sketch after this list
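
A minimal sketch of the idea, assuming a double-buffered runtime wrapper. Names like `engine` and `output_shape_fn` are hypothetical stand-ins for the compiled TensorRT engine (assumed to write asynchronously into a caller-provided output tensor) and its shape inference, not the actual Torch-TensorRT API:

```python
import torch

class DoubleBufferedRuntime(torch.nn.Module):
    """Illustrative sketch: pre-allocate the output buffer for call N+1
    while the CUDA work for call N is still in flight."""

    def __init__(self, engine, output_shape_fn):
        super().__init__()
        self.engine = engine                    # hypothetical: engine(x, out=...) enqueues async CUDA work
        self.output_shape_fn = output_shape_fn  # hypothetical: maps input shape -> output shape
        self.next_out = None                    # buffer prepared ahead of time for the next call

    def forward(self, x):
        shape = torch.Size(self.output_shape_fn(x.shape))
        if self.next_out is None or self.next_out.shape != shape:
            # First call, or the input shape changed: the pre-made buffer is
            # stale, so allocate synchronously on this (slow) path.
            self.next_out = torch.empty(shape, device=x.device)
        out = self.next_out
        self.engine(x, out=out)  # enqueue CUDA kernels; returns before they finish
        # While the GPU executes, the CPU prepares the buffer for the *next*
        # invocation. torch.empty on a CUDA device is CPU-side work through the
        # caching allocator, so it can overlap with the in-flight kernels.
        self.next_out = torch.empty(shape, device=x.device)
        return out
```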

Describe alternatives you've considered

If the runtime module maintains persistent output buffers across multiple inference runs, it can reuse previously allocated memory for output tensors, potentially improving performance by reducing allocation overhead. However, this cannot handle tensors that are still live from a previous invocation: a second invocation of the model overwrites the output buffer of the previous run, as the snippet below illustrates.
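
A toy illustration of that hazard, with an `Identity` model standing in for the compiled module:

```python
import torch

model = torch.nn.Identity()   # toy stand-in for the compiled module
buf = torch.empty(3)          # a single persistent output buffer

def run(x):
    buf.copy_(model(x))       # every call writes into the same storage
    return buf

y1 = run(torch.tensor([1.0, 2.0, 3.0]))
y2 = run(torch.tensor([4.0, 5.0, 6.0]))
print(y1)  # tensor([4., 5., 6.]) -- y1 aliases buf, so run 2 clobbered it
```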

Additional context
