Problem
For long-running tasks, you can't reuse containers. This seems to be because Knative Eventing creates jobs, not pods. It would be great to reuse event containers (whether jobs or pods) instead of creating a new job for each task.
The use case is jobs with a high initialisation time, e.g. loading LLMs that take minutes to load into GPU memory and then take a long time to process data.
Persona:
Which persona is this feature for?
Event consumer
Exit Criteria
A measurable (binary) test that would indicate that the problem has been resolved.
Time Estimate (optional):
How many developer-days do you think this may take to resolve?
Unclear
Additional context (optional)
Add any other context about the feature request here.
Hi @milo157, could you elaborate on your use case? For example, what do you use Eventing for (e.g. feeding events for inference)? Could you describe the architecture a bit? Is your request basically to process more than one event per job/pod (right now each event creates one job) and to reuse existing jobs/pods meant for the same group of events?
It is for long-running data processing/inference tasks.
We have an application that takes 2-3 minutes to load various ML models into GPU memory. Once they are loaded, we send an event to be processed. Processing could take a few seconds, a few minutes, or a few hours, and we would like to know the status of the task at various points and get logs.
Once an event finishes processing, we would like to reuse that container, since it has already spent the 2-3 minutes loading the models. Essentially, we want to skip that load for efficiency/cost reasons. Currently, once an event finishes and we send a new event, a new job is created on a new container and we have to wait 2-3 minutes for the models to load again.
Of course, if we send a task while the existing containers/jobs are busy, a new one should be started.
To answer your question: yes, re-using existing jobs/pods would be for the same group of events.
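To make the desired behaviour concrete, here is a minimal sketch of a long-lived consumer that loads its model once and then handles many events over HTTP. It is only an illustration of the reuse pattern, not Knative's API or how it currently behaves: `WarmWorker`, `load_model`, `make_handler`, `serve`, the port, and the JSON body shape are all hypothetical names and assumptions, and the real 2-3 minute model load is replaced by a placeholder.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class WarmWorker:
    """Holds expensive state that is loaded once and reused across events."""

    def __init__(self, loader):
        self._loader = loader
        self._model = None
        self._lock = threading.Lock()
        self.load_count = 0  # how many times the expensive load actually ran

    def model(self):
        # Load lazily and only once, even if several requests arrive together.
        with self._lock:
            if self._model is None:
                self._model = self._loader()
                self.load_count += 1
            return self._model

    def handle(self, event):
        # Every event reuses the already-loaded model.
        return self.model()(event)


def load_model():
    # Placeholder (assumption) for the slow step of loading models into GPU memory.
    return lambda event: {"echo": event.get("data")}


def make_handler(worker):
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Assumes the event arrives as a JSON body with a "data" field.
            length = int(self.headers.get("Content-Length", 0))
            event = json.loads(self.rfile.read(length) or b"{}")
            payload = json.dumps(worker.handle(event)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)

    return Handler


def serve(port=8080):
    worker = WarmWorker(load_model)
    worker.model()  # pay the load cost once at startup, before any event arrives
    HTTPServer(("", port), make_handler(worker)).serve_forever()
```

Calling `serve()` starts the process; every subsequent POST is handled by the same warm `WarmWorker`, so only the first event pays the load cost. Starting an additional container when all existing ones are busy is a scheduling concern that this sketch does not cover.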