Reuse Knative Eventing containers in order to keep GPU/CPU/Memory state #8310

milo157 · 2024-11-07T06:02:40Z

Problem
For long-running tasks, containers can't be reused. This seems to be because Knative Eventing creates jobs rather than pods. It would be great to reuse event containers (whether jobs or pods) instead of creating a new job for each task.

The use case is jobs with a high initialisation time, e.g. loading LLMs that take minutes to load into GPU memory, where processing the data also takes a long time.

Persona:
Which persona is this feature for?
Event consumer

Exit Criteria
A measurable (binary) test that would indicate that the problem has been resolved.

Time Estimate (optional):
How many developer-days do you think this may take to resolve?
Unclear

Additional context (optional)
Add any other context about the feature request here.

skonto · 2024-11-07T16:53:41Z

Hi @milo157, could you elaborate on your use case? For example, what do you use Eventing for, e.g. feeding events for inference? Could you describe the architecture a bit? Your request is basically to process more than one event per job/pod (right now each event creates one job) and to re-use existing jobs/pods meant for the same group of events, is that right?

milo157 · 2024-11-08T09:50:49Z

It is for long-running data processing/inference tasks.

We have an application that takes 2-3 minutes to load various ML models into GPU memory. Once they are loaded, we would send an event to be processed. Processing could take a few seconds, a few minutes, or a few hours, and we would like to know the status of the task at various points and get logs.

Once an event finishes processing, we would like to reuse that container, since it has already spent the 2-3 minutes loading the models. Essentially, we want to skip that load for efficiency/cost reasons. Currently, once an event finishes and we send a new event, it creates a new job in a new container and we have to wait 2-3 minutes for the models to load again.

Of course, if we send a task and existing containers/jobs are busy then it should start a new one.

To answer your question: yes, re-using existing jobs/pods would be for the same group of events.
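
To make the requested behaviour concrete, here is a minimal Python sketch of the routing policy described above: send each event to an idle worker that already has its models loaded, and start a new worker only when every existing one is busy. Nothing in it is a Knative API. `WarmPool`, `Worker`, `dispatch`, and `finish` are hypothetical names for illustration, and the model load is represented only by a counter.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Stands in for one container that has already loaded its models."""
    worker_id: int
    busy: bool = False
    handled: list = field(default_factory=list)


class WarmPool:
    """Hypothetical dispatcher: reuse an idle warm worker, else start a new one."""

    def __init__(self):
        self.workers = []
        self.cold_starts = 0
        self._ids = itertools.count(1)

    def _start_worker(self):
        # Assumption: this is where the 2-3 minute model load would happen.
        self.cold_starts += 1
        worker = Worker(worker_id=next(self._ids))
        self.workers.append(worker)
        return worker

    def dispatch(self, event):
        # Prefer any worker that is warm and not currently processing an event.
        worker = next((w for w in self.workers if not w.busy), None)
        if worker is None:
            worker = self._start_worker()
        worker.busy = True
        worker.handled.append(event)
        return worker

    def finish(self, worker):
        # The worker stays in the pool, so its loaded models are kept.
        worker.busy = False


pool = WarmPool()
first = pool.dispatch("event-1")   # pool is empty: cold start
second = pool.dispatch("event-2")  # first worker is busy: second cold start
pool.finish(first)
third = pool.dispatch("event-3")   # first worker is idle again: reused
```

In this sketch the third event lands on the first worker without another cold start, which is the saving the request is after. How idle workers would be scaled down or timed out is left open here.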

milo157 added the kind/feature-request label Nov 7, 2024
