Is it possible to stop the model? We have start(), but we don’t have a stop() equivalent.
In certain scenarios, it would be useful to have the ability to gracefully stop or terminate a running model inference process, especially when it’s being used in environments where resource management is crucial.
A stop() function could help with (a rough sketch of the API shape I have in mind follows the list):
• Freeing up resources like memory or compute when the model is no longer needed.
• Handling cases where the inference is taking too long and needs to be interrupted.
• Ensuring that models can be started and stopped dynamically, without having to reinitialise the whole model object each time.
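Purely as an illustration of what I'm asking for, here is one possible shape such an API could take. Nothing below exists in SwiftLlama today; every name is hypothetical.

```swift
// Hypothetical API sketch: neither this protocol nor stop() exists in
// SwiftLlama today. It only illustrates the feature being requested.
protocol StoppableInference {
    /// Begin generating a response for a prompt, as start() does today.
    func start(for prompt: String) async throws -> String
    /// Interrupt any in-flight generation and release model resources,
    /// leaving the object in a state where start() can be called again.
    func stop() async
}
```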
Is this something that could be added, or is there already a workaround for this use case?
Thanks!
SwiftLlama is very lightweight; you can free the model-related resources by releasing the SwiftLlama object instance. If stop() freed the memory the model uses, the system would have to reload the model before start() could be called again, so reinitialisation would be unavoidable.

I understand you might be thinking of freeing resources only partially. I haven't dug into the llama.cpp code for this yet, and I'm also not sure whether partially freeing memory is meaningful, as this type of system is usually designed to run with exclusive resources.
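To make that workaround concrete, here is a minimal sketch that treats "release the instance" as the stop() equivalent. It assumes the SwiftLlama(modelPath:) initialiser shown in the README; treat the exact signature as an assumption if your version differs.

```swift
import SwiftLlama

// Wrapper that uses instance lifetime as the stop/start mechanism.
final class ModelHolder {
    private var llama: SwiftLlama?
    private let modelPath: String

    init(modelPath: String) {
        self.modelPath = modelPath
    }

    /// Lazily (re)load the model on demand.
    func model() throws -> SwiftLlama {
        if let llama { return llama }
        let fresh = try SwiftLlama(modelPath: modelPath)
        llama = fresh
        return fresh
    }

    /// The de facto stop(): drop the reference so ARC frees the model's
    /// memory. The next model() call pays the reload cost again.
    func unload() {
        llama = nil
    }
}
```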
Regarding stopping long-running inference, the maxTokenCount parameter in the configuration serves this purpose.
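A minimal usage sketch, assuming maxTokenCount is passed through Configuration as below; the surrounding labels (modelConfiguration:, the Prompt fields) are assumptions to check against the SwiftLlama version you use.

```swift
import SwiftLlama

// Cap generation at 256 tokens so a runaway inference stops on its own.
// maxTokenCount is the parameter mentioned above; the other argument
// labels here are assumptions, not confirmed API.
let configuration = Configuration(maxTokenCount: 256)
let llama = try SwiftLlama(modelPath: "/path/to/model.gguf",
                           modelConfiguration: configuration)
let prompt = Prompt(type: .llama,
                    systemPrompt: "You are a helpful assistant.",
                    userMessage: "Summarise Swift concurrency briefly.")
let response: String = try await llama.start(for: prompt)
print(response)
```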