Is it possible to stop the model? We have start(), but we don’t have a stop() equivalent.
In certain scenarios, it would be useful to have the ability to gracefully stop or terminate a running model inference process, especially when it’s being used in environments where resource management is crucial.
A stop() function could help with (a rough sketch of the API shape I have in mind follows the list):
• Freeing up resources like memory or compute when the model is no longer needed.
• Handling cases where the inference is taking too long and needs to be interrupted.
• Ensuring that models can be started and stopped dynamically, without having to reinitialise the whole model object each time.
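Purely as an illustration of what I'm asking for, here is one possible shape such an API could take. Nothing below exists in SwiftLlama today; every name is hypothetical.

```swift
// Hypothetical API sketch: neither this protocol nor stop() exists in
// SwiftLlama today. It only illustrates the feature being requested.
protocol StoppableInference {
    /// Begin generating a response for a prompt, as start() does today.
    func start(for prompt: String) async throws -> String
    /// Interrupt any in-flight generation and release model resources,
    /// leaving the object in a state where start() can be called again.
    func stop() async
}
```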
Is this something that could be added, or is there already a workaround for this use case?
Thanks!
SwiftLlama is very lightweight; you can free the model-related resources by releasing the SwiftLlama object instance. If stop() freed the memory the model uses, the system would have to reload the model before start() could be called again, so reinitialisation would be unavoidable.

I understand you might be thinking of freeing resources only partially. I haven't dug into the llama.cpp code for this yet, and I'm also not sure whether partially freeing memory is meaningful, as this type of system is usually designed to run with exclusive resources.
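To make that workaround concrete, here is a minimal sketch that treats "release the instance" as the stop() equivalent. It assumes the SwiftLlama(modelPath:) initialiser shown in the README; treat the exact signature as an assumption if your version differs.

```swift
import SwiftLlama

// Wrapper that uses instance lifetime as the stop/start mechanism.
final class ModelHolder {
    private var llama: SwiftLlama?
    private let modelPath: String

    init(modelPath: String) {
        self.modelPath = modelPath
    }

    /// Lazily (re)load the model on demand.
    func model() throws -> SwiftLlama {
        if let llama { return llama }
        let fresh = try SwiftLlama(modelPath: modelPath)
        llama = fresh
        return fresh
    }

    /// The de facto stop(): drop the reference so ARC frees the model's
    /// memory. The next model() call pays the reload cost again.
    func unload() {
        llama = nil
    }
}
```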
Regarding stopping long-running inference, the maxTokenCount parameter in the configuration serves this purpose.
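A minimal usage sketch, assuming maxTokenCount is passed through Configuration as below; the surrounding labels (modelConfiguration:, the Prompt fields) are assumptions to check against the SwiftLlama version you use.

```swift
import SwiftLlama

// Cap generation at 256 tokens so a runaway inference stops on its own.
// maxTokenCount is the parameter mentioned above; the other argument
// labels here are assumptions, not confirmed API.
let configuration = Configuration(maxTokenCount: 256)
let llama = try SwiftLlama(modelPath: "/path/to/model.gguf",
                           modelConfiguration: configuration)
let prompt = Prompt(type: .llama,
                    systemPrompt: "You are a helpful assistant.",
                    userMessage: "Summarise Swift concurrency briefly.")
let response: String = try await llama.start(for: prompt)
print(response)
```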