
When using Ollama as the LLM engine, does the model restart every time? #1803

Closed
17Reset opened this issue Mar 28, 2024 · 1 comment
Comments


17Reset commented Mar 28, 2024

When Ollama is used on its own, it loads the model into the GPU once and keeps it there, so the model does not have to be reloaded on every API call. In privateGPT, however, the model is reloaded every time a question is asked, which greatly increases the Q&A time.

Contributor

dbzoo commented Apr 6, 2024

I think #1800 solves your problem. The default keep_alive is 5m; increase it:

ollama:
  keep_alive: 30m
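For context, the same setting can also be passed per request through Ollama's HTTP API, which accepts a keep_alive field on /api/generate. A minimal sketch, assuming an Ollama server on its default port (the model name here is hypothetical):

```python
import json
import urllib.request

# Request payload asking Ollama to keep the model loaded for 30 minutes
# after this call, instead of the default of 5 minutes.
payload = {
    "model": "llama2",    # hypothetical model name
    "prompt": "Hello",
    "keep_alive": "30m",  # same value as the privateGPT setting above
}

def build_request(url="http://localhost:11434/api/generate"):
    """Build (but do not send) the POST request for Ollama's generate API."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request()
print(payload["keep_alive"])  # → 30m
```

Setting keep_alive to a negative value (e.g. "-1m") keeps the model loaded indefinitely, at the cost of holding GPU memory between questions.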

@17Reset 17Reset closed this as completed Apr 7, 2024