add ollama caching #297

kaiehrhardt · 2024-10-29T08:44:07Z

https://github.com/substratusai/kubeai/blob/main/api/v1/model_types.go#L24

nstogner · 2024-11-05T13:44:26Z

Hey Kai, as we discussed over chat, vLLM is typically the go-to for serving concurrent production traffic. Does that work for you, or is ollama caching still important for you?

kaiehrhardt · 2024-11-05T14:22:27Z

Hey Nick,

Ollama is just easier because you don't need an account with a token, and you don't have to join the corresponding models. From my point of view, this makes it easier to get started. So I still think it would make sense to support caching for Ollama as well.

nstogner · 2024-11-05T15:38:56Z

That makes sense. We currently have two high-priority features that we are focusing on: #132 and #266 ... We can probably fit this feature in after those.

kaiehrhardt · 2024-11-05T15:47:19Z

Sounds good. Thank you.
