The LLM Instance Gateway is part of wg-serving. This repo contains the load-balancing algorithm, ext-proc code, CRDs, and controllers that support the LLM Instance Gateway.
This Gateway is intended to provide value to multiplexed LLM services on a shared pool of compute. See the proposal for more info.
This project is currently in development.
For more rapid testing, our PoC is in the ./examples/ dir.
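If you want to try the PoC quickly, a minimal sketch is below, assuming the manifests under ./examples/ can be applied directly; check that directory for the actual files and any accompanying instructions.

```sh
# Apply the PoC manifests from the examples directory (assumes plain Kubernetes
# manifests sit at the top level of ./examples/; adjust the path if they do not).
kubectl apply -f ./examples/

# Watch the example resources come up.
kubectl get pods
```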
Install the CRDs into the cluster:

```sh
make install
```
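To confirm the install worked, one assumed sanity check is to list the CRDs now registered in the cluster; the exact CRD names are defined by this repo's manifests and may change as the project evolves.

```sh
# List cluster-wide CRDs after `make install`; the gateway's CRDs should appear here.
kubectl get crds
```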
Delete the APIs (CRDs) from the cluster:

```sh
make uninstall
```
Deploying the ext-proc image: refer to this README on how to deploy the ext-proc image used to support the Instance Gateway.
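For illustration only, deploying the ext-proc service might look like the sketch below; the manifest path, label, and image are hypothetical placeholders, and the referenced README is the source of truth.

```sh
# Hypothetical manifest path; the real deployment manifest and image name are
# documented in the ext-proc README referenced above.
kubectl apply -f ./examples/ext-proc.yaml

# Watch the ext-proc pods start (the label is a placeholder; match your manifest).
kubectl get pods -l app=ext-proc -w
```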
Our community meeting is held weekly on Thursdays at 10 AM PDT; the Zoom link is here.
We currently use the #wg-serving Slack channel for communication.
Contributions are readily welcomed; thanks for joining us!
Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.