Quickstart

Steps

Deploy Sample vLLM Application

A sample vLLM deployment with the proper protocol to work with LLM Instance Gateway can be found here.

Update Envoy Gateway Config to enable Patch Policy

Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via EnvoyPatchPolicy. To enable this feature, extend the Envoy Gateway ConfigMap and restart the Envoy Gateway deployment so it picks up the change:
```
kubectl apply -f ./manifests/enable_patch_policy.yaml
kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
```
Additionally, to enable the admin interface, uncomment the admin lines in ./manifests/enable_patch_policy.yaml and run both commands again.
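
Before moving on, it can help to confirm that the restart has finished. The following is a minimal sketch, not part of the original quickstart: the function name `check_envoy_gateway_rollout` and the 120s default timeout are illustrative assumptions, while the deployment name and namespace are the ones used in the restart command above.

```shell
# Hypothetical helper (not from the repository): block until the restarted
# envoy-gateway deployment has finished rolling out, or the timeout expires.
check_envoy_gateway_rollout() {
  # The 120s default is an arbitrary assumption; pass another value as $1.
  kubectl rollout status deployment/envoy-gateway \
    -n envoy-gateway-system --timeout="${1:-120s}"
}
```

Call it as `check_envoy_gateway_rollout` or, for a longer wait, `check_envoy_gateway_rollout 300s`.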

Deploy Gateway

```
kubectl apply -f ./manifests/gateway.yaml
```

Deploy Ext-Proc
```
kubectl apply -f ./manifests/ext_proc.yaml
kubectl apply -f ./manifests/patch_policy.yaml
```
NOTE: Ensure the instance-gateway-ext-proc deployment is updated with the pod names and internal IP addresses of the vLLM replicas. Header-based request routing depends on these values being correct. This step will no longer be needed once ext-proc reads the pods dynamically.
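
One way to collect the pod names and IP addresses mentioned in the note is sketched below. The function name `list_vllm_pods` and the default label selector `app=vllm` are assumptions for illustration only; substitute whatever labels the vLLM deployment in your cluster actually carries.

```shell
# Hypothetical helper (not from the repository): print the name and pod IP
# of each vLLM replica so they can be copied into the ext-proc deployment.
list_vllm_pods() {
  # ASSUMPTION: "app=vllm" is a placeholder selector, not taken from the manifests.
  selector="${1:-app=vllm}"
  kubectl get pods -l "${selector}" \
    -o custom-columns='NAME:.metadata.name,IP:.status.podIP'
}
```

Run `list_vllm_pods` with the default selector, or pass your own, for example `list_vllm_pods 'app=my-vllm'`.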

Try it out

Wait until the gateway is ready.
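
Rather than polling by hand, the wait can be scripted. This sketch assumes the Gateway is named llm-gateway (as in the command below), lives in the current namespace, and that the gateway implementation reports the Gateway API `Programmed` condition; the function name `wait_for_gateway` and the 120s default are illustrative.

```shell
# Hypothetical helper (not from the repository): block until the Gateway
# reports the Programmed condition, or the timeout expires.
wait_for_gateway() {
  name="${1:-llm-gateway}"   # Gateway name used in this quickstart
  timeout="${2:-120s}"       # arbitrary default, an assumption
  # ASSUMPTION: the implementation sets the Gateway API "Programmed" condition.
  kubectl wait --for=condition=Programmed "gateway/${name}" --timeout="${timeout}"
}
```

Call it as `wait_for_gateway` before reading the gateway address.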

```
IP=$(kubectl get gateway/llm-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=8081

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "tweet-summary",
  "prompt": "Write as if you were a critic: San Francisco",
  "max_tokens": 100,
  "temperature": 0
}'
```
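
If the gateway has not been assigned an address yet, `IP` comes back empty and the curl call fails with a confusing error. The sketch below guards against that case. The function name `completions_url` is hypothetical, and the explicit `http://` scheme is an assumption that matches what curl defaults to when no scheme is given.

```shell
# Hypothetical helper (not from the repository): build the completions URL,
# failing early with a clear message if the gateway address is empty.
completions_url() {
  gw_ip="$1"
  gw_port="${2:-8081}"   # port used in this quickstart
  if [ -z "${gw_ip}" ]; then
    echo "gateway has no address yet; wait and retry" >&2
    return 1
  fi
  printf 'http://%s:%s/v1/completions\n' "${gw_ip}" "${gw_port}"
}
```

With `IP` set as above, `URL=$(completions_url "${IP}") || exit 1` yields a URL that can be passed to curl in place of `${IP}:${PORT}/v1/completions`.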
