Updated BentoML installation documentation and example #2414

Merged
14 commits merged on Apr 7, 2023
12 changes: 12 additions & 0 deletions README.md
@@ -58,6 +58,7 @@ This repo periodically syncs all official Kubeflow components from their respect
| KServe Models Web App | contrib/kserve/models-web-app | [v0.10.0](https://github.com/kserve/models-web-app/tree/v0.10.0/config) |
| Kubeflow Pipelines | apps/pipeline/upstream | [2.0.0-alpha.7](https://github.com/kubeflow/pipelines/tree/2.0.0-alpha.7/manifests/kustomize) |
| Kubeflow Tekton Pipelines | apps/kfp-tekton/upstream | [v1.5.1](https://github.com/kubeflow/kfp-tekton/tree/v1.5.1/manifests/kustomize) |
| BentoML | contrib/bentoml/bentoml-yatai-stack/default | [v1.7.0](https://github.com/ssheng/manifests/tree/master/contrib/bentoml/bentoml-yatai-stack/default) |

The following is also a matrix with versions from common components that are
used from the different projects of Kubeflow:
@@ -293,6 +294,17 @@ kustomize build contrib/kserve/models-web-app/overlays/kubeflow | kubectl apply

- ../contrib/kserve/models-web-app/overlays/kubeflow


#### BentoML

BentoML allows you to package models trained in Kubeflow Notebooks and deploy them as microservices in Kubernetes.

Install the BentoML Yatai components:

```sh
kustomize build contrib/bentoml/bentoml-yatai-stack/default | kubectl apply -n kubeflow --server-side -f -
```
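
As a minimal sketch of that packaging step (assuming BentoML 1.x running in a notebook; the model and the `fraud_classifier` name are illustrative, not part of this repo):

```python
import bentoml
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for a model trained in a Kubeflow Notebook.
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
model = RandomForestClassifier().fit(X, y)

# Save the model to the local BentoML model store; from there it can be
# packaged into a Bento and deployed via the Yatai components installed above.
bentoml.sklearn.save_model("fraud_classifier", model)
```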

#### Katib

Install the Katib official Kubeflow component:
1 change: 1 addition & 0 deletions contrib/bentoml/OWNERS
@@ -1,2 +1,3 @@
approvers:
- yubozhao
- juliusvonkohout
201 changes: 47 additions & 154 deletions contrib/bentoml/README.md
@@ -1,193 +1,86 @@
# BentoML Yatai Stack
# BentoML on Kubeflow

[BentoML Yatai Stack](https://github.com/bentoml/yatai-deployment) is a series of components for deploying models/bentos to Kubernetes at scale
Starting with the release of Kubeflow 1.7, BentoML provides a native integration with Kubeflow through [Yatai](https://github.com/bentoml/yatai-deployment). This integration allows you to package models trained in Kubeflow Notebooks or Pipelines as [Bentos](https://docs.bentoml.org/en/latest/concepts/bento.html), and deploy them as microservices in a Kubernetes cluster through BentoML's cloud native components and custom resource definitions (CRDs). This documentation provides a comprehensive guide on how to use BentoML and Kubeflow together to streamline the process of deploying models at scale.

## Requirements

* Kubernetes 1.20 - 1.24
* Kubernetes 1.20 - 1.25

## Installation

* The yaml assumes you will install in kubeflow namespace
Run the following command to install BentoML Yatai. Note that the YAML assumes installation in the kubeflow namespace.

```bash
kustomize build bentoml-yatai-stack/default | kubectl apply -n kubeflow --server-side -f -
```

## Upgrading
## Customizations

See [UPGRADE.md](UPGRADE.md)

# Design Proposal

## Why BentoML

![image](https://user-images.githubusercontent.com/861225/212856116-bf873dc8-7da3-4484-9f33-e401e34a82dc.png)

- BentoML is an open-source framework for developing, serving, and deploying ML services.
- Building
- Unifies ML frameworks with out-of-the-box implementation of popular frameworks
- Exposes gRPC and OpenAPI for serving
- Provides Python SDK for development
- Deployment
- Any environment, batch inference, streaming, or online serving
- Any cloud platform or on-prem
- Full observability support through Grafana
- Yatai - BentoML's deployment platform

## User Stories

Goal: From simple Python module to distributed Kubernetes deployment.
You can customize the container repository configurations and credentials for the `yatai-image-builder` operator to push Bento images to a container registry of your choice.

Consider the following common ML services that involve custom pre- and post-processing logic and inference across multiple models.

![image](https://user-images.githubusercontent.com/861225/212856456-866125c8-2bf3-42d4-b031-3c7d89c07f37.png)

### Developing on Kubeflow Notebook

- Create a service using the saved models.

```
%%writefile service.py
import asyncio
import bentoml

# fetch_features, CONFIDENCE_THRESHOLD, REJECTION, and process_assessments
# are application helpers defined elsewhere in the notebook.
fraud_detection = bentoml.pytorch.get("fraud_detection:latest").to_runner()
risk_assessment_1 = bentoml.sklearn.get("risk_assessment_1:latest").to_runner()
risk_assessment_2 = bentoml.sklearn.get("risk_assessment_2:latest").to_runner()
risk_assessment_3 = bentoml.sklearn.get("risk_assessment_3:latest").to_runner()

svc = bentoml.Service(
    name="credit_application",
    runners=[fraud_detection, risk_assessment_1, risk_assessment_2, risk_assessment_3]
)

@svc.api(input=bentoml.io.JSON(), output=bentoml.io.JSON())
async def apply(input_data: dict) -> dict:
    features = await fetch_features(input_data["user_id"])
    detection = await fraud_detection.async_run(input_data, features)
    if detection["confidence"] < CONFIDENCE_THRESHOLD:
        return REJECTION
    assessments = await asyncio.gather(
        risk_assessment_1.async_run(input_data["application"], features),
        risk_assessment_2.async_run(input_data["application"], features),
        risk_assessment_3.async_run(input_data["application"], features),
    )
    return process_assessments(assessments)
```

WARNING: The `yatai-image-builder` operator requires root privileges because it needs access to the Docker daemon, which requires elevated permissions. Granting root privileges can be dangerous, as it gives a user unrestricted access to the underlying operating system.

- Serve and test the service.

```yaml
dockerRegistry:
  bentoRepositoryName: yatai-bentos
  inClusterServer: docker-registry.kubeflow.svc.cluster.local:5000
  password: ""
  secure: false
  server: 127.0.0.1:5000
  username: ""
```

```
!bentoml serve service.py:svc --reload

2022-11-07T06:50:53+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "service.py:svc" can be accessed at <http://localhost:3000/metrics>.
2022-11-07T06:50:53+0000 [INFO] [cli] Starting development HTTP BentoServer from "service.py:svc" listening on <http://0.0.0.0:3000> (Press CTRL+C to quit)
```

You can also supply AWS credentials for the `yatai-image-builder` operator to download the Bento specified in the BentoRequest resource from S3.
![image](https://user-images.githubusercontent.com/861225/212856978-c8a24c4b-bc5b-4706-887e-81f5be914938.png)

```yaml
aws:
  accessKeyID: ''
  secretAccessKey: ''
  secretAccessKeyExistingSecretName: ''
  secretAccessKeyExistingSecretKey: ''
```

- Build bento

```
!bentoml build

Building BentoML service "credit_application:wly5lqc6ncpzwcvj" from build context "."
Successfully built Bento(tag="credit_application:wly5lqc6ncpzwcvj").
```
- Export bento to blob storage.

```
!bentoml export credit_application:wly5lqc6ncpzwcvj s3://your_bento_bucket/credit_application.wly5lqc6ncpzwcvj.bento
```

Update the resources with the following command.

```bash
make bentoml-yatai-stack/bases
```

### Deploying to Kubernetes

![image](https://user-images.githubusercontent.com/861225/212857708-f96c9877-bb89-4afa-930a-1d2cb0300520.png)

Users can deploy bentos to the K8s cluster in one of three ways.

#### Kubernetes Python Client

Users can deploy bentos from a Kubeflow Notebook with the Kubernetes [Python client](https://github.com/kubernetes-client/python).
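
For illustration, a minimal sketch of this approach, assuming in-cluster credentials, an existing Bento CR named `fraud-detection` in the `kubeflow` namespace, and the `bentodeployments` resource plural implied by the CRD (see `deployment_from_bento.yaml` below for complete resources):

```python
from kubernetes import client, config

# Inside a Kubeflow Notebook the pod's service account is used;
# outside the cluster use config.load_kube_config() instead.
config.load_incluster_config()

deployment = {
    "apiVersion": "serving.yatai.ai/v2alpha1",
    "kind": "BentoDeployment",
    "metadata": {"name": "fraud-detection", "namespace": "kubeflow"},
    "spec": {
        "bento": "fraud-detection",  # name of the existing Bento CR
        "ingress": {"enabled": False},
    },
}

# Create the custom resource, equivalent to applying the manifest with kubectl.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.yatai.ai",
    version="v2alpha1",
    namespace="kubeflow",
    plural="bentodeployments",
    body=deployment,
)
```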

#### kubectl

BentoML offers two options to deploy bentos directly to the Kubernetes cluster through `kubectl` and the `BentoRequest`, `Bento`, and `BentoDeployment` CRDs.

The first option relies on `yatai-image-builder` to build the OCI image. Users need to create a `BentoRequest` CR and a `BentoDeployment` CR to deploy a bento. In the `BentoDeployment` CR, the name of the bento should be defined as the name of the `BentoRequest` CR. If the Bento CR is not found, `yatai-deployment` will look for a BentoRequest CR with the same name and wait for it to generate the Bento CR. This option builds the OCI image by spawning a pod that runs the Kaniko build tool. However, Kaniko requires root user access; if root access is not available, consider the second option below.

The second option relies on the user to provide a URI to a pre-built OCI image of the bento. Users need to manually create a Bento CR with the image field defined as the pre-built OCI image URI, then create a BentoDeployment CR that references the Bento CR previously created (see `deployment_from_bento.yaml` in this PR for a complete example).

#### Kubeflow Pipeline Component

This option will be available in Kubeflow release 1.8.

### Verification
Re-install and apply the resources.

The following installation and testing steps demonstrate how to install Yatai components and deploy bentos through `kubectl` with `BentoRequest` and `BentoDeployment` CRDs.

#### Installation

Install with the kustomize command:

```bash
kustomize build bentoml-yatai-stack/default | kubectl apply -n kubeflow --server-side -f -
```

#### Test

Create Bento CR and BentoDeployment CR:

```
kubectl apply -f example.yaml
```

Verify that the bento deployment is running:

```
kubectl -n kubeflow get deploy -l yatai.ai/bento-deployment=test-yatai
```

The output of the above command should look like this:

```
NAME READY UP-TO-DATE AVAILABLE AGE
test-yatai 1/1 1 1 6m12s
test-yatai-runner-0 1/1 1 1 16m
```
## Upgrading

Verify that the bento service is created:
See [UPGRADE.md](UPGRADE.md)

```
kubectl -n kubeflow get service -l yatai.ai/bento-deployment=test-yatai
```
## Why BentoML

The output of the above command should look like this:
[BentoML](https://github.com/bentoml/BentoML) is an open-source platform for building, shipping, and scaling AI applications.

```
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
test-yatai ClusterIP 10.96.150.42 <none> 3000/TCP,3001/TCP 7m59s
test-yatai-runner-32c50ece701351fb576189d54bd58724 ClusterIP 10.96.193.242 <none> 3000/TCP,3001/TCP 7m39s
```
- Building
  - Unifies ML frameworks to run inference with any pre-trained model or bring your own
  - Multi-model inference graph support for complex AI solutions
  - Python-first framework that integrates with any ecosystem tooling
- Shipping
  - Any environment: batch inference, streaming, or real-time serving
  - Any public cloud or on-prem deployment
  - Kubernetes-native deployment
- Scaling
  - Efficient resource utilization with autoscaling
  - Adaptive batching for higher efficiency and throughput (see the sketch after this list)
  - Distributed microservice architecture to run services on the most optimal hardware
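
As a sketch of the adaptive batching point above (assuming BentoML 1.x; the model and the `risk_model` name are illustrative), batching is opted into per model signature at save time:

```python
import bentoml
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model; any supported framework works the same way.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

# Mark `predict` as batchable: at serving time the runner merges concurrent
# requests into micro-batches along axis 0 for higher throughput.
bentoml.sklearn.save_model(
    "risk_model",
    model,
    signatures={"predict": {"batchable": True, "batch_dim": 0}},
)
```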

Port-forward the bento service:
## Workflow on Kubeflow Notebook

```
kubectl -n kubeflow port-forward svc/test-yatai 3000:3000
```
In this example, we will train three fraud detection models using the Kubeflow notebook and the [Kaggle IEEE-CIS Fraud Detection dataset](https://www.kaggle.com/c/ieee-fraud-detection). We will then create a BentoML service that can simultaneously invoke all three models and return a decision on whether a transaction is fraudulent and build it into a Bento. We will showcase two deployment workflows using BentoML's Kubernetes operators: deploying directly from the Bento, and deploying from an OCI image built from the Bento.
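
As a rough sketch of such a service (assuming BentoML 1.x and three XGBoost models already saved as `ieee-fraud-detection-0/1/2`, matching the runner names in `deployment_from_bento.yaml`; the threshold and aggregation here are illustrative, and the authoritative code is in the linked example):

```python
import asyncio

import bentoml
import numpy as np

# One runner per saved model; on Kubernetes each runner can scale independently.
runners = [
    bentoml.xgboost.get(f"ieee-fraud-detection-{i}:latest").to_runner()
    for i in range(3)
]

svc = bentoml.Service("fraud_detection", runners=runners)

@svc.api(input=bentoml.io.NumpyNdarray(), output=bentoml.io.JSON())
async def is_fraud(features: np.ndarray) -> dict:
    # Query all three models concurrently; flag the transaction if any
    # model predicts a fraud probability above the (illustrative) threshold.
    preds = await asyncio.gather(
        *(runner.predict.async_run(features) for runner in runners)
    )
    return {"is_fraud": any(float(np.max(p)) > 0.5 for p in preds)}
```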

Finally, you can test the bento service with the curl command:
![image](https://raw.githubusercontent.com/bentoml/BentoML/main/docs/source/_static/img/kubeflow-fraud-detection.png)

```
curl -X 'POST' http://localhost:3000/classify -d '[[0,1,2,3]]'
```
See the [Fraud Detection Example](https://github.com/bentoml/BentoML/tree/main/examples/kubeflow) for a detailed workflow from model training to end-to-end deployment on Kubernetes.

The output should be:
## Workflow on Kubeflow Pipeline

```
[2]
```
This option will be available in Kubeflow release 1.8.
97 changes: 97 additions & 0 deletions contrib/bentoml/deployment_from_bento.yaml
@@ -0,0 +1,97 @@
apiVersion: resources.yatai.ai/v1alpha1
kind: Bento
metadata:
  name: fraud-detection
  namespace: kubeflow
spec:
  image: docker.io/bentoml/fraud_detection:o5smnagbncigycvj
  runners:
  - name: ieee-fraud-detection-0
    runnableType: XGBoost
  - name: ieee-fraud-detection-1
    runnableType: XGBoost
  - name: ieee-fraud-detection-2
    runnableType: XGBoost
  tag: fraud_detection:o5smnagbncigycvj
---
apiVersion: serving.yatai.ai/v2alpha1
kind: BentoDeployment
metadata:
  name: fraud-detection
  namespace: kubeflow
spec:
  autoscaling:
    maxReplicas: 2
    metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 80
          type: Utilization
      type: Resource
    minReplicas: 1
  bento: fraud-detection
  ingress:
    enabled: false
  resources:
    limits:
      cpu: 1000m
      memory: 1024Mi
    requests:
      cpu: 100m
      memory: 200Mi
  runners:
  - autoscaling:
      maxReplicas: 2
      metrics:
      - resource:
          name: cpu
          target:
            averageUtilization: 80
            type: Utilization
        type: Resource
      minReplicas: 1
    name: ieee-fraud-detection-0
    resources:
      limits:
        cpu: 1000m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 200Mi
  - autoscaling:
      maxReplicas: 2
      metrics:
      - resource:
          name: cpu
          target:
            averageUtilization: 80
            type: Utilization
        type: Resource
      minReplicas: 1
    name: ieee-fraud-detection-1
    resources:
      limits:
        cpu: 1000m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 200Mi
  - autoscaling:
      maxReplicas: 2
      metrics:
      - resource:
          name: cpu
          target:
            averageUtilization: 80
            type: Utilization
        type: Resource
      minReplicas: 1
    name: ieee-fraud-detection-2
    resources:
      limits:
        cpu: 1000m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 200Mi