Updated BentoML installation documentation and example #2414

Merged
14 commits merged on Apr 7, 2023
12 changes: 12 additions & 0 deletions README.md
@@ -58,6 +58,7 @@ This repo periodically syncs all official Kubeflow components from their respect
| KServe Models Web App | contrib/kserve/models-web-app | [v0.10.0](https://github.com/kserve/models-web-app/tree/v0.10.0/config) |
| Kubeflow Pipelines | apps/pipeline/upstream | [2.0.0-alpha.7](https://github.com/kubeflow/pipelines/tree/2.0.0-alpha.7/manifests/kustomize) |
| Kubeflow Tekton Pipelines | apps/kfp-tekton/upstream | [v1.5.1](https://github.com/kubeflow/kfp-tekton/tree/v1.5.1/manifests/kustomize) |
| BentoML | contrib/bentoml/bentoml-yatai-stack/default | [v1.7.0](https://github.com/ssheng/manifests/tree/master/contrib/bentoml/bentoml-yatai-stack/default) |

The following is also a matrix with versions from common components that are
used from the different projects of Kubeflow:
@@ -293,6 +294,17 @@ kustomize build contrib/kserve/models-web-app/overlays/kubeflow | kubectl apply

- ../contrib/kserve/models-web-app/overlays/kubeflow


#### BentoML

BentoML allows you to package models trained in Kubeflow Notebooks and deploy them as microservices in Kubernetes.

Install the BentoML Yatai components:

```sh
kustomize build contrib/bentoml/bentoml-yatai-stack/default | kubectl apply -n kubeflow --server-side -f -
```
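
As a minimal sketch of that packaging step (assuming BentoML 1.x running in a notebook; the model and the `fraud_classifier` name are illustrative, not part of this repo):

```python
import bentoml
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for a model trained in a Kubeflow Notebook.
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
model = RandomForestClassifier().fit(X, y)

# Save the model to the local BentoML model store; from there it can be
# packaged into a Bento and deployed via the Yatai components installed above.
bentoml.sklearn.save_model("fraud_classifier", model)
```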

#### Katib

Install the Katib official Kubeflow component:
1 change: 1 addition & 0 deletions contrib/bentoml/OWNERS
@@ -1,2 +1,3 @@
approvers:
- yubozhao
- juliusvonkohout
201 changes: 47 additions & 154 deletions contrib/bentoml/README.md
@@ -1,193 +1,86 @@
# BentoML Yatai Stack
# BentoML on Kubeflow

[BentoML Yatai Stack](https://github.com/bentoml/yatai-deployment) is a series of components for deploying models/bentos to Kubernetes at scale
Starting with the release of Kubeflow 1.7, BentoML provides a native integration with Kubeflow through [Yatai](https://github.com/bentoml/yatai-deployment). This integration allows you to package models trained in Kubeflow Notebooks or Pipelines as [Bentos](https://docs.bentoml.org/en/latest/concepts/bento.html), and deploy them as microservices in a Kubernetes cluster through BentoML's cloud native components and custom resource definitions (CRDs). This documentation provides a comprehensive guide on how to use BentoML and Kubeflow together to streamline the process of deploying models at scale.

## Requirements

* Kubernetes 1.20 - 1.24
* Kubernetes 1.20 - 1.25

## Installation

* The yaml assumes you will install in kubeflow namespace
Run the following command to install BentoML Yatai. Note that the YAML assumes installation in the kubeflow namespace.

```bash
kustomize build bentoml-yatai-stack/default | kubectl apply -n kubeflow --server-side -f -
```

## Upgrading
## Customizations

See [UPGRADE.md](UPGRADE.md)

# Design Proposal

## Why BentoML

![image](https://user-images.githubusercontent.com/861225/212856116-bf873dc8-7da3-4484-9f33-e401e34a82dc.png)

- BentoML is an open-source framework for developing, serving, and deploying ML services.
- Building
- Unifies ML frameworks with out-of-the-box implementation of popular frameworks
- Exposes gRPC and OpenAPI for serving
- Provides Python SDK for development
- Deployment
- Any environment, batch inference, streaming, or online serving
- Any cloud platform or on-prem
- Full observability support through Grafana
- Yatai - BentoML's deployment platform

## User Stories

Goal: From simple Python module to distributed Kubernetes deployment.
You can customize the container repository configurations and credentials for the `yatai-image-builder` operator to push Bento images to a container registry of your choice.

Consider the following common ML services that involve custom pre- and post-processing logic and inference across multiple models.

![image](https://user-images.githubusercontent.com/861225/212856456-866125c8-2bf3-42d4-b031-3c7d89c07f37.png)

### Developing on Kubeflow Notebook

- Create a service using the saved models.

```
%%writefile service.py
import asyncio
import bentoml

# fetch_features, CONFIDENCE_THRESHOLD, REJECTION, and process_assessments
# are application helpers defined elsewhere in the notebook.
fraud_detection = bentoml.pytorch.get("fraud_detection:latest").to_runner()
risk_assessment_1 = bentoml.sklearn.get("risk_assessment_1:latest").to_runner()
risk_assessment_2 = bentoml.sklearn.get("risk_assessment_2:latest").to_runner()
risk_assessment_3 = bentoml.sklearn.get("risk_assessment_3:latest").to_runner()

svc = bentoml.Service(
    name="credit_application",
    runners=[fraud_detection, risk_assessment_1, risk_assessment_2, risk_assessment_3]
)

@svc.api(input=bentoml.io.JSON(), output=bentoml.io.JSON())
async def apply(input_data: dict) -> dict:
    features = await fetch_features(input_data["user_id"])
    detection = await fraud_detection.async_run(input_data, features)
    if detection["confidence"] < CONFIDENCE_THRESHOLD:
        return REJECTION
    assessments = await asyncio.gather(
        risk_assessment_1.async_run(input_data["application"], features),
        risk_assessment_2.async_run(input_data["application"], features),
        risk_assessment_3.async_run(input_data["application"], features),
    )
    return process_assessments(assessments)
```

WARNING: The `yatai-image-builder` operator requires root privileges because it needs access to the Docker daemon, which requires elevated permissions. Granting root privileges can be dangerous, as it gives a user unrestricted access to the underlying operating system.

- Serve and test the service.

```yaml
dockerRegistry:
  bentoRepositoryName: yatai-bentos
  inClusterServer: docker-registry.kubeflow.svc.cluster.local:5000
  password: ""
  secure: false
  server: 127.0.0.1:5000
  username: ""
```

```
!bentoml serve service.py:svc --reload

2022-11-07T06:50:53+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "service.py:svc" can be accessed at <http://localhost:3000/metrics>.
2022-11-07T06:50:53+0000 [INFO] [cli] Starting development HTTP BentoServer from "service.py:svc" listening on <http://0.0.0.0:3000> (Press CTRL+C to quit)
```

You can also supply AWS credentials for the `yatai-image-builder` operator to download the Bento specified in the BentoRequest resource from S3.
![image](https://user-images.githubusercontent.com/861225/212856978-c8a24c4b-bc5b-4706-887e-81f5be914938.png)

```yaml
aws:
  accessKeyID: ''
  secretAccessKey: ''
  secretAccessKeyExistingSecretName: ''
  secretAccessKeyExistingSecretKey: ''
```

- Build bento

```
!bentoml build

Building BentoML service "credit_application:wly5lqc6ncpzwcvj" from build context "."
Successfully built Bento(tag="credit_application:wly5lqc6ncpzwcvj").
```
- Export bento to blob storage.

```
!bentoml export credit_application:wly5lqc6ncpzwcvj s3://your_bento_bucket/credit_application.wly5lqc6ncpzwcvj.bento
```

Update the resources with the following command.

```bash
make bentoml-yatai-stack/bases
```

### Deploying to Kubernetes

![image](https://user-images.githubusercontent.com/861225/212857708-f96c9877-bb89-4afa-930a-1d2cb0300520.png)

Users can deploy bentos to the K8s cluster in one of three ways.

#### Kubernetes Python Client

Users can deploy bentos from a Kubeflow Notebook with the Kubernetes [Python client](https://github.com/kubernetes-client/python).
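
For illustration, a minimal sketch of this approach, assuming in-cluster credentials, an existing Bento CR named `fraud-detection` in the `kubeflow` namespace, and the `bentodeployments` resource plural implied by the CRD (see `deployment_from_bento.yaml` below for complete resources):

```python
from kubernetes import client, config

# Inside a Kubeflow Notebook the pod's service account is used;
# outside the cluster use config.load_kube_config() instead.
config.load_incluster_config()

deployment = {
    "apiVersion": "serving.yatai.ai/v2alpha1",
    "kind": "BentoDeployment",
    "metadata": {"name": "fraud-detection", "namespace": "kubeflow"},
    "spec": {
        "bento": "fraud-detection",  # name of the existing Bento CR
        "ingress": {"enabled": False},
    },
}

# Create the custom resource, equivalent to applying the manifest with kubectl.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.yatai.ai",
    version="v2alpha1",
    namespace="kubeflow",
    plural="bentodeployments",
    body=deployment,
)
```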

#### kubectl

BentoML offers two options to deploy bentos directly to the Kubernetes cluster through `kubectl` and the `BentoRequest`, `Bento`, and `BentoDeployment` CRDs.

The first option relies on `yatai-image-builder` to build the OCI image. Users need to create a `BentoRequest` CR and a `BentoDeployment` CR to deploy a bento. In the `BentoDeployment` CR, the name of the bento should be defined as the name of the `BentoRequest` CR. If the Bento CR is not found, `yatai-deployment` will look for a BentoRequest CR with the same name and wait for it to generate the Bento CR. This option builds the OCI image by spawning a pod that runs the Kaniko build tool. However, Kaniko requires root user access; if root access is not available, consider the second option below.

The second option relies on the user to provide a URI to a pre-built OCI image of the bento. Users need to manually create a Bento CR with the image field defined as the pre-built OCI image URI, then create a BentoDeployment CR that references the Bento CR previously created (see `deployment_from_bento.yaml` in this PR for a complete example).

#### Kubeflow Pipeline Component

This option will be available in Kubeflow release 1.8.

### Verification
Re-install and apply the resources.

The following installation and testing steps demonstrate how to install Yatai components and deploy bentos through `kubectl` with `BentoRequest` and `BentoDeployment` CRDs.

#### Installation

Install with the kustomize command:

```bash
kustomize build bentoml-yatai-stack/default | kubectl apply -n kubeflow --server-side -f -
```

#### Test

Create Bento CR and BentoDeployment CR:

```
kubectl apply -f example.yaml
```

Verify that the bento deployment is running:

```
kubectl -n kubeflow get deploy -l yatai.ai/bento-deployment=test-yatai
```

The output of the above command should look like this:

```
NAME READY UP-TO-DATE AVAILABLE AGE
test-yatai 1/1 1 1 6m12s
test-yatai-runner-0 1/1 1 1 16m
```
## Upgrading

Verify that the bento service is created:
See [UPGRADE.md](UPGRADE.md)

```
kubectl -n kubeflow get service -l yatai.ai/bento-deployment=test-yatai
```
## Why BentoML

The output of the above command should look like this:
[BentoML](https://github.com/bentoml/BentoML) is an open-source platform for building, shipping, and scaling AI applications.

```
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
test-yatai ClusterIP 10.96.150.42 <none> 3000/TCP,3001/TCP 7m59s
test-yatai-runner-32c50ece701351fb576189d54bd58724 ClusterIP 10.96.193.242 <none> 3000/TCP,3001/TCP 7m39s
```
- Building
  - Unifies ML frameworks to run inference with any pre-trained model or bring your own
  - Multi-model inference graph support for complex AI solutions
  - Python-first framework that integrates with any ecosystem tooling
- Shipping
  - Any environment: batch inference, streaming, or real-time serving
  - Any public cloud or on-prem deployment
  - Kubernetes-native deployment
- Scaling
  - Efficient resource utilization with autoscaling
  - Adaptive batching for higher efficiency and throughput (see the sketch after this list)
  - Distributed microservice architecture to run services on the most optimal hardware
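
As a sketch of the adaptive batching point above (assuming BentoML 1.x; the model and the `risk_model` name are illustrative), batching is opted into per model signature at save time:

```python
import bentoml
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model; any supported framework works the same way.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

# Mark `predict` as batchable: at serving time the runner merges concurrent
# requests into micro-batches along axis 0 for higher throughput.
bentoml.sklearn.save_model(
    "risk_model",
    model,
    signatures={"predict": {"batchable": True, "batch_dim": 0}},
)
```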

Port-forward the bento service:
## Workflow on Kubeflow Notebook

```
kubectl -n kubeflow port-forward svc/test-yatai 3000:3000
```
In this example, we will train three fraud detection models using the Kubeflow notebook and the [Kaggle IEEE-CIS Fraud Detection dataset](https://www.kaggle.com/c/ieee-fraud-detection). We will then create a BentoML service that can simultaneously invoke all three models and return a decision on whether a transaction is fraudulent and build it into a Bento. We will showcase two deployment workflows using BentoML's Kubernetes operators: deploying directly from the Bento, and deploying from an OCI image built from the Bento.
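
As a rough sketch of such a service (assuming BentoML 1.x and three XGBoost models already saved as `ieee-fraud-detection-0/1/2`, matching the runner names in `deployment_from_bento.yaml`; the threshold and aggregation here are illustrative, and the authoritative code is in the linked example):

```python
import asyncio

import bentoml
import numpy as np

# One runner per saved model; on Kubernetes each runner can scale independently.
runners = [
    bentoml.xgboost.get(f"ieee-fraud-detection-{i}:latest").to_runner()
    for i in range(3)
]

svc = bentoml.Service("fraud_detection", runners=runners)

@svc.api(input=bentoml.io.NumpyNdarray(), output=bentoml.io.JSON())
async def is_fraud(features: np.ndarray) -> dict:
    # Query all three models concurrently; flag the transaction if any
    # model predicts a fraud probability above the (illustrative) threshold.
    preds = await asyncio.gather(
        *(runner.predict.async_run(features) for runner in runners)
    )
    return {"is_fraud": any(float(np.max(p)) > 0.5 for p in preds)}
```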

Finally, you can test the bento service with the curl command:
![image](https://raw.githubusercontent.com/bentoml/BentoML/main/docs/source/_static/img/kubeflow-fraud-detection.png)

```
curl -X 'POST' http://localhost:3000/classify -d '[[0,1,2,3]]'
```
See the [Fraud Detection Example](https://github.com/bentoml/BentoML/tree/main/examples/kubeflow) for a detailed workflow from model training to end-to-end deployment on Kubernetes.

The output should be:
## Workflow on Kubeflow Pipeline

```
[2]
```
This option will be available in Kubeflow release 1.8.
97 changes: 97 additions & 0 deletions contrib/bentoml/deployment_from_bento.yaml
@@ -0,0 +1,97 @@
apiVersion: resources.yatai.ai/v1alpha1
kind: Bento
metadata:
  name: fraud-detection
  namespace: kubeflow
spec:
  image: docker.io/bentoml/fraud_detection:o5smnagbncigycvj
  runners:
  - name: ieee-fraud-detection-0
    runnableType: XGBoost
  - name: ieee-fraud-detection-1
    runnableType: XGBoost
  - name: ieee-fraud-detection-2
    runnableType: XGBoost
  tag: fraud_detection:o5smnagbncigycvj
---
apiVersion: serving.yatai.ai/v2alpha1
kind: BentoDeployment
metadata:
  name: fraud-detection
  namespace: kubeflow
spec:
  autoscaling:
    maxReplicas: 2
    metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 80
          type: Utilization
      type: Resource
    minReplicas: 1
  bento: fraud-detection
  ingress:
    enabled: false
  resources:
    limits:
      cpu: 1000m
      memory: 1024Mi
    requests:
      cpu: 100m
      memory: 200Mi
  runners:
  - autoscaling:
      maxReplicas: 2
      metrics:
      - resource:
          name: cpu
          target:
            averageUtilization: 80
            type: Utilization
        type: Resource
      minReplicas: 1
    name: ieee-fraud-detection-0
    resources:
      limits:
        cpu: 1000m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 200Mi
  - autoscaling:
      maxReplicas: 2
      metrics:
      - resource:
          name: cpu
          target:
            averageUtilization: 80
            type: Utilization
        type: Resource
      minReplicas: 1
    name: ieee-fraud-detection-1
    resources:
      limits:
        cpu: 1000m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 200Mi
  - autoscaling:
      maxReplicas: 2
      metrics:
      - resource:
          name: cpu
          target:
            averageUtilization: 80
            type: Utilization
        type: Resource
      minReplicas: 1
    name: ieee-fraud-detection-2
    resources:
      limits:
        cpu: 1000m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 200Mi