README.md: add KEDA operator description for integration #160

Merged · merged 1 commit on Oct 16, 2023
35 changes: 34 additions & 1 deletion README.md
@@ -6,7 +6,6 @@
![CI Check Shell](https://github.com/intel/cloud-native-ai-pipeline/actions/workflows/pr-shell-check.yaml/badge.svg)
![CI Check Node](https://github.com/intel/cloud-native-ai-pipeline/actions/workflows/pr-node-check.yaml/badge.svg)


## 1. Overview

This project provides a multi-stream, real-time inference pipeline based on a cloud-native design pattern, with the following architecture:
@@ -109,3 +108,37 @@ Please refer to the [script](./tools/helm_manager.sh) source code for more details.
The CNAP dashboard will be available at `http://<your-ip>:31002`; it is exposed as a NodePort service in Kubernetes.

**Note**: This is pre-release/prototype software and, as such, it may be substantially modified as updated versions are made available.

## 5. Integration

The Cloud Native AI Pipeline incorporates several key technologies to foster a robust, scalable, and insightful environment conducive to cloud-native deployments. Our integration encompasses monitoring, visualization, and event-driven autoscaling to ensure optimized performance and efficient resource utilization.

### Monitoring with Prometheus

Our project is instrumented to expose essential metrics to [Prometheus](https://prometheus.io/), a reliable monitoring solution that aggregates and stores metric data. This metric exposition forms the basis for informed autoscaling decisions, ensuring our system dynamically adapts to workload demands.

Note that if you want to deploy the workloads into another namespace, first patch the Prometheus RBAC to grant it permission to access the workloads in that namespace:

```bash
kubectl apply -f ./k8s-manifests/prometheus/ClusterRole-All.yaml
```
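
For reference, the RBAC granted for this typically looks something like the sketch below; treat it as an illustration rather than the exact contents of `ClusterRole-All.yaml`, and adjust the ServiceAccount name and namespace to match your Prometheus deployment.

```yaml
# Illustrative sketch only; the actual ClusterRole-All.yaml in this repository may differ.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-all-namespaces      # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "endpoints"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-all-namespaces      # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-all-namespaces
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s               # adjust to your Prometheus ServiceAccount
    namespace: monitoring              # adjust to your Prometheus namespace
```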

### Visualization with Grafana

[Grafana](https://grafana.com/) is employed to provide visual insights into the system's performance and the efficacy of the autoscaling integration. Through intuitive dashboards, we can monitor and analyze the metrics collected by Prometheus, fostering a transparent and insightful monitoring framework.

The dashboards for this project are available at `./k8s-manifests/grafana/dashboards`; you can import them into your Grafana instance.
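
If your Grafana deployment uses a dashboard sidecar (for example, the kube-prometheus-stack pattern), the dashboards can also be provisioned as labeled ConfigMaps. The sketch below is an assumption about such a setup, not a manifest shipped with this repository; manual import through the Grafana UI works just as well.

```yaml
# Illustrative sketch: provision a dashboard via a Grafana sidecar that watches
# ConfigMaps labeled `grafana_dashboard`. Names and labels are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cnap-dashboard                 # hypothetical name
  labels:
    grafana_dashboard: "1"             # label watched by the dashboard sidecar
data:
  # Replace the placeholder below with a dashboard JSON from
  # ./k8s-manifests/grafana/dashboards
  cnap-dashboard.json: |
    {"title": "CNAP Pipeline", "panels": []}
```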

### Event-Driven Autoscaling with KEDA

[Kubernetes Event-driven Autoscaling (KEDA)](https://keda.sh/) is integrated as an operator to orchestrate the dynamic scaling of our Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) based on the metrics collected by Prometheus. This synergy ensures that resources are efficiently allocated in real-time, aligning with the fluctuating workload demands, thus embodying the essence of cloud-native scalability.

As our project evolves, we envisage the integration of additional technologies to further enhance the cloud-native capabilities of our AI pipeline. For a deeper dive into the current integration and instructions on configuration and usage, refer to the [Integration Documentation](./docs/KEDA.md).

To integrate KEDA with Prometheus, you need to deploy the ServiceMonitor CR for KEDA:

```bash
kubectl apply -f ./k8s-manifests/keda/keda-service-monitor.yaml
```
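
The ServiceMonitor generally takes a shape similar to the sketch below; it is shown here only to illustrate the resource, the actual `keda-service-monitor.yaml` in this repository may differ, and the labels and port name should be adjusted to your KEDA release.

```yaml
# Illustrative sketch only; see ./k8s-manifests/keda/keda-service-monitor.yaml
# for the manifest actually used by this project.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keda-operator                  # hypothetical name
  namespace: keda
spec:
  selector:
    matchLabels:
      app: keda-operator               # adjust to the labels on your KEDA metrics Service
  endpoints:
    - port: metrics                    # adjust to the metrics port name exposed by KEDA
      interval: 30s
```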

An example KEDA ScaledObject is available at `./k8s-manifests/keda/infer_scale.yaml`; you can deploy it to your Kubernetes cluster to scale the workloads.
68 changes: 68 additions & 0 deletions docs/KEDA.md
@@ -0,0 +1,68 @@
# KEDA Integration Documentation

This document delineates the integration of Kubernetes Event-driven Autoscaling (KEDA) within the Cloud Native AI Pipeline, specifically focusing on augmenting the scalability of the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) which are imperative for optimizing resource allocation in cloud-native settings.

## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [Resources](#resources)

## Overview

KEDA, acting as an operator, synergizes with the Cloud Native AI Pipeline to bolster the scalability of HPA and VPA, ensuring efficient resource allocation and optimized performance in response to real-time workload demands.

## Installation

### Prerequisites

- Kubernetes cluster
- Helm 3

### Steps

1. Install KEDA using Helm:
```bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```

## Configuration

Configure the scalers and triggers in accordance with the project requirements to fine-tune the autoscaling behavior.

1. Define the ScaledObject or ScaledJob custom resource:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-scaledobject
spec:
  scaleTargetRef:
    name: example-deployment
  triggers:
    - type: example-trigger
      metadata:
        # trigger-specific configuration
```
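
For instance, a Prometheus-backed trigger for an inference Deployment could look like the sketch below. The Deployment name, query, thresholds, and Prometheus address are placeholders rather than values from this repository; see `k8s-manifests/keda/infer_scale.yaml` in this repository for the project's own example.

```yaml
# Illustrative sketch with placeholder names; see k8s-manifests/keda/infer_scale.yaml
# in this repository for the project's own example.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: infer-scaledobject             # hypothetical name
spec:
  scaleTargetRef:
    name: infer-deployment             # hypothetical Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-k8s.monitoring.svc:9090   # adjust to your Prometheus service
        query: sum(rate(inference_requests_total[2m]))              # hypothetical metric
        threshold: "100"
```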

## Usage

Utilize KEDA to orchestrate the autoscaling of HPA and VPA within the project, ensuring real-time scalability in response to workload dynamics.

### Monitor the autoscaling behavior:

```bash
kubectl get hpa
```

Alternatively, you can import the Grafana dashboard for KEDA into Grafana and monitor the autoscaling behavior there directly.

## Resources

- [KEDA Official Documentation](https://keda.sh/docs/)
- Additional resources and references pertinent to the Cloud Native AI Pipeline and KEDA integration.