docs: add featinsight english documentation (#3719)
* Update intro and quickstart

* Update screenshot and typo

* Update index.rst

* update sub-folder index

* Fix typos

* Add Installation folder

* Update ZH typo for install folder

* Add missing picture

* Add use_case content

* Add screenshots

* Update index.rst

* Add faq

* Fix typos

* fix introduction.md typo

* Update introduction.md typo

* Update quickstart.md
Elliezza authored Feb 22, 2024
1 parent e581e54 commit 67031d2
Showing 53 changed files with 616 additions and 95 deletions.
38 changes: 38 additions & 0 deletions docs/en/app_ecosystem/feat_insight/faq.md
@@ -0,0 +1,38 @@
# Frequently Asked Questions

## What are the differences between FeatInsight and mainstream Feature Stores?

Mainstream feature stores such as Feast, Tecton, and Feathr provide feature management and computation capabilities, with online storage mainly relying on pre-aggregated key-value stores such as Redis. FeatInsight additionally provides real-time feature computation, and a feature extraction solution can be deployed online with a single click, without redeploying it or synchronizing online data. The main differences are compared below.

| Feature Store System | Feast | Tecton | Feathr | FeatInsight |
| --------------------------| ------------------ | ----------------- | ----------------- | ----------------- |
| Data Source Support | Multiple data sources | Multiple data sources | Multiple data sources | Multiple data sources |
| Scalability | High | High | Medium to High | High |
| Real-time Feature Service | Supported | Supported | Supported | Supported |
| Batch Feature Service | Supported | Supported | Supported | Supported |
| Feature Transformation | Basic transformations supported | Complex transformations and SQL supported | Complex transformations supported | Complex transformations and SQL supported |
| Data Storage | Multiple storage options supported | Mainly supports cloud storage | Multiple storage options supported | Built-in high-performance time-series database, supports multiple storage options |
| Community and Support | Open-source community | Commercial support | Open-source community | Open-source community |
| Real-time Feature Computation | Not supported | Not supported | Not supported | Supported |

## Is it necessary to have OpenMLDB for deploying FeatInsight?

Yes. FeatInsight's metadata storage and feature computation rely on an OpenMLDB cluster, so deploying FeatInsight requires a deployed OpenMLDB cluster. Alternatively, you can use the [All-in-One Docker image](./install/docker.md), which integrates both, for one-click deployment.

Once FeatInsight is deployed, users can develop and deploy features without the OpenMLDB CLI or SDK. All feature engineering work can be completed through the web interface.

## How can I implement MLOps workflows using FeatInsight?

With FeatInsight, you can create databases and tables in the frontend, then submit the import tasks for online and offline data. Use OpenMLDB SQL syntax for data exploration and feature creation. You can then export offline features and deploy online features with just one click. There is no need for any additional development work to transition from offline to online in the MLOps process. For detailed steps, refer to the [Quickstart](./quickstart.md).

## How does FeatInsight support ecosystem integration?

FeatInsight relies on the OpenMLDB ecosystem and supports integration with other components in the OpenMLDB ecosystem.

For example, through the data integration components of the OpenMLDB ecosystem, it supports [Kafka](../../integration/online_datasources/kafka_connector_demo.md), [Pulsar](../../integration/online_datasources/pulsar_connector_demo.md), [RocketMQ](../../integration/online_datasources/rocketmq_connector.md), [Hive](../../integration/offline_data_sources/hive.md), and [Amazon S3](../../integration/offline_data_sources/s3.md). For scheduling systems, it supports [Airflow](../../integration/deploy_integration/airflow_provider_demo.md), [DolphinScheduler](../../integration/deploy_integration/dolphinscheduler_task_demo.md), [Byzer](../../integration/deploy_integration/OpenMLDB_Byzer_taxi.md), and others. It also offers a degree of support for HDFS and Iceberg through the Spark connector, and for cloud-related technologies such as Kubernetes and Alibaba Cloud MaxCompute.

## What is the business value and technical complexity of FeatInsight?

Compared with simple feature stores that use HDFS for offline data and Redis for online data, FeatInsight's value lies in OpenMLDB SQL, a feature extraction language that is consistent between online and offline scenarios. Feature developers only need to write SQL to define features. In offline scenarios, the SQL is translated into a distributed Spark application for execution. In online scenarios, the same SQL is translated into queries against the online time-series database. This achieves consistency between online and offline feature computation.

Currently, the SQL compiler, online storage engine, and offline computing engine are implemented in languages such as C++ and Scala. For data scientists without an engineering background, defining features in SQL reduces learning costs and improves development efficiency. All the code is open source: the OpenMLDB project is at https://github.com/4paradigm/openmldb and the FeatInsight project is at https://github.com/4paradigm/FeatInsight.
12 changes: 12 additions & 0 deletions docs/en/app_ecosystem/feat_insight/index.rst
@@ -0,0 +1,12 @@
=============================
FeatInsight
=============================

.. toctree::
    :maxdepth: 1

    introduction
    quickstart
    install/index
    use_cases/index
    faq
30 changes: 30 additions & 0 deletions docs/en/app_ecosystem/feat_insight/install/config_file.md
@@ -0,0 +1,30 @@
# FeatInsight Configuration File

## Introduction

FeatInsight is built on Spring Boot and uses the standard `application.yml` as its configuration file.

## Example Configuration

A simplified configuration file example is as follows:

```
server:
  port: 8888
openmldb:
  zk_cluster: 127.0.0.1:2181
  zk_path: /openmldb
  apiserver: 127.0.0.1:9080
```

## Configuration Items


| Item | Definition | Type | Example |
| --------------------------| --------------------------- | ------- | -------------- |
| server.port | port for service | int | 8888 |
| openmldb.zk_cluster | ZooKeeper address | string | 127.0.0.1:2181 |
| openmldb.zk_path | OpenMLDB root path | string | /openmldb |
| openmldb.apiserver | OpenMLDB APIServer address | string | 127.0.0.1:9080 |
| openmldb.skip_index_check | whether to skip index check | boolean | false |
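
The items above can be turned into a small sanity check before starting the service. The following is a minimal sketch and not part of FeatInsight: the `validate_config` helper and the `EXPECTED_ITEMS` mapping are hypothetical names, the nested dictionary stands in for an already parsed `application.yml` (parsing YAML itself would need a third-party library), and treating `openmldb.skip_index_check` as optional is an assumption based on its absence from the simplified example.

```python
# Hypothetical helper: checks a parsed application.yml (as a nested dict)
# against the item names and types listed in the table above.
EXPECTED_ITEMS = {
    "server.port": int,
    "openmldb.zk_cluster": str,
    "openmldb.zk_path": str,
    "openmldb.apiserver": str,
    "openmldb.skip_index_check": bool,
}

# Assumption: this item may be omitted, as in the simplified example.
OPTIONAL_ITEMS = {"openmldb.skip_index_check"}


def lookup(config, dotted_key):
    """Return the value at a dotted key such as 'server.port', or None if absent."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def validate_config(config):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for key, expected_type in EXPECTED_ITEMS.items():
        value = lookup(config, key)
        if value is None:
            if key not in OPTIONAL_ITEMS:
                problems.append(f"missing item: {key}")
        elif type(value) is not expected_type:
            # Exact type comparison, so that True is not accepted as an int.
            problems.append(
                f"{key} should be {expected_type.__name__}, got {type(value).__name__}"
            )
    return problems


example = {
    "server": {"port": 8888},
    "openmldb": {
        "zk_cluster": "127.0.0.1:2181",
        "zk_path": "/openmldb",
        "apiserver": "127.0.0.1:9080",
    },
}
print(validate_config(example))
```

A check like this only confirms the shape of the file. It does not verify that ZooKeeper or the APIServer is actually reachable at the configured addresses.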
47 changes: 47 additions & 0 deletions docs/en/app_ecosystem/feat_insight/install/docker.md
@@ -0,0 +1,47 @@
# Docker

## Introduction

Use the official Docker image to quickly deploy the FeatInsight feature service.

## All-in-One Image

The All-in-One image includes an automated OpenMLDB deployment, so it starts the OpenMLDB cluster and FeatInsight together. No additional steps are required.

```
docker run -d -p 8888:8888 registry.cn-shenzhen.aliyuncs.com/tobe43/portable-openmldb
```

It takes around one minute to start. You can check the logs through `docker logs`.

After a successful start-up, you can access the FeatInsight service with any web browser at `http://127.0.0.1:8888`.
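
Because start-up takes around a minute, a script that depends on the service may want to wait until the address responds. The following is a minimal sketch under stated assumptions: `is_reachable` and `wait_until_ready` are hypothetical helper names, the attempt count and delay are arbitrary, and any HTTP response (even an error status) is treated as "the server is up", which says nothing about whether OpenMLDB inside the container is fully initialized.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_reachable(url, timeout=2.0):
    """Return True if an HTTP request to url receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed response.
        return False


def wait_until_ready(probe, attempts=30, delay_seconds=5.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

For the address used here, a call would look like `wait_until_ready(lambda: is_reachable("http://127.0.0.1:8888"))`. Passing the probe as a parameter keeps the retry loop independent of the network, so it can be exercised with a fake probe.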

## Docker Image without OpenMLDB

With this image, you need to deploy an OpenMLDB cluster in advance and then start the FeatInsight Docker container. This takes more steps but offers more flexibility.

Please refer to [OpenMLDB Deployment](../../../deploy/index.rst) to deploy an OpenMLDB cluster.

Then, refer to [FeatInsight Configuration File](./config_file.md) to create an `application.yml` configuration file.

```
server:
  port: 8888
openmldb:
  zk_cluster: 127.0.0.1:2181
  zk_path: /openmldb
  apiserver: 127.0.0.1:9080
```

On Linux, use the following command to start the container.

```
docker run -d -p 8888:8888 --net=host -v `pwd`/application.yml:/app/application.yml registry.cn-shenzhen.aliyuncs.com/tobe43/featinsight
```

On macOS, Docker containers run inside a virtual machine, so `--net=host` does not work properly. Configure `application.yml` so that it points to the correct OpenMLDB service addresses, then start the container without that flag.

```
docker run -d -p 8888:8888 -v `pwd`/application.yml:/app/application.yml registry.cn-shenzhen.aliyuncs.com/tobe43/featinsight
```
13 changes: 13 additions & 0 deletions docs/en/app_ecosystem/feat_insight/install/index.rst
@@ -0,0 +1,13 @@
=============================
Installation and Deployment
=============================

.. toctree::
    :maxdepth: 1

    docker
    package
    source
    config_file
    upgrade

37 changes: 37 additions & 0 deletions docs/en/app_ecosystem/feat_insight/install/package.md
@@ -0,0 +1,37 @@
# Installation Package

## Introduction
You can deploy FeatInsight quickly with the official pre-built installation package and a Java environment.

Note that you need to deploy an OpenMLDB cluster first. Refer to [OpenMLDB Deployment](../../../deploy/index.rst).

## Download

Download the JAR file.

```
wget https://openmldb.ai/download/featinsight/featinsight-0.1.0-SNAPSHOT.jar
```

## Configuration

Refer to [FeatInsight Configuration](./config_file.md) to create an `application.yml` configuration file.

```
server:
  port: 8888
openmldb:
  zk_cluster: 127.0.0.1:2181
  zk_path: /openmldb
  apiserver: 127.0.0.1:9080
```

## Start

Start the FeatInsight service.

```
java -jar ./featinsight-0.1.0-SNAPSHOT.jar
```

40 changes: 40 additions & 0 deletions docs/en/app_ecosystem/feat_insight/install/source.md
@@ -0,0 +1,40 @@
# Build from Source

## Introduction

You can build FeatInsight from source code as required.

## Download

Download the project source code.

```
git clone https://github.com/4paradigm/FeatInsight
```

## Compile from Source

Enter the project root directory and run the following commands to build the frontend and the backend.

```
cd ./FeatInsight/frontend/
npm run build
cd ../
mvn clean package
```

## Start

Deploy an OpenMLDB cluster and prepare the configuration file, then start the service with the following command.


```
./start_server.sh
```

## IDE

If you are developing with an IDE, you can modify the `application.yml` configuration file and start `HtttpServer.java` directly.

![](../images/ide_develop_featuer_platform.png)
10 changes: 10 additions & 0 deletions docs/en/app_ecosystem/feat_insight/install/upgrade.md
@@ -0,0 +1,10 @@
# Version Upgrade

## Introduction

FeatInsight exposes an HTTP interface to external users and relies on the OpenMLDB database to store metadata. Version upgrades can therefore be carried out with methods such as running multiple instances and rolling updates.

## Single Instance Upgrade Steps
1. Download the new installation package or Docker image.
2. Stop the currently running instance of FeatInsight.
3. Start a new instance with the new FeatInsight package.
41 changes: 41 additions & 0 deletions docs/en/app_ecosystem/feat_insight/introduction.md
@@ -0,0 +1,41 @@
# Introduction

FeatInsight is a sophisticated feature store service, leveraging [OpenMLDB](https://github.com/4paradigm/OpenMLDB) for efficient feature computation, management, and orchestration.

FeatInsight provides a user-friendly interface that covers the entire feature engineering process for machine learning, including data import, data viewing and updating, feature generation, storage, and online deployment. For offline scenarios, users can select features to generate training samples for ML training; for online scenarios, users can deploy online feature services for real-time feature computation.

![](./images/bigscreen.png)

## Main Functionalities

FeatInsight includes the following major functionalities:

- [Data Management](./functions/import_data.md): To import and manage datasets and online data sources for feature engineering.
- [Feature Management](./functions/manage_feature.md): To store original features and generated features.
- [Online Scenario](./functions/online_scenario.md): To deploy feature services online, which provides hard real-time online feature extraction APIs using online data.
- [Offline Scenario](./functions/offline_scenario.md): To generate training dataset from offline data and corresponding feature calculations. It also provides management functions for offline datasets and offline tasks.
- [SQL Playground](./functions/sql_playground.md): To execute any OpenMLDB SQL statements. It can be used in both online and offline mode for feature calculations.
- [Computed Features](./functions/computed_features.md): To store pre-computed features directly in OpenMLDB online tables, where they can be read and written.

## Key Features

The main objective of FeatInsight is to address common challenges in machine learning development, including facilitating easy and quick feature extraction, transformation, combination, and selection, managing feature lineage, enabling feature reuse and sharing, version control for feature services, and ensuring consistency and reliability of feature data used in both training and inference processes. Application scenarios include the following:

* Online Feature Service Deployment: Provides high-performance feature storage and online feature computation functions for localized deployment.
* MLOps Platform: Establishes MLOps workflow with OpenMLDB online-offline consistent computations.
* FeatureStore Platform: Provides comprehensive feature extraction, deletion, online deployment, and lineage management functionality to achieve low-cost local FeatureStore services.
* Open-Source Feature Solution Reuse: Supports solution reuse locally for feature reuse and sharing.
* Business Component for Machine Learning: Provides a one-stop feature engineering solution for machine learning models in recommendation systems, natural language processing, finance, healthcare, and other areas of machine learning implementation.


## Core Concepts

Here are some terms and their definitions used in FeatInsight for better understanding:

* Feature: Data obtained through feature extraction from raw data that can be directly used for model training and inference.
* Pre-computed Feature: Feature values stored after external batch computation or streaming processing, available for direct online use.
* Feature View: A set of features defined by a single SQL computation statement.
* Feature Service: Combines one or more features into a feature service, provided for use in online scenarios.
* Online Scenario: By deploying feature services, it provides hard real-time online feature extraction interfaces using online data.
* Offline Scenario: With distributed computing, performs feature computation on offline data and exports training dataset for machine learning.
* Online-Offline Consistency: The consistency in feature results between online and offline scenarios is ensured through the same SQL statement.
112 changes: 112 additions & 0 deletions docs/en/app_ecosystem/feat_insight/quickstart.md
@@ -0,0 +1,112 @@
# Quickstart

We will use a simple example to show how to use FeatInsight to perform feature engineering.

For installation and deployment, refer to [OpenMLDB Deployment](../../deploy/index.rst) and [FeatInsight Deployment](./install/index.rst).

## Usage

The major steps for using FeatInsight include the following:

1. Data Import: Use SQL or the frontend form to create a database and a data table, then import online data and offline data.
2. Feature Creation: Use SQL to define a feature view. FeatInsight uses the SQL compiler to analyze it and create the corresponding features.
3. Offline Scenarios: Choose the features to export (features from different feature views can be combined), and export the training dataset to local or distributed storage through distributed computing.
4. Online Scenarios: Choose the features to deploy, and deploy them as an online feature extraction service. The service can then be accessed with an HTTP client to retrieve online feature extraction results.

### 1. Data Import

First, create the database `test_db` and the data table `test_table`. You can create them with SQL.

```
CREATE DATABASE test_db;
CREATE TABLE test_db.test_table (id STRING, trx_time DATE);
```

Alternatively, you can create them in the UI under "Data Import".

![](./images/create_test_table.png)

For easier testing, we prepare a CSV file and save it to `/tmp/test_table.csv`. Note that this path is local to the machine that runs the OpenMLDB TaskManager, which is usually also the machine that runs FeatInsight. You will need access to that machine to create or edit the file.

```
id,trx_time
user1,2024-01-01
user2,2024-01-02
user3,2024-01-03
user4,2024-01-04
user5,2024-01-05
user6,2024-01-06
user7,2024-01-07
```
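
If you prefer to generate the file with a script, the following minimal sketch writes the same header and seven rows. The `write_sample_csv` function and the `SAMPLE_ROWS` constant are hypothetical names used only for illustration.

```python
import csv

# The seven sample rows shown above: user1..user7 with consecutive dates.
SAMPLE_ROWS = [(f"user{i}", f"2024-01-{i:02d}") for i in range(1, 8)]


def write_sample_csv(path):
    """Write the header and the sample rows to the given path."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["id", "trx_time"])
        writer.writerows(SAMPLE_ROWS)
```

Calling `write_sample_csv("/tmp/test_table.csv")` on the machine that runs the TaskManager produces the file used in the rest of this walkthrough.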

For online scenarios, you can use the `LOAD DATA` or `INSERT` commands. Here we use "Import from CSV".

![](./images/online_csv_import_test_table.png)

The imported data can be previewed.

![](./images/preview_test_table.png)

For offline scenarios, you can also use `LOAD DATA` or "Import from CSV".

![](./images/csv_import_test_table.png)

Wait for about half a minute for the task to finish. You can also check the status and log.

![](./images/import_job_result.png)

### 2. Feature Creation

After the data is imported, we can create features. Here we use SQL to create two basic features.

```
SELECT id, dayofweek(trx_time) as trx_day FROM test_table
```
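
To know what to expect when previewing `trx_day`, the values can be computed independently for the sample data. This is a minimal sketch that assumes `dayofweek` follows the common SQL convention in which Sunday is 1 and Saturday is 7. That convention is an assumption here and should be checked against the OpenMLDB function reference. `expected_trx_day` is a hypothetical helper name.

```python
from datetime import date


def expected_trx_day(trx_time):
    """Day of week for an ISO date string, assuming Sunday = 1 ... Saturday = 7."""
    # isoweekday() returns Monday = 1 ... Sunday = 7, so shift by one day.
    return date.fromisoformat(trx_time).isoweekday() % 7 + 1


for day in range(1, 8):
    trx_time = f"2024-01-{day:02d}"
    print(f"user{day}", trx_time, expected_trx_day(trx_time))
```

Under this assumption, `2024-01-01` (a Monday) maps to 2 and `2024-01-07` (a Sunday) maps to 1.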

In "Features", the button beside "All Features" creates new features. Fill in the form accordingly.

![](./images/create_test_featureview.png)

After successful creation, you can check the features. Click on the name to go into details. You can check the basic information, as well as preview feature values.

![](./images/preview_test_features.png)


### 3. Offline Samples Export

In "Offline Scenario", you can choose to export offline samples. You can choose the features to export and specify the export path. There are "More Options" for you to specify the file format and other advanced parameters.

![](./images/export_test_offline_samples.png)

Wait for about half a minute and you can check the status at "Offline Samples".

![](./images/test_offline_sample_detail.png)

You can check the content of the exported samples. To verify online-offline consistency provided by FeatInsight, you can record the result and compare it with online feature computation results.

![](./images/local_test_offline_samples.png)

### 4. Online Feature Service

In "Feature Services", the button beside "All Feature Services" creates a new feature service. You can choose the features to deploy, and fill in the service name and version accordingly.

![](./images/create_test_feature_service.png)

After successful creation, you can check service details, including feature list, dependent tables and lineage.

![](./images/test_feature_service_detail.png)

Lastly, on the "Request Feature Service" page, we can enter test data to perform online feature calculation and compare the result with the offline computation results.

![](./images/request_test_feature_service.png)
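
When there are more than a few rows, the comparison between offline and online results can be scripted. The following is a minimal sketch, assuming both results have already been collected as lists of dictionaries keyed by column name. `index_by_id` and `find_mismatches` are hypothetical helpers, and the values in the example are illustrative rather than real FeatInsight output.

```python
def index_by_id(rows):
    """Map each row's id to its remaining feature values."""
    return {
        row["id"]: {name: value for name, value in row.items() if name != "id"}
        for row in rows
    }


def find_mismatches(offline_rows, online_rows):
    """Return the ids whose features differ or that appear on only one side."""
    offline = index_by_id(offline_rows)
    online = index_by_id(online_rows)
    return sorted(
        key
        for key in offline.keys() | online.keys()
        if offline.get(key) != online.get(key)
    )


# Illustrative values only, not real output.
offline_sample = [{"id": "user1", "trx_day": 2}, {"id": "user2", "trx_day": 3}]
online_sample = [{"id": "user1", "trx_day": 2}, {"id": "user2", "trx_day": 3}]
print(find_mismatches(offline_sample, online_sample))
```

An empty list means every id was present on both sides with identical feature values.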

## Summary

This example demonstrates the complete process of using FeatInsight. By writing simple SQL statements, users can define features for both online and offline scenarios. By selecting different features or combining feature sets, users can quickly reuse and deploy feature services. Lastly, the consistency of feature computation can be validated by comparing offline and online calculation results.

## Appendix: Advanced Functions
In addition to the basic functionalities of feature engineering, FeatInsight also provides advanced functionalities to facilitate feature development for users:

* SQL Playground: Offers debugging and execution capabilities for OpenMLDB SQL statements, allowing users to execute arbitrary SQL operations and debug SQL statements for feature extraction.
* Computed Features: Enables the direct storage of feature values obtained through external batch computation or stream processing into OpenMLDB online tables. Users can then access and manipulate feature data in online tables.