diff --git a/examples/README.md b/examples/README.md index 91799864aa..e796b5000e 100644 --- a/examples/README.md +++ b/examples/README.md @@ -14,3 +14,5 @@ 7. **[Podman/Podman Compose_local](https://github.com/feast-dev/feast/tree/master/examples/podman_local)**: Demonstrates how to deploy Feast remote server components using Podman Compose locally. +8. **[RHOAI Feast Demo](https://github.com/feast-dev/feast/tree/master/examples/rhoai-quickstart)**: Showcases Feast's core functionality using a Jupyter notebook, including fetching online feature data from a remote server and retrieving metadata from a remote registry. + diff --git a/examples/rhoai-quickstart/README.md b/examples/rhoai-quickstart/README.md new file mode 100644 index 0000000000..a8ac587968 --- /dev/null +++ b/examples/rhoai-quickstart/README.md @@ -0,0 +1,45 @@ +# Quickstart: Running Feast example + +This quickstart guide will walk you through setting up and using [Feast](https://feast.dev) as a feature store on Red Hat OpenShift AI. By the end of this tutorial, you’ll have the environment configured, sample data loaded, and features retrieved using Feast objects. + +## Prerequisites +This example uses Jupyter Notebook to showcase Feast's capabilities. You'll need an environment where Jupyter Notebook can be executed. + +You have two options for setting up the runtime environment: +1. **Running Jupyter on your local machine**: If you're new to Jupyter, refer to the official documentation [here](https://docs.jupyter.org/en/latest/running.html). +2. **Running Jupyter on Red Hat OpenShift AI (RHOAI)**: If you'd prefer to run Jupyter on the RHOAI platform, follow the instructions below. + +## Using the Red Hat OpenShift AI (RHOAI) Platform +You can execute the Jupyter notebook directly on the RHOAI platform. If you don't have an existing RHOAI cluster, you can try this Feast example in the developer sandbox. 
+ +To get started, visit the [Red Hat OpenShift AI sandbox](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) and launch your environment. + +### Getting Started with RHOAI +Before proceeding, it's helpful to familiarize yourself with Red Hat OpenShift AI. If you're new to the platform, check out this [short introductory video](https://youtu.be/75WtOSpn5qU?si=uT1xZfpuJBkVP7ha) for a quick overview of its features. + +For detailed documentation on RHOAI, including how to work on data science projects, refer to the official product documentation [here](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_cloud_service/1/html/working_on_data_science_projects/using-data-science-projects_projects). + +### Steps to Set Up a Workbench on the RHOAI Sandbox +Follow these brief steps to create a workbench on the RHOAI sandbox: + +1. Navigate to your project (namespace) → **Dev**. +2. Go to the **Workbenches** tab. +3. Select **Create Workbench** and provide the necessary details. +4. Click **Create**. + +Once your workbench is set up, you can import and run the Feast example using one of these methods: +1. **Clone the GitHub repository**: Clone the repo into your RHOAI workbench and run the Jupyter notebook. +2. **Upload files**: Upload the necessary files to your existing workbench and execute the notebook. + +## Notebook Overview +The [feast-demo-quickstart.ipynb](feast-demo-quickstart.ipynb) notebook uses the Driver entity (or model) to demonstrate Feast's functionality. +You should be able to run the same Jupyter notebook in a standalone environment as well. + +The notebook will guide you through: + +- **Setting up the Feast repository**: Load sample driver data, generate training datasets, run offline inference, and ingest batch features into the online store. You'll also learn how to fetch features for inference using Feast’s `FeatureView` and `FeatureService`. 
+ +- **Configuring a Remote Online Topology**: Set up a remote online server and client, and retrieve features in real-time using the remote online client. + +- **Configuring a Remote Registry Topology**: Set up a remote registry server and client, and retrieve Feast metadata using the remote registry client. + diff --git a/examples/rhoai-quickstart/feast-demo-quickstart.ipynb b/examples/rhoai-quickstart/feast-demo-quickstart.ipynb new file mode 100644 index 0000000000..18874e462e --- /dev/null +++ b/examples/rhoai-quickstart/feast-demo-quickstart.ipynb @@ -0,0 +1,1539 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8aea7cf6-81de-48af-8470-f5b9bf229113", + "metadata": { + "tags": [] + }, + "source": [ + "## Installing Feast\n", + "Feast is a python dependency so we have to install it using `pip`" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ceb60bdb-0c90-4991-91df-dc7334ff3a93", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip available: \u001B[0m\u001B[31;49m22.2.2\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m24.2\u001B[0m\n", + "\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n", + "Note: you may need to restart the kernel to use updated packages.\n", + "feast package is installed\n" + ] + } + ], + "source": [ + "# WE MUST ENSURE PYTHON CONSISTENCY BETWEEN NOTEBOOK AND FEAST SERVERS\n", + "# LAUNCH THIS NOTEBOOK FROM A CLEAN PYTHON ENVIRONMENT >3.9\n", + "%pip install -q feast==0.40.1\n", + "# grpcio is needed as a dependency in the later section of the example to run the feast registry server.\n", + "%pip install -q grpcio\n" + ] + }, + { + "cell_type": "markdown", + "id": "9b70c61f-1e94-4354-8dcb-3a57be140d13", + "metadata": {}, + "source": [ + "## 
Creating and initializing a Feast project" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "0fcecd34-ff0b-4528-88a5-3125187241f6", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'/opt/app-root/src/feast/examples/rhoai-quickstart'" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Display the current directory so we know where the Feast files will be created and can review them in the Jupyter console or file explorer.\n", + "%pwd" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "1ad595e5-d98a-4f9c-bead-f2d0ad274855", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Creating a new Feast repository in \u001B[1m\u001B[32m/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project\u001B[0m.\n", + "\n" + ] + } + ], + "source": [ + "# Create the Feast repository, removing any existing one first.\n", + "!rm -rf my_feast_project\n", + "!feast init my_feast_project" + ] + }, + { + "cell_type": "markdown", + "id": "3beb2f85-3e28-4afa-8227-30271dd00c7f", + "metadata": {}, + "source": [ + "The output above shows where the Feast repo has been created. The path may differ depending on your environment configuration." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "308610d4-9174-4972-a785-237c0b0477d8", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/app-root/lib64/python3.9/site-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.\n", + " self.shell.db['dhist'] = compress_dhist(dhist)[-100:]\n" + ] + } + ], + "source": [ + "# Change the current directory to feature_repo so that we can run Feast CLI commands.\n", + "%cd my_feast_project/feature_repo" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "7f5212d8-cd46-4017-a762-91a4fb56e0d5", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + ".\n", + " +-- data\n", + " +-- |-- driver_stats.parquet\n", + " +-- __init__.py\n", + " +-- __pycache__\n", + " +-- |-- __init__.cpython-39.pyc\n", + " +-- |-- test_workflow.cpython-39.pyc\n", + " +-- |-- example_repo.cpython-39.pyc\n", + " +-- feature_store.yaml\n", + " +-- example_repo.py\n", + " +-- test_workflow.py\n" + ] + } + ], + "source": [ + "# Inspect the files in the Feast repo, displaying the folder structure as a tree. The purpose of each file and folder is described below.\n", + "!find . | sed -e 's/[^-][^\\/]*\\// |-- /g' -e 's/|-- \\(.*\\)/+-- \\1/'" + ] + }, + { + "cell_type": "markdown", + "id": "2fa8d4c3-8f7d-4758-8048-945ebd0af53f", + "metadata": { + "tags": [] + }, + "source": [ + "The Feast repo has now been created for you. Running the `feast init` command populated the directory with an example feature store structure, complete with example data.\n", + "\n", + "This example defines an entity for the driver. 
You can think of an entity as a primary key used to fetch features. The rest of the example works with the driver data. All the data comes from the `data/driver_stats.parquet` file, which acts as the offline store in this example.\n", + "\n", + "Inspect the files below before going further.\n", + "\n", + "`data` contains the Parquet data used in this example.\n", + "\n", + "`example_repo.py` contains the code that creates Feast objects such as FeatureViews, FeatureServices, and OnDemandFeatureViews used in this example.\n", + "[my_feast_project/feature_repo/example_repo.py](./my_feast_project/feature_repo/example_repo.py)\n", + "\n", + "`feature_store.yaml` contains all the Feast configuration.\n", + "[my_feast_project/feature_repo/feature_store.yaml](./my_feast_project/feature_repo/feature_store.yaml)\n", + "\n", + "`test_workflow.py` contains Python code that runs all key Feast commands, including defining, retrieving, and pushing features.\n", + "[my_feast_project/feature_repo/test_workflow.py](./my_feast_project/feature_repo/test_workflow.py)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "f0a63ea0-93fa-4d4e-b12d-6caef86478f9", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "project: my_feast_project\n", + "# By default, the registry is a file (but can be turned into a more scalable SQL-backed registry)\n", + "registry: data/registry.db\n", + "# The provider primarily specifies default offline / online stores & storing the registry in a given cloud\n", + "provider: local\n", + "online_store:\n", + " type: sqlite\n", + " path: data/online_store.db\n", + "entity_key_serialization_version: 2\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "!cat feature_store.yaml" + ] + }, + { + "cell_type": "raw", + "id": "7094b08a-24bb-4608-9182-b8d894f85e51", + "metadata": {}, + "source": [ +
"The default feature store configuration uses a local file (`data/registry.db`) as the registry, SQLite as the online store, and Dask as the offline store. \n", + "See https://docs.feast.dev/reference/providers/local for more information." + ] + }, + { + "cell_type": "markdown", + "id": "cd23ce19-03fa-4353-86d3-261c7f514fd8", + "metadata": {}, + "source": [ + "The `data/driver_stats.parquet` file is generated by the `feast init` command and acts as the historical data source for this example. This source is defined in the [my_feast_project/feature_repo/example_repo.py](./my_feast_project/feature_repo/example_repo.py) file.\n", + "\n", + "```python\n", + "driver_stats_source = FileSource(\n", + " name=\"driver_hourly_stats_source\",\n", + " path=\"/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo/data/driver_stats.parquet\",\n", + " timestamp_field=\"event_timestamp\",\n", + " created_timestamp_column=\"created\",\n", + ")\n", + "```\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "6dc4f196-d9ec-421f-9cce-16678b783ea7", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table>\n",
+ " <thead>\n",
+ " <tr><th></th><th>event_timestamp</th><th>driver_id</th><th>conv_rate</th><th>acc_rate</th><th>avg_daily_trips</th><th>created</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>0</th><td>2024-09-09 17:00:00+00:00</td><td>1005</td><td>0.373758</td><td>0.475401</td><td>354</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " <tr><th>1</th><td>2024-09-09 18:00:00+00:00</td><td>1005</td><td>0.057971</td><td>0.375569</td><td>517</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " <tr><th>2</th><td>2024-09-09 19:00:00+00:00</td><td>1005</td><td>0.383832</td><td>0.323274</td><td>484</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " <tr><th>3</th><td>2024-09-09 20:00:00+00:00</td><td>1005</td><td>0.403390</td><td>0.570664</td><td>634</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " <tr><th>4</th><td>2024-09-09 21:00:00+00:00</td><td>1005</td><td>0.536741</td><td>0.645107</td><td>128</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
+ " <tr><th>1802</th><td>2024-09-24 15:00:00+00:00</td><td>1001</td><td>0.534048</td><td>0.621612</td><td>511</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " <tr><th>1803</th><td>2024-09-24 16:00:00+00:00</td><td>1001</td><td>0.776248</td><td>0.120384</td><td>311</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " <tr><th>1804</th><td>2021-04-12 07:00:00+00:00</td><td>1001</td><td>0.058821</td><td>0.109781</td><td>581</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " <tr><th>1805</th><td>2024-09-17 05:00:00+00:00</td><td>1003</td><td>0.297863</td><td>0.940503</td><td>13</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " <tr><th>1806</th><td>2024-09-17 05:00:00+00:00</td><td>1003</td><td>0.297863</td><td>0.940503</td><td>13</td><td>2024-09-24 17:07:41.972</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>1807 rows × 6 columns</p>\n",
+ "</div>
" + ], + "text/plain": [ + " event_timestamp driver_id conv_rate acc_rate \\\n", + "0 2024-09-09 17:00:00+00:00 1005 0.373758 0.475401 \n", + "1 2024-09-09 18:00:00+00:00 1005 0.057971 0.375569 \n", + "2 2024-09-09 19:00:00+00:00 1005 0.383832 0.323274 \n", + "3 2024-09-09 20:00:00+00:00 1005 0.403390 0.570664 \n", + "4 2024-09-09 21:00:00+00:00 1005 0.536741 0.645107 \n", + "... ... ... ... ... \n", + "1802 2024-09-24 15:00:00+00:00 1001 0.534048 0.621612 \n", + "1803 2024-09-24 16:00:00+00:00 1001 0.776248 0.120384 \n", + "1804 2021-04-12 07:00:00+00:00 1001 0.058821 0.109781 \n", + "1805 2024-09-17 05:00:00+00:00 1003 0.297863 0.940503 \n", + "1806 2024-09-17 05:00:00+00:00 1003 0.297863 0.940503 \n", + "\n", + " avg_daily_trips created \n", + "0 354 2024-09-24 17:07:41.972 \n", + "1 517 2024-09-24 17:07:41.972 \n", + "2 484 2024-09-24 17:07:41.972 \n", + "3 634 2024-09-24 17:07:41.972 \n", + "4 128 2024-09-24 17:07:41.972 \n", + "... ... ... \n", + "1802 511 2024-09-24 17:07:41.972 \n", + "1803 311 2024-09-24 17:07:41.972 \n", + "1804 581 2024-09-24 17:07:41.972 \n", + "1805 13 2024-09-24 17:07:41.972 \n", + "1806 13 2024-09-24 17:07:41.972 \n", + "\n", + "[1807 rows x 6 columns]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "pd.read_parquet(\"data/driver_stats.parquet\")" + ] + }, + { + "cell_type": "markdown", + "id": "92557e68-f044-444e-a68a-31385b95092e", + "metadata": {}, + "source": [ + "No Feast objects have been created yet. To create them, run the `feast apply` command in the directory that contains `feature_store.yaml`. Let's do that now." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "8e0a05ec-f449-4371-9d55-f30176b51b38", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Below folder is creating interference with the feast apply command so deleting it in case if it exists.\n", + "!rm -rf .ipynb_checkpoints/" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "5c6baa83-9834-46ac-82a4-cc0a9ae33080", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/opt/app-root/lib64/python3.9/site-packages/feast/feature_store.py:590: RuntimeWarning: On demand feature view is an experimental feature. This API is stable, but the functionality does not scale well for offline retrieval\n", + " warnings.warn(\n", + "09/24/2024 06:01:41 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "09/24/2024 06:01:41 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "09/24/2024 06:01:41 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "09/24/2024 06:01:41 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "Created entity \u001B[1m\u001B[32mdriver\u001B[0m\n", + "Created feature view \u001B[1m\u001B[32mdriver_hourly_stats\u001B[0m\n", + "Created feature view \u001B[1m\u001B[32mdriver_hourly_stats_fresh\u001B[0m\n", + "Created on demand feature view \u001B[1m\u001B[32mtransformed_conv_rate\u001B[0m\n", + "Created on demand feature view \u001B[1m\u001B[32mtransformed_conv_rate_fresh\u001B[0m\n", + "Created feature service \u001B[1m\u001B[32mdriver_activity_v2\u001B[0m\n", + "Created feature service \u001B[1m\u001B[32mdriver_activity_v1\u001B[0m\n", + "Created feature service \u001B[1m\u001B[32mdriver_activity_v3\u001B[0m\n", + "\n", + "09/24/2024 06:01:41 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "09/24/2024 06:01:41 PM root WARNING: Cannot use sqlite_vec for vector search\n", + "Created sqlite table 
\u001B[1m\u001B[32mmy_feast_project_driver_hourly_stats_fresh\u001B[0m\n", + "Created sqlite table \u001B[1m\u001B[32mmy_feast_project_driver_hourly_stats\u001B[0m\n", + "\n" + ] + } + ], + "source": [ + "# This command creates the Feast objects defined in `example_repo.py`\n", + "!feast apply" + ] + }, + { + "cell_type": "markdown", + "id": "8ab04029-b33c-49d4-b107-59ed4c662fa5", + "metadata": {}, + "source": [ + "## Generating the training data\n", + "\n", + "To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values). Feast can help generate the features that map to these labels.\n", + "\n", + "\n", + "Feast needs a list of entities (e.g. driver ids) and timestamps. Feast will join relevant tables to create the relevant feature vectors. There are two ways to generate this list:\n", + "\n", + "* The user can query that table of labels with timestamps and pass that into Feast as an entity dataframe for training data generation.\n", + "\n", + "* The user can also query that table with a SQL query which pulls entities. See the [documentation](https://docs.feast.dev/getting-started/concepts/feature-retrieval) on feature retrieval for details.\n", + "\n", + "Note: we include timestamps because we want the features for the same driver at various timestamps to be used in a model." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "7dd75fe4-4eed-4a2a-8714-e8079c4d5772", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:root:_list_feature_views will make breaking changes. Please use _list_batch_feature_views instead. 
_list_feature_views will behave like _list_all_feature_views in the future.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----- Feature schema -----\n", + "\n", + "\n", + "RangeIndex: 3 entries, 0 to 2\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 driver_id 3 non-null int64 \n", + " 1 event_timestamp 3 non-null datetime64[ns, UTC]\n", + " 2 label_driver_reported_satisfaction 3 non-null int64 \n", + " 3 val_to_add 3 non-null int64 \n", + " 4 val_to_add_2 3 non-null int64 \n", + " 5 conv_rate 3 non-null float32 \n", + " 6 acc_rate 3 non-null float32 \n", + " 7 avg_daily_trips 3 non-null int32 \n", + " 8 conv_rate_plus_val1 3 non-null float64 \n", + " 9 conv_rate_plus_val2 3 non-null float64 \n", + "dtypes: datetime64[ns, UTC](1), float32(2), float64(2), int32(1), int64(4)\n", + "memory usage: 332.0 bytes\n", + "None\n", + "\n", + "----- Example features -----\n", + "\n", + " driver_id event_timestamp label_driver_reported_satisfaction \\\n", + "0 1001 2021-04-12 10:59:42+00:00 1 \n", + "1 1002 2021-04-12 08:12:10+00:00 5 \n", + "2 1003 2021-04-12 16:40:26+00:00 3 \n", + "\n", + " val_to_add val_to_add_2 conv_rate acc_rate avg_daily_trips \\\n", + "0 1 10 0.058821 0.109781 581 \n", + "1 2 20 0.806576 0.493560 809 \n", + "2 3 30 0.122915 0.132378 636 \n", + "\n", + " conv_rate_plus_val1 conv_rate_plus_val2 \n", + "0 1.058821 10.058821 \n", + "1 2.806576 20.806576 \n", + "2 3.122915 30.122915 \n" + ] + } + ], + "source": [ + "from datetime import datetime\n", + "import pandas as pd\n", + "\n", + "from feast import FeatureStore\n", + "\n", + "# Note: see https://docs.feast.dev/getting-started/concepts/feature-retrieval for \n", + "# more details on how to retrieve for all entities in the offline store instead\n", + "entity_df = pd.DataFrame.from_dict(\n", + " {\n", + " # entity's join key -> entity values\n", + " \"driver_id\": [1001, 1002, 1003],\n", + " # 
\"event_timestamp\" (reserved key) -> timestamps\n", + " \"event_timestamp\": [\n", + " datetime(2021, 4, 12, 10, 59, 42),\n", + " datetime(2021, 4, 12, 8, 12, 10),\n", + " datetime(2021, 4, 12, 16, 40, 26),\n", + " ],\n", + " # (optional) label name -> label values. Feast does not process these\n", + " \"label_driver_reported_satisfaction\": [1, 5, 3],\n", + " # values we're using for an on-demand transformation\n", + " \"val_to_add\": [1, 2, 3],\n", + " \"val_to_add_2\": [10, 20, 30],\n", + " }\n", + ")\n", + "\n", + "store = FeatureStore(repo_path=\".\")\n", + "\n", + "training_df = store.get_historical_features(\n", + " entity_df=entity_df,\n", + " features=[\n", + " \"driver_hourly_stats:conv_rate\",\n", + " \"driver_hourly_stats:acc_rate\",\n", + " \"driver_hourly_stats:avg_daily_trips\",\n", + " \"transformed_conv_rate:conv_rate_plus_val1\",\n", + " \"transformed_conv_rate:conv_rate_plus_val2\",\n", + " ],\n", + ").to_df()\n", + "\n", + "print(\"----- Feature schema -----\\n\")\n", + "print(training_df.info())\n", + "\n", + "print()\n", + "print(\"----- Example features -----\\n\")\n", + "print(training_df.head())" + ] + }, + { + "cell_type": "markdown", + "id": "70322893-947f-46cf-a4a8-6a49d8c799b5", + "metadata": { + "tags": [] + }, + "source": [ + "## Run offline inference (batch scoring)\n", + "To power a batch model, we primarily need to pull features with the get_historical_features call, but using the current timestamp" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "c1fccc14-9d94-4d1a-a91a-d8b82f8d8044", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:root:_list_feature_views will make breaking changes. Please use _list_batch_feature_views instead. 
_list_feature_views will behave like _list_all_feature_views in the future.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "----- Example features -----\n", + "\n", + " driver_id event_timestamp \\\n", + "0 1002 2024-09-24 18:01:58.027897+00:00 \n", + "1 1001 2024-09-24 18:01:58.027897+00:00 \n", + "2 1003 2024-09-24 18:01:58.027897+00:00 \n", + "\n", + " label_driver_reported_satisfaction val_to_add val_to_add_2 conv_rate \\\n", + "0 5 2 20 0.311688 \n", + "1 1 1 10 0.776248 \n", + "2 3 3 30 0.235401 \n", + "\n", + " acc_rate avg_daily_trips conv_rate_plus_val1 conv_rate_plus_val2 \n", + "0 0.991556 579 2.311688 20.311688 \n", + "1 0.120384 311 1.776248 10.776248 \n", + "2 0.644993 381 3.235401 30.235401 \n" + ] + } + ], + "source": [ + "entity_df[\"event_timestamp\"] = pd.to_datetime(\"now\", utc=True)\n", + "training_df = store.get_historical_features(\n", + " entity_df=entity_df,\n", + " features=[\n", + " \"driver_hourly_stats:conv_rate\",\n", + " \"driver_hourly_stats:acc_rate\",\n", + " \"driver_hourly_stats:avg_daily_trips\",\n", + " \"transformed_conv_rate:conv_rate_plus_val1\",\n", + " \"transformed_conv_rate:conv_rate_plus_val2\",\n", + " ],\n", + ").to_df()\n", + "\n", + "print(\"\\n----- Example features -----\\n\")\n", + "print(training_df.head())" + ] + }, + { + "cell_type": "markdown", + "id": "56bd764f-4455-4ade-b630-81fdbb2eb444", + "metadata": {}, + "source": [ + "## Ingest batch features into your online store" + ] + }, + { + "cell_type": "markdown", + "id": "05e5595e-589e-4639-b631-073183f8240d", + "metadata": {}, + "source": [ + "The next command materializes features: it reads feature values from the offline store and stores them in the online store." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "b95b0b1b-d608-429d-91ac-ef96c698d9ca", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "09/24/2024 06:02:09 PM root WARNING: _list_feature_views will make breaking changes. Please use _list_batch_feature_views instead. _list_feature_views will behave like _list_all_feature_views in the future.\n", + "Materializing \u001B[1m\u001B[32m2\u001B[0m feature views to \u001B[1m\u001B[32m2024-09-24 18:02:06+00:00\u001B[0m into the \u001B[1m\u001B[32msqlite\u001B[0m online store.\n", + "\n", + "\u001B[1m\u001B[32mdriver_hourly_stats\u001B[0m from \u001B[1m\u001B[32m2024-09-23 18:02:09+00:00\u001B[0m to \u001B[1m\u001B[32m2024-09-24 18:02:06+00:00\u001B[0m:\n", + " 0%| | 0/5 [00:00), datetime.datetime(2024, 9, 24, 18, 2, 6, tzinfo=))])>,\n", + " ), datetime.datetime(2024, 9, 24, 18, 2, 6, tzinfo=))])>,\n", + " }, feature_transformation = )>,\n", + " }, feature_transformation = )>]" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Listing all feature views using remote registry client\n", + "registry_feature_store_client.list_all_feature_views(allow_cache=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": "31889388-5efc-423c-b64e-e8e5c0b67a68", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[,\n", + " )>,\n", + " ,\n", + " ]" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Listing all feature services using remote registry client\n", + "registry_feature_store_client.list_feature_services()" + ] + }, + { + "cell_type": "markdown", + "id": "a074f5bc-1d1b-4ad1-9b08-b3aea58761e8", + "metadata": { + "tags": [] + }, + "source": [ + "## Stopping the online, registry server" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": 
"d8f353e0-3874-49f7-af82-bd79396eb6d8", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo\n", + "1001130+ 17522 16173 0 18:03 ? 00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve\n", + "1001130+ 17570 17522 0 18:03 ? 00:00:00 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve\n", + "1001130+ 17902 16173 0 18:07 ? 00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve_registry\n", + "1001130+ 18481 18438 0 18:15 ? 00:00:00 grep feast serve\n" + ] + } + ], + "source": [ + "%%sh\n", + "# checking if the registry server and online server process is already running.\n", + "pwd\n", + "ps -ef | grep 'feast serve'" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "id": "8c84142d-b835-440e-9177-9b8bfe00768f", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "remote online and registry server has been stopped.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[2024-09-24 18:16:00 +0000] [17522] [INFO] Handling signal: term\n", + "[2024-09-24 18:16:00 +0000] [17570] [INFO] Shutting down\n", + "[2024-09-24 18:16:00 +0000] [17570] [INFO] Error while closing socket [Errno 9] Bad file descriptor\n", + "[2024-09-24 18:16:00 +0000] [17570] [INFO] Waiting for application shutdown.\n", + "[2024-09-24 18:16:00 +0000] [17570] [INFO] Application shutdown complete.\n", + "[2024-09-24 18:16:00 +0000] [17570] [INFO] Finished server process [17570]\n", + "[2024-09-24 18:16:00 +0000] [17522] [ERROR] Worker (pid:17570) was sent SIGTERM!\n", + "[2024-09-24 18:16:00 +0000] [17522] [INFO] Shutting down: Master\n" + ] + } + ], + "source": [ + "feast_online_server_process.terminate() # Stop the remote Feast online server\n", + "feast_remote_registry_server_process.terminate() # stops the remote registry server\n", + 
"print(\"remote online and registry server has been stopped.\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "id": "93bf01b6-5bf1-4345-9bef-2e83edc489b7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo\n", + "1001130+ 18542 18499 0 18:16 ? 00:00:00 grep feast serve\n" + ] + } + ], + "source": [ + "%%sh\n", + "# checking if the registry server and online server process stopped. wait for some time until it kills.\n", + "pwd\n", + "ps -ef | grep 'feast serve'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cbe06b35-7586-4b26-a63b-29bcd7ea3037", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.9", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file