Merge branch 'feast-dev:master' into release
brijesh-vora-sp authored Aug 23, 2024
2 parents e523c87 + 2ba93f6 commit 2dc0101
Showing 163 changed files with 7,348 additions and 644 deletions.
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -1,6 +1,6 @@
<!-- Thanks for sending a pull request! Here are some tips for you:
1. Ensure that your code follows our code conventions: https://github.com/feast-dev/feast/blob/master/CONTRIBUTING.md#code-style--linting
1. Ensure that your code follows our code conventions: https://github.com/feast-dev/feast/blob/master/CONTRIBUTING.md#code-style-and-linting
2. Run unit tests and ensure that they are passing: https://github.com/feast-dev/feast/blob/master/CONTRIBUTING.md#unit-tests
3. If your change introduces any API changes, make sure to update the integration tests here: https://github.com/feast-dev/feast/tree/master/sdk/python/tests
4. Make sure documentation is updated for your PR!
2 changes: 1 addition & 1 deletion .releaserc.js
@@ -66,7 +66,7 @@ module.exports = {
"CHANGELOG.md",
"java/pom.xml",
"infra/charts/**/*.*",
"infra/feast-operator/**/*.*",
"infra/feast-operator/**/*",
"ui/package.json",
"sdk/python/feast/ui/package.json",
"sdk/python/feast/ui/yarn.lock"
17 changes: 9 additions & 8 deletions Makefile
@@ -21,6 +21,7 @@ ifeq ($(shell uname -s), Darwin)
OS = osx
endif
TRINO_VERSION ?= 376
PYTHON_VERSION = ${shell python --version | grep -Eo '[0-9]\.[0-9]+'}

# General

@@ -37,22 +38,22 @@ build: protos build-java build-docker
# Python SDK

install-python-ci-dependencies:
python -m piptools sync sdk/python/requirements/py$(PYTHON)-ci-requirements.txt
python -m piptools sync sdk/python/requirements/py$(PYTHON_VERSION)-ci-requirements.txt
pip install --no-deps -e .
python setup.py build_python_protos --inplace

install-python-ci-dependencies-uv:
uv pip sync --system sdk/python/requirements/py$(PYTHON)-ci-requirements.txt
uv pip sync --system sdk/python/requirements/py$(PYTHON_VERSION)-ci-requirements.txt
uv pip install --system --no-deps -e .
python setup.py build_python_protos --inplace

install-python-ci-dependencies-uv-venv:
uv pip sync sdk/python/requirements/py$(PYTHON)-ci-requirements.txt
uv pip sync sdk/python/requirements/py$(PYTHON_VERSION)-ci-requirements.txt
uv pip install --no-deps -e .
python setup.py build_python_protos --inplace

lock-python-ci-dependencies:
uv pip compile --system --no-strip-extras setup.py --extra ci --output-file sdk/python/requirements/py$(PYTHON)-ci-requirements.txt
uv pip compile --system --no-strip-extras setup.py --extra ci --output-file sdk/python/requirements/py$(PYTHON_VERSION)-ci-requirements.txt

package-protos:
cp -r ${ROOT_DIR}/protos ${ROOT_DIR}/sdk/python/feast/protos
@@ -61,11 +62,11 @@ compile-protos-python:
python setup.py build_python_protos --inplace

install-python:
python -m piptools sync sdk/python/requirements/py$(PYTHON)-requirements.txt
python -m piptools sync sdk/python/requirements/py$(PYTHON_VERSION)-requirements.txt
python setup.py develop

lock-python-dependencies:
uv pip compile --system --no-strip-extras setup.py --output-file sdk/python/requirements/py$(PYTHON)-requirements.txt
uv pip compile --system --no-strip-extras setup.py --output-file sdk/python/requirements/py$(PYTHON_VERSION)-requirements.txt

lock-python-dependencies-all:
pixi run --environment py39 --manifest-path infra/scripts/pixi/pixi.toml "uv pip compile --system --no-strip-extras setup.py --output-file sdk/python/requirements/py3.9-requirements.txt"
@@ -85,14 +86,14 @@ test-python-unit:
python -m pytest -n 8 --color=yes sdk/python/tests

test-python-integration:
python -m pytest -n 8 --integration --color=yes --durations=10 --timeout=1200 --timeout_method=thread \
python -m pytest -n 4 --integration --color=yes --durations=10 --timeout=1200 --timeout_method=thread \
-k "(not snowflake or not test_historical_features_main)" \
sdk/python/tests

test-python-integration-local:
FEAST_IS_LOCAL_TEST=True \
FEAST_LOCAL_ONLINE_CONTAINER=True \
python -m pytest -n 8 --color=yes --integration --durations=5 --dist loadgroup \
python -m pytest -n 4 --color=yes --integration --durations=10 --timeout=1200 --timeout_method=thread --dist loadgroup \
-k "not test_lambda_materialization and not test_snowflake_materialization" \
sdk/python/tests

4 changes: 2 additions & 2 deletions README.md
@@ -17,7 +17,7 @@
[![GitHub Release](https://img.shields.io/github/v/release/feast-dev/feast.svg?style=flat&sort=semver&color=blue)](https://github.com/feast-dev/feast/releases)

## Join us on Slack!
👋👋👋 [Come say hi on Slack!](https://join.slack.com/t/feastopensource/signup)
👋👋👋 [Come say hi on Slack!](https://communityinviter.com/apps/feastopensource/feast-the-open-source-feature-store)

## Overview

@@ -230,4 +230,4 @@ Thanks goes to these incredible people:

<a href="https://github.com/feast-dev/feast/graphs/contributors">
<img src="https://contrib.rocks/image?repo=feast-dev/feast" />
</a>
</a>
15 changes: 15 additions & 0 deletions community/ADOPTERS.md
@@ -0,0 +1,15 @@
# Adopters of Feast

Below are the adopters of Feast. If you are using Feast, please add
yourself to the list below via a pull request. Please keep the list in
alphabetical order.

| Organization | Contact | GitHub Username |
| ------------ | ------- | ------- |
| Affirm | Francisco Javier Arceo | franciscojavierarceo |
| Bank of Georgia | Tornike Gurgenidze | tokoko |
| Get Ground | Zhiling Chen | zhilingc |
| Gojek | Pradithya Aria Pura | pradithya |
| Shopify | Matt Delacour | MattDelac |
| Snowflake | Miles Adkins | sfc-gh-madkins |
| Twitter | David Liu | mavysavydav |
24 changes: 16 additions & 8 deletions docs/SUMMARY.md
@@ -8,28 +8,31 @@
## Getting started

* [Quickstart](getting-started/quickstart.md)
* [Architecture](getting-started/architecture/README.md)
* [Overview](getting-started/architecture/overview.md)
* [Language](getting-started/architecture/language.md)
* [Push vs Pull Model](getting-started/architecture/push-vs-pull-model.md)
* [Write Patterns](getting-started/architecture/write-patterns.md)
* [Feature Transformation](getting-started/architecture/feature-transformation.md)
* [Feature Serving and Model Inference](getting-started/architecture/model-inference.md)
* [Role-Based Access Control (RBAC)](getting-started/architecture/rbac.md)
* [Concepts](getting-started/concepts/README.md)
* [Overview](getting-started/concepts/overview.md)
* [Data ingestion](getting-started/concepts/data-ingestion.md)
* [Entity](getting-started/concepts/entity.md)
* [Feature view](getting-started/concepts/feature-view.md)
* [Feature retrieval](getting-started/concepts/feature-retrieval.md)
* [Point-in-time joins](getting-started/concepts/point-in-time-joins.md)
* [Registry](getting-started/concepts/registry.md)
* [Permission](getting-started/concepts/permission.md)
* [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md)
* [Architecture](getting-started/architecture/README.md)
* [Overview](getting-started/architecture/overview.md)
* [Language](getting-started/architecture/language.md)
* [Push vs Pull Model](getting-started/architecture/push-vs-pull-model.md)
* [Write Patterns](getting-started/architecture/write-patterns.md)
* [Feature Transformation](getting-started/architecture/feature-transformation.md)
* [Components](getting-started/components/README.md)
* [Overview](getting-started/components/overview.md)
* [Registry](getting-started/components/registry.md)
* [Offline store](getting-started/components/offline-store.md)
* [Online store](getting-started/components/online-store.md)
* [Batch Materialization Engine](getting-started/components/batch-materialization-engine.md)
* [Provider](getting-started/components/provider.md)
* [Authorization Manager](getting-started/components/authz_manager.md)
* [Third party integrations](getting-started/third-party-integrations.md)
* [FAQ](getting-started/faq.md)

@@ -41,7 +44,6 @@
* [Real-time credit scoring on AWS](tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md)
* [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md)
* [Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
* [Using Scalable Registry](tutorials/using-scalable-registry.md)
* [Building streaming features](tutorials/building-streaming-features.md)

## How-to Guides
@@ -110,6 +112,12 @@
* [Hazelcast (contrib)](reference/online-stores/hazelcast.md)
* [ScyllaDB (contrib)](reference/online-stores/scylladb.md)
* [SingleStore (contrib)](reference/online-stores/singlestore.md)
* [Registries](reference/registries/README.md)
* [Local](reference/registries/local.md)
* [S3](reference/registries/s3.md)
* [GCS](reference/registries/gcs.md)
* [SQL](reference/registries/sql.md)
* [Snowflake](reference/registries/snowflake.md)
* [Providers](reference/providers/README.md)
* [Local](reference/providers/local.md)
* [Google Cloud Platform](reference/providers/google-cloud-platform.md)
8 changes: 8 additions & 0 deletions docs/getting-started/architecture/README.md
@@ -19,3 +19,11 @@
{% content-ref url="feature-transformation.md" %}
[feature-transformation.md](feature-transformation.md)
{% endcontent-ref %}

{% content-ref url="model-inference.md" %}
[model-inference.md](model-inference.md)
{% endcontent-ref %}

{% content-ref url="rbac.md" %}
[rbac.md](rbac.md)
{% endcontent-ref %}
docs/getting-started/architecture/feature-transformation.md
@@ -3,6 +3,7 @@
A *feature transformation* is a function that takes some set of input data and
returns some set of output data. Feature transformations can happen on either raw data or derived data.

## Feature Transformation Engines
Feature transformations can be executed by three types of "transformation engines":

1. The Feast Feature Server
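As a sketch of the first engine, an on-demand transformation can run inside the Feast Feature Server at read time. The source, field names, multiplier, and import paths below are illustrative assumptions (paths may vary by Feast version), not part of this change:

```python
import pandas as pd

from feast import Field, RequestSource
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64

# A request-time source carrying a raw value sent by the client (hypothetical name).
pricing_inputs = RequestSource(
    name="pricing_inputs",
    schema=[Field(name="base_price", dtype=Float64)],
)

@on_demand_feature_view(
    sources=[pricing_inputs],
    schema=[Field(name="price_with_tax", dtype=Float64)],
)
def price_with_tax(inputs: pd.DataFrame) -> pd.DataFrame:
    # Executed by the transformation engine (here, the feature server) at read time.
    out = pd.DataFrame()
    out["price_with_tax"] = inputs["base_price"] * 1.2
    return out
```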
4 changes: 2 additions & 2 deletions docs/getting-started/architecture/language.md
@@ -1,10 +1,10 @@
# Python: The Language of Production Machine Learning

Use Python to serve your features online.
Use Python to serve your features.


## Why should you use Python to Serve features for Machine Learning?
Python has emerged as the primary language for machine learning, and this extends to feature serving and there are five main reasons Feast recommends using a microservice in Feast.
Python has emerged as the primary language for machine learning, and this extends to feature serving; there are five main reasons Feast recommends using a microservice written in Python.

## 1. Python is the language of Machine Learning

97 changes: 97 additions & 0 deletions docs/getting-started/architecture/model-inference.md
@@ -0,0 +1,97 @@
# Feature Serving and Model Inference

Production machine learning systems can choose from four approaches to serving machine learning predictions (the output
of model inference):
1. Online model inference with online features
2. Offline model inference without online features
3. Online model inference with online features and cached predictions
4. Online model inference without features

*Note: online features can be sourced from batch, streaming, or request data sources.*

These four approaches involve different tradeoffs and, in general, significant implementation differences.

## 1. Online Model Inference with Online Features
Online model inference with online features is a powerful approach to serving data-driven machine learning applications.
This requires a feature store to serve online features and a model server to serve model predictions (e.g., KServe).
This approach is particularly useful for applications where request-time data is required to run inference.
```python
features = store.get_online_features(
feature_refs=[
"user_data:click_through_rate",
"user_data:number_of_clicks",
"user_data:average_page_duration",
],
entity_rows=[{"user_id": 1}],
)
model_predictions = model_server.predict(features)
```

## 2. Offline Model Inference without Online Features
Typically, machine learning teams find serving precomputed model predictions to be the most straightforward approach to implement.
This approach simply treats the model predictions as a feature and serves them from the feature store using the standard
Feast SDK. These predictions are typically generated through some batch process in which the model scores are precomputed.
As a concrete example, the batch process can be as simple as a script that runs model inference locally for a set of users and
outputs a CSV. This output file can then be materialized so that the predictions can be served online, as shown in the
code below.
```python
model_predictions = store.get_online_features(
feature_refs=[
"user_data:model_predictions",
],
entity_rows=[{"user_id": 1}],
)
```
Notice that the model server is not involved in this approach. Instead, the model predictions are precomputed and
materialized to the online store.

While this approach can lead to quick impact for different business use cases, it suffers from stale data and can only serve
users/entities that were available at the time of the batch computation. In some cases, this tradeoff
may be tolerable.
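The batch precompute step referenced above might look like the following sketch; the file names, the `model` object, and the `user_data` feature view are assumptions for illustration, with the CSV registered as a batch source and loaded via `feast materialize`:

```python
import pandas as pd

# Hypothetical batch scoring job: score every known user and persist the results.
users = pd.read_parquet("users.parquet")            # entity rows known at batch time
users["model_predictions"] = model.predict(users)   # any offline model / framework
users["event_timestamp"] = pd.Timestamp.now(tz="UTC")
users[["user_id", "model_predictions", "event_timestamp"]].to_csv(
    "user_predictions.csv", index=False
)
# After `feast apply` and `feast materialize`, these rows back the
# `user_data:model_predictions` lookups shown in the snippet above.
```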

## 3. Online Model Inference with Online Features and Cached Predictions
This is the most sophisticated approach: inference is optimized for low latency by caching predictions and running
model inference when data producers write features to the online store. This approach is particularly useful for
applications where features are coming from multiple data sources, the model is computationally expensive to run, or
latency is a significant constraint.

```python
# Client Reads
features = store.get_online_features(
feature_refs=[
"user_data:click_through_rate",
"user_data:number_of_clicks",
"user_data:average_page_duration",
"user_data:model_predictions",
],
entity_rows=[{"user_id": 1}],
)
if features.to_dict().get('user_data:model_predictions') is None:
model_predictions = model_server.predict(features)
store.write_to_online_store(feature_view_name="user_data", df=pd.DataFrame(model_predictions))
```
Note that in this case a separate call to `write_to_online_store` is required when the underlying data changes and
predictions change along with it.

```python
# Client Writes from the Data Producer
user_data = request.POST.get('user_data')
model_predictions = model_server.predict(user_data) # assume this includes `user_data` in the Data Frame
store.write_to_online_store(feature_view_name="user_data", df=pd.DataFrame(model_predictions))
```
While this requires additional writes for every data producer, this approach will result in the lowest latency for
model inference.

## 4. Online Model Inference without Features
This approach does not require Feast. The model server can directly serve predictions without any features. This
approach is common in Large Language Models (LLMs) and other models that do not require features to make predictions.

Note that generative models using Retrieval Augmented Generation (RAG) *do* require features: the
[document embeddings](../../reference/alpha-vector-database.md) are treated as features, which Feast supports
(this falls under "Online Model Inference with Online Features").
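As a rough sketch of the RAG case, the application embeds the query, retrieves similar documents from the online store, and passes them to the generative model. The feature name, the `embedding_model` and `llm` objects, and the alpha `retrieve_online_documents` call below are assumptions; check the vector database docs for the exact API in your Feast version:

```python
# Hypothetical RAG flow: embed the query, retrieve similar documents, then generate.
query_embedding = embedding_model.embed("How do I reset my password?")

# Alpha API; the exact name and signature may differ between Feast versions.
documents = store.retrieve_online_documents(
    feature="document_data:embedding",
    query=query_embedding,
    top_k=3,
).to_dict()

answer = llm.generate(f"Answer using this context:\n{documents}")
```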

### Client Orchestration
Implicit in the code examples above is a design choice about how clients orchestrate calls to get features and run model inference.
The examples above use a Feast-centric pattern: features are retrieved first because they are inputs to the model, so the sequencing is fairly obvious.
An alternative is an inference-centric pattern, where the client calls an inference endpoint and the inference
service is responsible for orchestrating the feature retrieval and the model call.
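A minimal sketch of the inference-centric alternative, in which the client calls only the inference service and that service fetches its own features (the endpoint and object names are hypothetical):

```python
# Runs inside the inference service, not the client.
def predict(user_id: int):
    features = store.get_online_features(
        feature_refs=[
            "user_data:click_through_rate",
            "user_data:number_of_clicks",
        ],
        entity_rows=[{"user_id": user_id}],
    )
    return model_server.predict(features)

# The client makes a single request, e.g. POST /predict {"user_id": 1},
# and never talks to the feature store directly.
```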
11 changes: 8 additions & 3 deletions docs/getting-started/architecture/overview.md
@@ -8,11 +8,16 @@ Feast's architecture is designed to be flexible and scalable. It is composed of
online store.
This allows Feast to serve features in real-time with low latency.

* Feast supports On Demand and Streaming Transformations for [feature computation](feature-transformation.md) and
will support Batch transformations in the future. For Streaming and Batch, Feast requires a separate Feature Transformation
Engine (in the batch case, this is typically your Offline Store). We are exploring adding a default streaming engine to Feast.
* Feast supports [feature transformation](feature-transformation.md) for On Demand and Streaming data sources and
will support Batch transformations in the future. For Streaming and Batch data sources, Feast requires a separate
[Feature Transformation Engine](feature-transformation.md#feature-transformation-engines) (in the batch case, this is
typically your Offline Store). We are exploring adding a default streaming engine to Feast.

* Domain expertise is recommended when integrating a data source with Feast, to understand the [tradeoffs from different
write patterns](write-patterns.md) to your application.

* We recommend [using Python](language.md) for your feature store microservice. As that page explains, precomputing features is the recommended path to low-latency serving: reducing feature serving to a lightweight database lookup means the marginal overhead of Python should be tolerable. Because of this, we believe the pros of Python outweigh the cost of reimplementing feature logic in another language. Java and Go clients are also available for online feature retrieval.

* [Role-Based Access Control (RBAC)](rbac.md) is a security mechanism that restricts access to resources based on the roles of individual users within an organization. In the context of Feast, RBAC ensures that only authorized users or groups can access or modify specific resources, thereby maintaining data security and operational integrity.


Binary file added docs/getting-started/architecture/rbac.jpg
56 changes: 56 additions & 0 deletions docs/getting-started/architecture/rbac.md
@@ -0,0 +1,56 @@
# Role-Based Access Control (RBAC) in Feast

## Introduction

Role-Based Access Control (RBAC) is a security mechanism that restricts access to resources based on the roles of individual users within an organization. In the context of Feast, RBAC ensures that only authorized users or groups can access or modify specific resources, thereby maintaining data security and operational integrity.

## Functional Requirements

The RBAC implementation in Feast is designed to:

- **Assign Permissions**: Allow administrators to assign permissions for various operations and resources to users or groups based on their roles.
- **Seamless Integration**: Integrate smoothly with existing business code without requiring significant modifications.
- **Backward Compatibility**: Maintain support for non-authorized models as the default to ensure backward compatibility.

## Business Goals

The primary business goals of implementing RBAC in Feast are:

1. **Feature Sharing**: Enable multiple teams to share the feature store while ensuring controlled access. This allows for collaborative work without compromising data security.
2. **Access Control Management**: Prevent unauthorized access to team-specific resources and spaces, governing the operations that each user or group can perform.

## Reference Architecture

Feast operates as a collection of connected services, each enforcing authorization permissions. The architecture is designed as a distributed microservices system with the following key components:

- **Service Endpoints**: These enforce authorization permissions, ensuring that only authorized requests are processed.
- **Client Integration**: Clients authenticate with feature servers by attaching an authorization token to each request (see the sketch below).
- **Service-to-Service Communication**: This is always granted.

![rbac.jpg](rbac.jpg)
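For example, a client calling a secured feature server might attach a bearer token like this; the endpoint path follows the standard feature server REST API, while the token helper and server address are assumptions that depend on the configured authorization manager:

```python
import requests

token = get_id_token()  # hypothetical helper: an OIDC or Kubernetes token, per your auth config

response = requests.post(
    "http://feature-server:6566/get-online-features",
    headers={"Authorization": f"Bearer {token}"},   # token attached to every secured request
    json={
        "features": ["user_data:click_through_rate"],
        "entities": {"user_id": [1]},
    },
)
response.raise_for_status()
```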

## Permission Model

The RBAC system in Feast uses a permission model that defines the following concepts:

- **Resource**: An object within Feast that needs to be secured against unauthorized access.
- **Action**: A logical operation performed on a resource, such as Create, Describe, Update, Delete, Read, or Write.
- **Policy**: A set of rules that enforce authorization decisions on resources. The default implementation uses role-based policies.
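A hypothetical declaration tying these concepts together might look like the sketch below; the module paths, class names, and arguments are assumptions for illustration rather than the exact API introduced by this change:

```python
# Illustrative only: let members of the "ml-team" role describe and read
# feature views whose names start with "user_".
from feast import FeatureView
from feast.permissions.action import AuthzedAction      # assumed module path
from feast.permissions.permission import Permission     # assumed module path
from feast.permissions.policy import RoleBasedPolicy    # assumed module path

user_data_reader = Permission(
    name="user-data-reader",
    types=[FeatureView],                        # the resources being secured
    name_pattern="user_.*",                     # assumed option for scoping by name
    policy=RoleBasedPolicy(roles=["ml-team"]),  # the role-based policy
    actions=[AuthzedAction.DESCRIBE, AuthzedAction.READ_ONLINE],  # assumed action names
)
```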



## Authorization Architecture

The authorization architecture in Feast is built with the following components:

- **Token Extractor**: Extracts the authorization token from the request header.
- **Token Parser**: Parses the token to retrieve user details.
- **Policy Enforcer**: Validates the secured endpoint against the retrieved user details.
- **Token Injector**: Adds the authorization token to each secured request header.
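Conceptually, these server-side components compose into a chain like the following sketch; it is purely illustrative and does not reflect the actual class or module names in Feast:

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    roles: list[str]


def extract_token(headers: dict) -> str:
    # "Token Extractor": pull the bearer token out of the request headers.
    return headers["Authorization"].removeprefix("Bearer ").strip()


def parse_token(token: str) -> User:
    # "Token Parser": validate the token (e.g. against an OIDC provider or
    # Kubernetes) and resolve the caller's roles; stubbed here.
    return User(name="alice", roles=["ml-team"])


def enforce(user: User, action: str, resource: str, allowed_roles: set[str]) -> None:
    # "Policy Enforcer": reject the call unless the user holds an allowed role.
    if not allowed_roles & set(user.roles):
        raise PermissionError(f"{user.name} may not {action} {resource}")


headers = {"Authorization": "Bearer <token>"}  # added client-side by the "Token Injector"
user = parse_token(extract_token(headers))
enforce(user, "READ_ONLINE", "feature_view:user_data", allowed_roles={"ml-team"})
```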






