chore: Updating documentation for entity's join key (#2451)
Signed-off-by: pyalex <[email protected]>
pyalex committed Mar 26, 2022
1 parent 356788a commit c65865e
Showing 6 changed files with 27 additions and 8 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/master_only.yml
@@ -205,4 +205,4 @@ jobs:
make push-${{ matrix.component }}-docker REGISTRY=${REGISTRY} VERSION=${GITHUB_SHA}
docker tag ${REGISTRY}/${{ matrix.component }}:${GITHUB_SHA} ${REGISTRY}/${{ matrix.component }}:develop
- docker push ${REGISTRY}/${{ matrix.component }}:develop
+ docker push ${REGISTRY}/${{ matrix.component }}:develop
2 changes: 1 addition & 1 deletion docs/getting-started/concepts/entity.md
@@ -6,7 +6,7 @@ An entity is a collection of semantically related features. Users define entitie
driver = Entity(name='driver', value_type=ValueType.STRING, join_key='driver_id')
```

- Entities are typically defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities \(more than one entity object\) in a feature view. It is also possible for feature views to have zero entities. See [feature view](feature-view.md) for more details.
+ Entities are typically defined as part of feature views. Entity name is used to reference the entity from a feature view definition and join key is used to identify the physical primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities \(more than one entity object\) in a feature view. It is also possible for feature views to have zero entities. See [feature view](feature-view.md) for more details.

Entities should be reused across feature views.
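
The name/join-key split described in this hunk can be illustrated with a small standalone sketch. It does not use Feast: `EntitySketch`, `resolved_join_key`, and `join_keys_for` are hypothetical names invented for illustration, and falling back to the name when no join key is given is an assumption, not something this diff states.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EntitySketch:
    """Hypothetical stand-in for an entity; this is not the Feast class."""

    name: str                       # logical name, referenced by feature views
    join_key: Optional[str] = None  # physical column name used in storage

    def resolved_join_key(self) -> str:
        # Assumption: with no explicit join_key, the name doubles as the key.
        return self.join_key or self.name


def join_keys_for(view_entities: List[str],
                  registry: Dict[str, EntitySketch]) -> List[str]:
    """Map the entity names listed by a feature view to physical join keys."""
    return [registry[name].resolved_join_key() for name in view_entities]


# A feature view refers to the entity by name ("driver"), while stored rows
# and lookups use the join key ("driver_id").
registry = {e.name: e for e in [EntitySketch(name="driver", join_key="driver_id")]}
print(join_keys_for(["driver"], registry))  # prints ['driver_id']
```

Keeping the two apart lets several feature views share one logical entity even if the physical column is later renamed.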

5 changes: 4 additions & 1 deletion docs/getting-started/concepts/feature-retrieval.md
@@ -20,7 +20,10 @@ online_features = fs.get_online_features(
'driver_locations:lon',
'drivers_activity:trips_today'
],
- entity_rows=[{'driver': 'driver_1001'}]
+ entity_rows=[
+ # {join_key: entity_value}
+ {'driver': 'driver_1001'}
+ ]
)
```

18 changes: 14 additions & 4 deletions docs/getting-started/quickstart.md
@@ -95,14 +95,16 @@ driver_hourly_stats = FileSource(

# Define an entity for the driver. You can think of entity as a primary key used to
# fetch features.
- driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)
+ # Entity has a name used for later reference (in a feature view, eg)
+ # and join_key to identify physical field name used in storages
+ driver = Entity(name="driver", value_type=ValueType.INT64, join_key="driver_id", description="driver id",)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature column. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
name="driver_hourly_stats",
- entities=["driver_id"],
+ entities=["driver"], # reference entity by name
ttl=Duration(seconds=86400 * 1),
features=[
Feature(name="conv_rate", dtype=ValueType.FLOAT),
@@ -162,14 +164,16 @@ driver_hourly_stats = FileSource(

# Define an entity for the driver. You can think of entity as a primary key used to
# fetch features.
- driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)
+ # Entity has a name used for later reference (in a feature view, eg)
+ # and join_key to identify physical field name used in storages
+ driver = Entity(name="driver", value_type=ValueType.INT64, join_key="driver_id", description="driver id",)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature column. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
name="driver_hourly_stats",
- entities=["driver_id"],
+ entities=["driver"], # reference entity by name
ttl=Duration(seconds=86400 * 1),
features=[
Feature(name="conv_rate", dtype=ValueType.FLOAT),
@@ -213,8 +217,13 @@ from feast import FeatureStore
# The entity dataframe is the dataframe we want to enrich with feature values
entity_df = pd.DataFrame.from_dict(
{
+ # entity's join key -> entity values
"driver_id": [1001, 1002, 1003],
+
+ # label name -> label values
"label_driver_reported_satisfaction": [1, 5, 3],
+
+ # "event_timestamp" (reserved key) -> timestamps
"event_timestamp": [
datetime.now() - timedelta(minutes=11),
datetime.now() - timedelta(minutes=36),
@@ -320,6 +329,7 @@ feature_vector = store.get_online_features(
"driver_hourly_stats:avg_daily_trips",
],
entity_rows=[
+ # {join_key: entity_value}
{"driver_id": 1004},
{"driver_id": 1005},
],
Original file line number Diff line number Diff line change
@@ -34,6 +34,7 @@ fs = FeatureStore(repo_path="path/to/feature/repo")
online_features = fs.get_online_features(
features=features,
entity_rows=[
+ # {join_key: entity_value, ...}
{"driver_id": 1001},
{"driver_id": 1002}]
).to_dict()
7 changes: 6 additions & 1 deletion docs/tutorials/driver-stats-on-snowflake.md
@@ -124,7 +124,12 @@ fs.materialize_incremental(end_date=datetime.now())
{% code title="test.py" %}
```python
online_features = fs.get_online_features(
- features=features, entity_rows=[{"driver_id": 1001}, {"driver_id": 1002}],
+ features=features,
+ entity_rows=[
+ # {join_key: entity_value}
+ {"driver_id": 1001},
+ {"driver_id": 1002}
+ ],
).to_dict()
```
{% endcode %}
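
The `# {join_key: entity_value}` comments added across these files say that each entity row is keyed by join key rather than by entity name. A minimal pre-flight check along those lines might look like the sketch below. `check_entity_rows` is a hypothetical helper, not part of Feast, and its strict exact-match rule is an assumption chosen for illustration.

```python
from typing import Any, Dict, Iterable, List


def check_entity_rows(entity_rows: List[Dict[str, Any]],
                      join_keys: Iterable[str]) -> List[Dict[str, Any]]:
    """Return entity_rows unchanged, or raise ValueError on a key mismatch.

    Hypothetical helper: every row must contain exactly the expected join keys.
    """
    expected = set(join_keys)
    for i, row in enumerate(entity_rows):
        if set(row) != expected:
            raise ValueError(
                f"row {i}: expected keys {sorted(expected)}, got {sorted(row)}"
            )
    return entity_rows


# Rows keyed by the join key pass the check.
rows = check_entity_rows([{"driver_id": 1001}, {"driver_id": 1002}], ["driver_id"])

# A row keyed by the entity name instead of the join key is rejected.
try:
    check_entity_rows([{"driver": 1001}], ["driver_id"])
except ValueError as err:
    print(f"rejected: {err}")
```

Running such a check before calling `get_online_features` would surface a name/join-key mix-up at the call site.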
