Feast Integration #322

Closed
wants to merge 7 commits into from

samhita-alla
Contributor

Signed-off-by: Samhita Alla [email protected]

Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production.

Integration between Flyte and Feast can help users take their models and features from prototyping all the way to production cost-effectively and efficiently.

# * - ``get_historical_features()``
# - Enrich an entity dataframe with historical feature values for either training or batch scoring.
@task
def store_offline(parquet_file: FlyteFile, repo_path: str) -> (str, str):
Contributor

Can we just create it like a pre-existing task plugin, where, given a parquet file, the task automatically uploads the data to the Feast offline feature store?

Contributor Author

@kumare3: What should we do about this? If I write a plugin, the input will not just be a parquet file; we'll also have to take the features, the primary key, etc. Similarly, when retrieving offline features, the user has to provide the primary key and datetime values. The same applies to online features.
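
To make the point above concrete, here is a minimal sketch of the inputs such a plugin would have to collect besides the parquet file. `FeastOfflineStoreConfig`, its field names, and the sample values are hypothetical; they are not part of flytekit or Feast.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FeastOfflineStoreConfig:
    """Hypothetical inputs for a Feast offline-store ingestion task plugin."""

    parquet_path: str      # data file to upload to the offline store
    entity_key: str        # primary key column, e.g. "Hospital Number"
    timestamp_column: str  # event timestamp column, needed for point-in-time retrieval
    feature_names: List[str] = field(default_factory=list)  # columns to register as features

    def __post_init__(self) -> None:
        # Reject obviously invalid inputs before any upload is attempted.
        if not self.parquet_path.endswith(".parquet"):
            raise ValueError(f"expected a .parquet file, got {self.parquet_path!r}")
        if not self.feature_names:
            raise ValueError("at least one feature name is required")
        if self.entity_key in self.feature_names:
            raise ValueError("the entity key cannot also be listed as a feature")


config = FeastOfflineStoreConfig(
    parquet_path="features.parquet",
    entity_key="Hospital Number",
    timestamp_column="timestamp",
    feature_names=["pulse", "rectal temperature"],
)
print(config)
```

Even in this reduced form the plugin needs four user-supplied values, which is the crux of the reply above.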

}
)

retrieval_job = fs.get_historical_features(
Contributor

Where are these historical features stored?

Contributor Author

For now, locally.
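
`get_historical_features()` enriches an entity dataframe with the feature values that were current as of each row's timestamp (a point-in-time join), so training rows never see values from their future. The sketch below illustrates only those semantics in plain Python; it is not Feast's implementation, it ignores feature-view TTLs, and the data layout and the entity key `"H1"` are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Feature history per entity key: (event_timestamp, value), sorted by timestamp.
FeatureHistory = Dict[str, List[Tuple[datetime, float]]]


def point_in_time_lookup(history: FeatureHistory, key: str, as_of: datetime) -> Optional[float]:
    """Return the latest feature value for `key` observed at or before `as_of`."""
    rows = history.get(key, [])
    timestamps = [ts for ts, _ in rows]
    # Index of the first row strictly after `as_of`; the row just before it is the answer.
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx > 0 else None


def enrich(
    entity_rows: List[Tuple[str, datetime]], history: FeatureHistory
) -> List[Tuple[str, datetime, Optional[float]]]:
    """Attach a point-in-time feature value to each (key, timestamp) entity row."""
    return [(key, ts, point_in_time_lookup(history, key, ts)) for key, ts in entity_rows]


history = {"H1": [(datetime(2021, 7, 1), 38.5), (datetime(2021, 7, 5), 39.2)]}
entity_rows = [
    ("H1", datetime(2021, 7, 3)),   # only the 2021-07-01 value existed then
    ("H1", datetime(2021, 7, 6)),   # the 2021-07-05 value is the latest
    ("H1", datetime(2021, 6, 30)),  # no value existed yet
]
for row in enrich(entity_rows, history):
    print(row)
```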

project: feature_engineering
registry: data/registry.db
provider: local
online_store:
    path: data/online_store.db

# One key difference between the online store and the data source is that the online store keeps only the latest feature values per entity key; no historical values are stored.
# Our dataset has two entries with the same ``Hospital Number`` but different timestamps. Only the data point with the latest timestamp is served from the online store.
@task
def store_online(repo_path: str) -> str:
Contributor

This could already be a task type too, right?

FeastOnlineStoreTask
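
The `store_online` comment notes that the online store keeps only the latest value per entity key. A minimal plain-Python sketch of that collapsing step follows, using made-up rows keyed by ``Hospital Number``; it shows the semantics only and does not reproduce how Feast's `materialize` writes to its SQLite online store.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

Row = Tuple[str, datetime, dict]  # (entity key, event timestamp, feature values)


def latest_per_entity(rows: Iterable[Row]) -> Dict[str, Tuple[datetime, dict]]:
    """Keep only the most recent row for each entity key, as an online store does."""
    latest: Dict[str, Tuple[datetime, dict]] = {}
    for key, ts, features in rows:
        # Overwrite only when this row is newer than the one already kept for the key.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, features)
    return latest


rows = [
    ("H1", datetime(2021, 7, 1), {"pulse": 66}),
    ("H1", datetime(2021, 7, 5), {"pulse": 88}),  # same Hospital Number, later timestamp
    ("H2", datetime(2021, 7, 2), {"pulse": 40}),
]
online = latest_per_entity(rows)
print(online["H1"])
```

The older `H1` row is dropped; it remains available only through historical (offline) retrieval.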

@task
def store_online(repo_path: str) -> str:
store = FeatureStore(repo_path=repo_path)
store.materialize(
Contributor

Why is the time hard-coded?

Contributor Author

We could take inputs from the user, but the window could be expressed in days, hours, minutes, etc. We would have to ask the user for four inputs: two specifying the format (days, hours, minutes, ...) of the start and end times, and two specifying their respective values.
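
Feast's `FeatureStore.materialize` takes `start_date` and `end_date` as `datetime` objects, so the four inputs described above only need converting into two datetimes. A minimal sketch, assuming each (unit, value) pair is an offset back from "now"; `materialize_window` and the set of allowed unit names are hypothetical. Because Python's `timedelta` accepts `weeks`, `days`, `hours`, `minutes`, and `seconds` as keyword arguments, the unit string can be passed straight through.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

ALLOWED_UNITS = {"weeks", "days", "hours", "minutes", "seconds"}


def materialize_window(
    start_unit: str,
    start_value: int,
    end_unit: str,
    end_value: int,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Turn two (unit, value) offsets into the start/end datetimes for materialization.

    Each pair means "this far before now", e.g. ("days", 7) is seven days ago.
    """
    for unit in (start_unit, end_unit):
        if unit not in ALLOWED_UNITS:
            raise ValueError(f"unsupported unit {unit!r}; use one of {sorted(ALLOWED_UNITS)}")
    now = now or datetime.now(timezone.utc)
    # timedelta accepts these unit names directly as keyword arguments.
    start = now - timedelta(**{start_unit: start_value})
    end = now - timedelta(**{end_unit: end_value})
    if start >= end:
        raise ValueError("start must be earlier than end")
    return start, end


now = datetime(2021, 8, 1, 12, 0, tzinfo=timezone.utc)
start, end = materialize_window("days", 2, "hours", 1, now=now)
print(start, end)
# The pair could then be handed to Feast, e.g. store.materialize(start_date=start, end_date=end)
```

A single `timedelta`-typed lookback input would be a simpler alternative, at the cost of always ending the window at "now".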

flyteorg deleted a comment from allcontributors bot Aug 12, 2021
eapolinario mentioned this pull request Sep 15, 2021
eapolinario added a commit that referenced this pull request Sep 16, 2021
Signed-off-by: Eduardo Apolinario <[email protected]>
kumare3 pushed a commit that referenced this pull request Sep 20, 2021
* Initial version

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add venv to dockerfile

Signed-off-by: Eduardo Apolinario <[email protected]>

* Rename feast integration dir

Signed-off-by: Eduardo Apolinario <[email protected]>

* Configure minio in the image

Signed-off-by: Eduardo Apolinario <[email protected]>

* Refactoring + retrieve offline features

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove all_together

Signed-off-by: Eduardo Apolinario <[email protected]>

* Attempt to add s3 credentials to image

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix s3 endpoint

Signed-off-by: Eduardo Apolinario <[email protected]>

* custom provider

Signed-off-by: Eduardo Apolinario <[email protected]>

* Transform FeatureView prior to executing queries

Signed-off-by: Eduardo Apolinario <[email protected]>

* Set PYTHONPATH

Signed-off-by: Eduardo Apolinario <[email protected]>

* Set PYTHONPATH to multiple values

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove "custom_provider" from path

Signed-off-by: Eduardo Apolinario <[email protected]>

* Replace minio endpoint

Signed-off-by: Eduardo Apolinario <[email protected]>

* Print env vars

Signed-off-by: Eduardo Apolinario <[email protected]>

* Set FEAST_S3_ENDPOINT_URL while building feature store

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove minio credentials from image

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add aws env vars

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove mention to local provider

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove piping of registry object

Signed-off-by: Eduardo Apolinario <[email protected]>

* Create random path via FlyteContext

Signed-off-by: Eduardo Apolinario <[email protected]>

* Revert "Remove piping of registry object"

This reverts commit ccdf326.

Signed-off-by: Eduardo Apolinario <[email protected]>

* Clean up feature description and remove debugging statements

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add tasks up to `train_model`

Signed-off-by: Eduardo Apolinario <[email protected]>

* Rename workflow

Signed-off-by: Eduardo Apolinario <[email protected]>

* Comment use of custom provider

Signed-off-by: Eduardo Apolinario <[email protected]>

* Rename workflow

Signed-off-by: Eduardo Apolinario <[email protected]>

* fix error in training

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>

* Add TODO

Signed-off-by: Eduardo Apolinario <[email protected]>

* Import feature_eng tasks directly

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add store_online task

Signed-off-by: Eduardo Apolinario <[email protected]>

* Copy remote file to a local file and replace batch_source in materialize

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add some debugging statements and fix local execution parameter

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add remaining steps to workflow

Signed-off-by: Eduardo Apolinario <[email protected]>

* Regenerate requirements files

Signed-off-by: Eduardo Apolinario <[email protected]>

* Regenerate requirements and put replacement of remote files back in custom provider

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add more logging

Signed-off-by: Eduardo Apolinario <[email protected]>

* Regenerate requirements again

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add workflow return type

Signed-off-by: Eduardo Apolinario <[email protected]>

* Include a directory prefix in the model filename

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove unused overrides in custom provider and comment use of localize_feature_view

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add type transformer

Signed-off-by: Eduardo Apolinario <[email protected]>

* Pipe _Feature_Store to all interactions with feast

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove unnecessary override in custom provider

Signed-off-by: Eduardo Apolinario <[email protected]>

* Rearrange initialization of FeatureStore for better legibility

Signed-off-by: Eduardo Apolinario <[email protected]>

* Revert "Remove unnecessary override in custom provider"

This reverts commit 2808ba0.

Signed-off-by: Eduardo Apolinario <[email protected]>

* Use create_node to enforce order

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove unused function

Signed-off-by: Eduardo Apolinario <[email protected]>

* Guard env vars behind a check

Signed-off-by: Eduardo Apolinario <[email protected]>

* Expose inputs to workflow

Signed-off-by: Eduardo Apolinario <[email protected]>

* Task to build FeatureStore

Signed-off-by: Eduardo Apolinario <[email protected]>

* Do not guard env vars behind a check

Signed-off-by: Eduardo Apolinario <[email protected]>

* Experiment with converted_df

Signed-off-by: Eduardo Apolinario <[email protected]>

* Comments

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove commented code from type transformer

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove unused portion of sandbox.config

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove TODO

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove registry parameter from local execution

Signed-off-by: Eduardo Apolinario <[email protected]>

* No need for type transformers

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove mentions to type transformers

Signed-off-by: Eduardo Apolinario <[email protected]>

* Copy README.rst from #322

Signed-off-by: Eduardo Apolinario <[email protected]>

* Step 3 of guide on adding a new integration

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove extraneous print statement and turn comments into docstrings in custom provider

Signed-off-by: Eduardo Apolinario <[email protected]>

* Comments on README.rst

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix link to feast

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix serialization of feast_integration dir

Signed-off-by: Eduardo Apolinario <[email protected]>

Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Samhita Alla <[email protected]>
kumare3 closed this Sep 20, 2021
Comment on lines +72 to +94
@reference_task(
project="flytesnacks",
domain="development",
name="feast_integration.feature_eng_tasks.mean_median_imputer",
version="v1",
)
def mean_median_imputer(
dataframe: pd.DataFrame,
imputation_method: str,
) -> FlyteSchema:
...


@reference_task(
project="flytesnacks",
domain="development",
name="feast_integration.feature_eng_tasks.univariate_selection",
version="v1",
)
def univariate_selection(
dataframe: pd.DataFrame, num_features: int, data_class: str, feature_view_name: str
) -> pd.DataFrame:
...
Contributor Author

...
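
The `mean_median_imputer` reference task above takes a dataframe and an `imputation_method`, but its body is not shown here. The following is only a plain-Python sketch of what mean/median imputation typically does for one column: replace missing values with the mean or median of the observed values. The `impute` function, the `"mean"`/`"median"` option strings, and the sample column are assumptions.

```python
import statistics
from typing import List, Optional


def impute(column: List[Optional[float]], imputation_method: str) -> List[float]:
    """Fill None entries with the mean or median of the column's observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if imputation_method == "mean":
        fill = statistics.mean(observed)
    elif imputation_method == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown imputation_method {imputation_method!r}")
    return [fill if v is None else v for v in column]


pulse = [66.0, None, 88.0, 40.0, None]
print(impute(pulse, "mean"))
print(impute(pulse, "median"))
```

The median is less sensitive to outliers, which is one reason to expose the method as an input rather than fixing it.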

pingsutw pushed a commit to pingsutw/flyte-monorepo that referenced this pull request Apr 4, 2023
2 participants