Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema Inferencing should happen at apply time #1646

Merged

Conversation

mavysavydav
Copy link
Collaborator

What this PR does / why we need it:
#1645

Which issue(s) this PR fixes:

Fixes #1645

Does this PR introduce a user-facing change?:

None

@feast-ci-bot
Copy link
Collaborator

Hi @mavysavydav. Thanks for your PR.

I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@codecov-commenter
Copy link

codecov-commenter commented Jun 13, 2021

Codecov Report

Merging #1646 (9a854bf) into master (a708915) will decrease coverage by 0.02%.
The diff coverage is 89.70%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1646      +/-   ##
==========================================
- Coverage   83.54%   83.51%   -0.03%     
==========================================
  Files          71       71              
  Lines        6173     6195      +22     
==========================================
+ Hits         5157     5174      +17     
- Misses       1016     1021       +5     
Flag Coverage Δ
integrationtests 83.43% <89.70%> (-0.03%) ⬇️
unittests 76.10% <39.70%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/repo_operations.py 31.70% <25.00%> (-0.48%) ⬇️
sdk/python/feast/data_source.py 78.15% <75.00%> (-0.63%) ⬇️
sdk/python/feast/feature_view.py 89.10% <90.00%> (+0.22%) ⬆️
sdk/python/feast/inference.py 93.47% <92.30%> (-1.98%) ⬇️
sdk/python/feast/errors.py 68.75% <100.00%> (+2.08%) ⬆️
sdk/python/feast/feature_store.py 93.04% <100.00%> (+0.09%) ⬆️
sdk/python/tests/test_inference.py 100.00% <100.00%> (ø)
sdk/python/tests/utils/data_source_utils.py 100.00% <100.00%> (ø)
sdk/python/feast/type_map.py 41.25% <0.00%> (-1.40%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a708915...9a854bf. Read the comment docs.

@achals
Copy link
Member

achals commented Jun 13, 2021

/ok-to-test

Copy link
Member

@achals achals left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @mavysavydav ! Left some comments about some cases that I wasn't sure are covered elsewhere in the codebase or not. Looks good otherwise!

@@ -598,7 +563,7 @@ def __init__(
self._file_options = FileOptions(file_format=file_format, file_url=file_url)

super().__init__(
event_timestamp_column or self._infer_event_timestamp_column(r"^timestamp"),
event_timestamp_column or "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this just be None instead of an empty string?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep can delete the "or ""' part. Just need to change default value for event_timestamp_column of base class (DataSource) to None instead of "". Was hesitant to change base class param since didn't know if the use of "" was intentional.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh actually, it seems like the data source base class doesn't have event_timestamp_column as optional so its type is str instead of Option[str]. The subclasses constructors use Option[str] so when passing to the supers' constructor need to handle the type checking and or "" turns the type from a Option[str] to str. I'll add comments to make it clear this is for type checking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make the event_timestamp_column optional in the base data source class? I think that would be reasonable if possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep can do. Was thinking that not all subclasses of DataSource base class is guaranteed to have inference so maybe from an interface point of view, it's good to communicate that DataSource actually needs a event_timestamp_column.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved on slack. Achal - "I think that the subclasses can add additional assertions as per their requirements, but the base class should be more broad". Will make the change

@@ -672,8 +637,7 @@ def __init__(
self._bigquery_options = BigQueryOptions(table_ref=table_ref, query=query)

super().__init__(
event_timestamp_column
or self._infer_event_timestamp_column("TIMESTAMP|DATETIME"),
event_timestamp_column or "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment, any reason to use an empty string instead of None?

sdk/python/feast/inference.py Show resolved Hide resolved
sdk/python/feast/inference.py Outdated Show resolved Hide resolved
sdk/python/feast/feature_view.py Show resolved Hide resolved
infer_event_timestamp_column_for_data_sources(
[view.input for view in repo.feature_views]
)
for view in repo.feature_views:
view.infer_features_from_input_source()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an ordering requirement that these be put up here? I think that in lines 162-166 below, we validate if the data sources are even supported (and in #1627, whether the BQ tables even exist). I'm happy to change the order in #1627 later, but wanted to ask now about whether order matters.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

@mavysavydav
Copy link
Collaborator Author

/retest

@mavysavydav mavysavydav requested a review from achals June 16, 2021 00:13
@mavysavydav
Copy link
Collaborator Author

mavysavydav commented Jun 16, 2021

Unsure why telemetry is failing @achals

@achals
Copy link
Member

achals commented Jun 18, 2021

/retest

@achals
Copy link
Member

achals commented Jun 18, 2021

Thanks for the patience @mavysavydav! I only have one suggestion, and I think this will be good to merge.

@mavysavydav mavysavydav force-pushed the inferenceImprovementsAndRefactor branch from c72f013 to 9a854bf Compare June 18, 2021 20:05
Copy link
Member

@achals achals left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, mavysavydav

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@achals achals changed the title Inferencing improvements and refactor (e.g. should happen at apply time) Schema Inferencing improvements should happen at apply time Jun 18, 2021
@achals achals changed the title Schema Inferencing improvements should happen at apply time Schema Inferencing should happen at apply time Jun 18, 2021
@mavysavydav
Copy link
Collaborator Author

/kind bug

@feast-ci-bot feast-ci-bot merged commit c50a36e into feast-dev:master Jun 18, 2021
Mwad22 pushed a commit to Mwad22/feast that referenced this pull request Jul 7, 2021
* wip1

Signed-off-by: David Y Liu <[email protected]>

* just need to do clean up

Signed-off-by: David Y Liu <[email protected]>

* linted

Signed-off-by: David Y Liu <[email protected]>

* improve test coverage

Signed-off-by: David Y Liu <[email protected]>

* changed placement of inference methods in repo_operation apply_total

Signed-off-by: David Y Liu <[email protected]>

* updated inference method name + changed to void return since it updates in place

Signed-off-by: David Y Liu <[email protected]>

* fixed integration test and added comments

Signed-off-by: David Y Liu <[email protected]>

* Made DataSource event_timestamp_column optional

Signed-off-by: David Y Liu <[email protected]>
Signed-off-by: Mwad22 <[email protected]>
feast-ci-bot pushed a commit that referenced this pull request Jul 8, 2021
…on to include feature view name in feature naming. (#1641)

* test

Signed-off-by: David Y Liu <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* refactored existing tests to test full_feature_names feature on data retreival, added new tests also.

Signed-off-by: Mwad22 <[email protected]>

* removed full_feature_names usage from quickstart and README to have more simple examples. Resolved failing tests.

Signed-off-by: Mwad22 <[email protected]>

* Update CHANGELOG for Feast v0.10.8

Signed-off-by: Mwad22 <[email protected]>

* GitBook: [master] 2 pages modified

Signed-off-by: Mwad22 <[email protected]>

* Schema Inferencing should happen at apply time (#1646)

* wip1

Signed-off-by: David Y Liu <[email protected]>

* just need to do clean up

Signed-off-by: David Y Liu <[email protected]>

* linted

Signed-off-by: David Y Liu <[email protected]>

* improve test coverage

Signed-off-by: David Y Liu <[email protected]>

* changed placement of inference methods in repo_operation apply_total

Signed-off-by: David Y Liu <[email protected]>

* updated inference method name + changed to void return since it updates in place

Signed-off-by: David Y Liu <[email protected]>

* fixed integration test and added comments

Signed-off-by: David Y Liu <[email protected]>

* Made DataSource event_timestamp_column optional

Signed-off-by: David Y Liu <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* GitBook: [master] 80 pages modified

Signed-off-by: Mwad22 <[email protected]>

* GitBook: [master] 80 pages modified

Signed-off-by: Mwad22 <[email protected]>

* Provide descriptive error on invalid table reference (#1627)

* Initial commit to catch nonexistent table

Signed-off-by: Cody Lin <[email protected]>
Signed-off-by: Cody Lin <[email protected]>

* simplify nonexistent BQ table test

Signed-off-by: Cody Lin <[email protected]>

* clean up table_exists exception

Signed-off-by: Cody Lin <[email protected]>

* remove unneeded variable

Signed-off-by: Cody Lin <[email protected]>

* function name change to _assert_table_exists

Signed-off-by: Cody Lin <[email protected]>

* Initial commit to catch nonexistent table

Signed-off-by: Cody Lin <[email protected]>
Signed-off-by: Cody Lin <[email protected]>

* simplify nonexistent BQ table test

Signed-off-by: Cody Lin <[email protected]>

* clean up table_exists exception

Signed-off-by: Cody Lin <[email protected]>

* function name change to _assert_table_exists

Signed-off-by: Cody Lin <[email protected]>

* fix lint errors and rebase

Signed-off-by: Cody Lin <[email protected]>

* Fix get_table(None) error

Signed-off-by: Cody Lin <[email protected]>

* custom exception for both missing file and BQ source

Signed-off-by: Cody Lin <[email protected]>

* revert FileSource checks

Signed-off-by: Cody Lin <[email protected]>

* Use DataSourceNotFoundException instead of subclassing

Signed-off-by: Cody Lin <[email protected]>

* Moved assert_table_exists out of the BQ constructor to apply_total

Signed-off-by: Cody Lin <[email protected]>

* rename test and test asset

Signed-off-by: Cody Lin <[email protected]>

* move validate logic back to data_source

Signed-off-by: Cody Lin <[email protected]>

* fixed tests

Signed-off-by: Cody Lin <[email protected]>

* Set pytest.integration for tests that access BQ

Signed-off-by: Cody Lin <[email protected]>

* Import pytest in failed test files

Signed-off-by: Cody Lin <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Refactor OnlineStoreConfig classes into owning modules (#1649)

* Refactor OnlineStoreConfig classes into owning modules

Signed-off-by: Achal Shah <[email protected]>

* make format

Signed-off-by: Achal Shah <[email protected]>

* Move redis too

Signed-off-by: Achal Shah <[email protected]>

* update test_telemetery

Signed-off-by: Achal Shah <[email protected]>

* add a create_repo_config method that should be called instead of RepoConfig ctor directly

Signed-off-by: Achal Shah <[email protected]>

* fix the table reference in repo_operations

Signed-off-by: Achal Shah <[email protected]>

* reuse create_repo_config

Signed-off-by: Achal Shah <[email protected]>

Remove redis provider reference

* CR comments

Signed-off-by: Achal Shah <[email protected]>

* Remove create_repo_config in favor of __init__

Signed-off-by: Achal Shah <[email protected]>

* make format

Signed-off-by: Achal Shah <[email protected]>

* Remove print statement

Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Possibility to specify a project for BigQuery queries (#1656)

Signed-off-by: Matt Delacour <[email protected]>

Co-authored-by: Achal Shah <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Refactor OfflineStoreConfig classes into their owning modules (#1657)

* Refactor OfflineStoreConfig classes into their owning modules

Signed-off-by: Achal Shah <[email protected]>

* Fix error string

Signed-off-by: Achal Shah <[email protected]>

* Generic error class

Signed-off-by: Achal Shah <[email protected]>

* Merge conflicts

Signed-off-by: Achal Shah <[email protected]>

* make the store type work, and add a test that uses the fully qualified name of the OnlineStore

Signed-off-by: Achal Shah <[email protected]>

* Address comments from previous PR

Signed-off-by: Achal Shah <[email protected]>

* CR updates

Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Run python unit tests in parallel (#1652)

Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Rename telemetry to usage (#1660)

* Rename telemetry to usage

Signed-off-by: Tsotne Tabidze <[email protected]>

* Update docs

Signed-off-by: Tsotne Tabidze <[email protected]>

* Update .prow and infra

Signed-off-by: Tsotne Tabidze <[email protected]>

* Rename file

Signed-off-by: Tsotne Tabidze <[email protected]>

* Change url

Signed-off-by: Tsotne Tabidze <[email protected]>

* Re-add telemetry.md for backwards-compatibility

Signed-off-by: Tsotne Tabidze <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* resolved final comments on PR (variable renaming, refactor tests)

Signed-off-by: Mwad22 <[email protected]>

* reformatted after merge conflict

Signed-off-by: Mwad22 <[email protected]>

* Update CHANGELOG for Feast v0.11.0

Signed-off-by: Willem Pienaar <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Update charts README (#1659)

Adding feast jupyter link to it.

+ Fix the helm 'feast-serving' name in aws/azure terraform.

Signed-off-by: szalai1 <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Added Redis to list of online stores for local provider in providers reference doc. (#1668)

Signed-off-by: Nel Swanepoel <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Grouped inferencing statements together in apply methods for easier readability (#1667)

* grouped inferencing statements together

Signed-off-by: David Y Liu <[email protected]>

* update in testing

Signed-off-by: David Y Liu <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Add RedshiftDataSource (#1669)

* Add RedshiftDataSource

Signed-off-by: Tsotne Tabidze <[email protected]>

* Call parent __init__ first

Signed-off-by: Tsotne Tabidze <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Provide the user with more options for setting the to_bigquery config (#1661)

* Provide more options for to_bigquery config

Signed-off-by: Cody Lin <[email protected]>

* Fix default job_config when none; remove excessive testing

Signed-off-by: Cody Lin <[email protected]>

* Add param type and docstring

Signed-off-by: Cody Lin <[email protected]>

* add docstrings and typing

Signed-off-by: Cody Lin <[email protected]>

* Apply docstring suggestions from code review

Co-authored-by: Willem Pienaar <[email protected]>
Signed-off-by: Cody Lin <[email protected]>

Co-authored-by: Willem Pienaar <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Add streaming sources to the FeatureView API (#1664)

* Add a streaming source to the FeatureView API

This diff only updates the API. It is currently up to the providers to actually use this information to spin up resources to consume events from the stream sources.

Signed-off-by: Achal Shah <[email protected]>

* remove stuff from rebase

Signed-off-by: Achal Shah <[email protected]>

* make format

Signed-off-by: Achal Shah <[email protected]>

* Update protos

Signed-off-by: Achal Shah <[email protected]>

* lint

Signed-off-by: Achal Shah <[email protected]>

* format

Signed-off-by: Achal Shah <[email protected]>

* CR

Signed-off-by: Achal Shah <[email protected]>

* fix test

Signed-off-by: Achal Shah <[email protected]>

* lint

Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Add to_table() to RetrievalJob object (#1663)

* Add notion of OfflineJob

Signed-off-by: Matt Delacour <[email protected]>

* Use RetrievalJob instead of creating a new OfflineJob object

Signed-off-by: Matt Delacour <[email protected]>

* Add to_table() in integration tests

Signed-off-by: Matt Delacour <[email protected]>

Co-authored-by: Tsotne Tabidze <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Rename to_table to to_arrow (#1671)

Signed-off-by: Matt Delacour <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Cancel BigQuery job if timeout hits (#1672)

* Cancel BigQuery job if timedout hits

Signed-off-by: Matt Delacour <[email protected]>

* Fix typo

Signed-off-by: Matt Delacour <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Fix Feature References example (#1674)

Fix Feature References example by passing `entity_rows` to `get_online_features()`

Signed-off-by: Mwad22 <[email protected]>

* Allow strings for online/offline store instead of dicts (#1673)

Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Remove default list from the FeatureView constructor (#1679)

Signed-off-by: Achal Shah <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* made changes requested by @tsotnet

Signed-off-by: Mwad22 <[email protected]>

* Fix unit tests that got broken by Pandas 1.3.0 release (#1683)

Signed-off-by: Tsotne Tabidze <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Add support for DynamoDB and S3 registry (#1483)

* Add support for DynamoDB and S3 registry

Signed-off-by: lblokhin <[email protected]>

* rcu and wcu as a parameter of dynamodb online store

Signed-off-by: lblokhin <[email protected]>

* fix linter

Signed-off-by: lblokhin <[email protected]>

* aws dependency to extras

Signed-off-by: lblokhin <[email protected]>

* FEAST_S3_ENDPOINT_URL

Signed-off-by: lblokhin <[email protected]>

* tests

Signed-off-by: lblokhin <[email protected]>

* fix signature, after merge

Signed-off-by: lblokhin <[email protected]>

* aws default region name configurable

Signed-off-by: lblokhin <[email protected]>

* add offlinestore config type to test

Signed-off-by: lblokhin <[email protected]>

* review changes

Signed-off-by: lblokhin <[email protected]>

* review requested changes

Signed-off-by: lblokhin <[email protected]>

* integration test for Dynamo

Signed-off-by: lblokhin <[email protected]>

* change the rest of table_name to table_instance (where table_name is actually an instance of DynamoDB Table object)

Signed-off-by: lblokhin <[email protected]>

* fix DynamoDBOnlineStore commit

Signed-off-by: lblokhin <[email protected]>

* move client to _initialize_dynamodb

Signed-off-by: lblokhin <[email protected]>

* rename document_id to entity_id and Row to entity_id

Signed-off-by: lblokhin <[email protected]>

* The default value is None

Signed-off-by: lblokhin <[email protected]>

* Remove Datastore from the docstring.

Signed-off-by: lblokhin <[email protected]>

* get rid of the return call from S3RegistryStore

Signed-off-by: lblokhin <[email protected]>

* merge two exceptions

Signed-off-by: lblokhin <[email protected]>

* For ci requirement

Signed-off-by: lblokhin <[email protected]>

* remove configuration from test

Signed-off-by: lblokhin <[email protected]>

* feast-integration-tests for tests

Signed-off-by: lblokhin <[email protected]>

* change test path

Signed-off-by: lblokhin <[email protected]>

* add fixture feature_store_with_s3_registry to test

Signed-off-by: lblokhin <[email protected]>

* region required

Signed-off-by: lblokhin <[email protected]>

* Address the rest of the comments

Signed-off-by: Tsotne Tabidze <[email protected]>

* Update to_table to to_arrow

Signed-off-by: Tsotne Tabidze <[email protected]>

Co-authored-by: Tsotne Tabidze <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Parallelize integration tests (#1684)

* Parallelize integration tests

Signed-off-by: Tsotne Tabidze <[email protected]>

* Update the usage flag

Signed-off-by: Tsotne Tabidze <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* BQ exception should be raised first before we check the timedout (#1675)

Signed-off-by: Matt Delacour <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Update sdk/python/feast/infra/provider.py

Co-authored-by: Willem Pienaar <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* Update sdk/python/feast/feature_store.py

Co-authored-by: Willem Pienaar <[email protected]>
Signed-off-by: Mwad22 <[email protected]>

* made error logic/messages more descriptive

Signed-off-by: Mwad22 <[email protected]>

* made error logic/messages more descriptive.

Signed-off-by: Mwad22 <[email protected]>

* Simplified error messages

Signed-off-by: Mwad22 <[email protected]>

* ran formatter, issue in errors.py

Signed-off-by: Mwad22 <[email protected]>

* python linter issues resolved

Signed-off-by: Mwad22 <[email protected]>

* removed unnecessary default assignment in get_historical_features. default now set only in feature_store.py

Signed-off-by: Mwad22 <[email protected]>

* added error message assertion for feature name collisions, and other nitpick changes

Signed-off-by: Mwad22 <[email protected]>

Co-authored-by: David Y Liu <[email protected]>
Co-authored-by: Tsotne Tabidze <[email protected]>
Co-authored-by: Achal Shah <[email protected]>
Co-authored-by: David Y Liu <[email protected]>
Co-authored-by: Willem Pienaar <[email protected]>
Co-authored-by: codyjlin <[email protected]>
Co-authored-by: Matt Delacour <[email protected]>
Co-authored-by: Willem Pienaar <[email protected]>
Co-authored-by: Peter Szalai <[email protected]>
Co-authored-by: Nel Swanepoel <[email protected]>
Co-authored-by: Willem Pienaar <[email protected]>
Co-authored-by: Greg Kuhlmann <[email protected]>
Co-authored-by: Leonid <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean up inferencing code
5 participants