fix: Parse inline data sources #3036

Merged: 3 commits merged on Aug 15, 2022

felixwang9817
Collaborator

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #2991

sdk/python/feast/stream_feature_view.py (outdated, resolved)
sdk/python/feast/repo_operations.py (outdated, resolved)
```python
# Handle stream sources defined with feature views.
if obj.stream_source:
    stream_source = obj.stream_source
    if not any((stream_source is ds) for ds in res.data_sources):
```
Member


Suggested change
```diff
-if not any((stream_source is ds) for ds in res.data_sources):
+if stream_source not in res.data_sources:
```

Is there any reason this wouldn't work?

Collaborator Author


Yeah, so the reason I want to filter by `is` instead of `==` is precisely to avoid deduping.

The only kind of duplication we should allow is directly importing an object from another file in the feature repo.
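To make the distinction concrete, here is a minimal sketch using a hypothetical `Source` dataclass as a stand-in for a Feast data source (assumption: like `FileSource` in this thread, two instances with the same fields compare `==`). Python's `in` on a list checks identity and then falls back to `==`, so it treats an equal-but-distinct object as already present; the `any(... is ...)` check only matches the exact same object.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a data source; the generated __eq__ compares
# fields, so two instances with the same name and path are ==.
@dataclass
class Source:
    name: str
    path: str


collected = [Source("driver_stats", "data.parquet")]
redefined = Source("driver_stats", "data.parquet")  # equal, but a new object

# List membership via `in` tries identity, then ==, so this is True.
found_by_equality = redefined in collected

# The identity check used in the PR only matches the very same object.
found_by_identity = any(redefined is ds for ds in collected)

print(found_by_equality, found_by_identity)  # True False
```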

Member


What other kind of duplication are you thinking about?

Collaborator Author


You might, e.g., define a file source in a different .py file with just the path `data.parquet`, and then redefine it inline for a feature view with the same path.

This would result in two distinct `FileSource` objects that are `==`, which I think we should prevent.
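A sketch of why identity-based collection helps here, under stated assumptions: `Source`, `collect_sources`, and `check_unique_names` are hypothetical names, not Feast code, and the thread does not show how Feast actually reports the conflict. The sketch also assumes both definitions end up with the same name. Importing the same object twice collapses to one entry, while an inline redefinition survives as a second, equal object that a later uniqueness check can flag instead of silently dropping.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    # Hypothetical stand-in for FileSource; == compares name and path.
    name: str
    path: str


def collect_sources(candidates: List[Source]) -> List[Source]:
    """Collect sources, skipping only objects already collected (identity)."""
    collected: List[Source] = []
    for source in candidates:
        if not any(source is ds for ds in collected):
            collected.append(source)
    return collected


def check_unique_names(sources: List[Source]) -> None:
    """Hypothetical validation: raise if two collected sources share a name."""
    seen = set()
    for source in sources:
        if source.name in seen:
            raise ValueError(f"Duplicate data source name: {source.name!r}")
        seen.add(source.name)


# A source defined in another .py file and imported (same object reused)...
shared = Source("driver_stats", "data.parquet")
# ...and a second definition inline for a feature view with the same path.
inline = Source("driver_stats", "data.parquet")

# Reusing the imported object is fine: identity dedupe keeps one copy.
imported_twice = collect_sources([shared, shared])

# The inline redefinition is a distinct-but-equal object; both survive
# collection, so the name check surfaces the conflict.
both = collect_sources([shared, inline])
try:
    check_unique_names(both)
except ValueError as err:
    print(err)  # Duplicate data source name: 'driver_stats'
```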

Comment on lines +166 to +168
```python
if not any((batch_source is ds) for ds in res.data_sources):
    res.data_sources.append(batch_source)
```
Member


Suggested change
```diff
-if not any((batch_source is ds) for ds in res.data_sources):
-    res.data_sources.append(batch_source)
+if batch_source not in res.data_sources:
+    res.data_sources.append(batch_source)
```

Maybe cleaner? Or, cleaner still, continue using a set: just add blindly and let the set dedupe.
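A minimal sketch of what the set-based alternative would do, assuming sources are hashable with a `__hash__` consistent with `==` (the frozen `Source` dataclass below is a hypothetical stand-in, not Feast's class): the set collapses two equal definitions into one without raising anything, which is the silent dedupe the author wants to avoid. If constant-time lookups are the goal, a dict keyed on `id()` (the helper name `collect_by_identity` is hypothetical) keeps the identity semantics of the `is` check.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Source:
    # Hypothetical hashable stand-in for a data source; frozen=True makes
    # dataclasses generate a __hash__ consistent with the field-based __eq__.
    name: str
    path: str


shared = Source("driver_stats", "data.parquet")
inline = Source("driver_stats", "data.parquet")  # equal, distinct object

# Set-based collection: equal objects collapse into one element, so the
# inline redefinition disappears without any error being raised.
as_set = {shared, inline}


def collect_by_identity(candidates: List[Source]) -> List[Source]:
    """Identity-based dedupe with O(1) lookups, keyed on id()."""
    # The dict holds a reference to each stored object, so its id() cannot
    # be reused by another object while it is stored here.
    by_id: Dict[int, Source] = {}
    for source in candidates:
        by_id.setdefault(id(source), source)
    return list(by_id.values())  # dicts preserve insertion order


as_list = collect_by_identity([shared, shared, inline])

print(len(as_set), len(as_list))  # 1 2
```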

Collaborator Author


Same reason as above.

@felixwang9817
Collaborator Author

Also note that the test errors here will be fixed once #3037 lands.

@codecov-commenter

codecov-commenter commented Aug 8, 2022

Codecov Report

Merging #3036 (e733c89) into master (0ed1a63) will increase coverage by 8.53%.
The diff coverage is 85.33%.

```diff
@@            Coverage Diff             @@
##           master    #3036      +/-   ##
==========================================
+ Coverage   67.44%   75.98%   +8.53%     
==========================================
  Files         169      202      +33     
  Lines       14936    16914    +1978     
==========================================
+ Hits        10074    12852    +2778     
+ Misses       4862     4062     -800     
```
| Flag | Coverage Δ |
|---|---|
| integrationtests | `67.05% <26.66%> (-0.40%)` ⬇️ |
| unittests | `58.28% <85.33%> (?)` |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| `sdk/python/feast/cli.py` | `41.59% <ø> (-0.10%)` ⬇️ |
| `sdk/python/feast/repo_operations.py` | `49.53% <5.00%> (+24.16%)` ⬆️ |
| `...unit/local_feast_tests/test_local_feature_store.py` | `97.88% <97.36%> (+66.55%)` ⬆️ |
| `sdk/python/feast/feature_store.py` | `85.26% <100.00%> (+3.20%)` ⬆️ |
| `sdk/python/tests/utils/cli_repo_creator.py` | `98.03% <100.00%> (+9.15%)` ⬆️ |
| `...s/contrib/postgres_offline_store/tests/__init__.py` | `100.00% <0.00%> (ø)` |
| `...line_stores/contrib/postgres_repo_configuration.py` | `100.00% <0.00%> (ø)` |
| `sdk/python/feast/loaders/yaml.py` | `18.18% <0.00%> (ø)` |
| `...k/python/feast/infra/materialization/lambda/app.py` | `26.66% <0.00%> (ø)` |
| `..._stores/contrib/postgres_offline_store/postgres.py` | `35.22% <0.00%> (ø)` |

... and 102 more


@adchia
Collaborator

adchia commented Aug 9, 2022

nit: use a more descriptive PR title, since this goes into the changelog

@felixwang9817 felixwang9817 changed the title fix: Fix repo parsing logic fix: Ensure inline data sources are correctly parsed Aug 9, 2022
@felixwang9817 felixwang9817 changed the title fix: Ensure inline data sources are correctly parsed fix: Correctly parse inline data sources Aug 9, 2022
@felixwang9817
Collaborator Author

@adchia fixed!

@felixwang9817 felixwang9817 changed the title fix: Correctly parse inline data sources fix: Parse inline data sources Aug 9, 2022
Signed-off-by: Felix Wang <[email protected]>
Collaborator

@adchia adchia left a comment


/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, felixwang9817


Needs approval from an approver in each of these files:
  • OWNERS [adchia,felixwang9817]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit c7ba370 into feast-dev:master Aug 15, 2022
kevjumba pushed a commit that referenced this pull request Aug 25, 2022
# [0.24.0](v0.23.0...v0.24.0) (2022-08-25)

### Bug Fixes

* Check if on_demand_feature_views is an empty list rather than None for snowflake provider ([#3046](#3046)) ([9b05e65](9b05e65))
* FeatureStore.apply applies BatchFeatureView correctly ([#3098](#3098)) ([41be511](41be511))
* Fix Feast Java inconsistency with int64 serialization vs python ([#3031](#3031)) ([4bba787](4bba787))
* Fix feature service inference logic ([#3089](#3089)) ([4310ed7](4310ed7))
* Fix field mapping logic during feature inference ([#3067](#3067)) ([cdfa761](cdfa761))
* Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([0702310](0702310))
* Fix Java helm charts to work with refactored logic. Fix FTS image ([#3105](#3105)) ([2b493e0](2b493e0))
* Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([bfae6ac](bfae6ac))
* Fix release workflow to release 0.24.0 ([#3138](#3138)) ([a69aaae](a69aaae))
* Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([b26566d](b26566d))
* Fixing Web UI, which fails for the SQL registry ([#3028](#3028)) ([64603b6](64603b6))
* Force Snowflake Session to Timezone UTC ([#3083](#3083)) ([9f221e6](9f221e6))
* Make infer dummy entity join key idempotent ([#3115](#3115)) ([1f5b1e0](1f5b1e0))
* More explicit error messages ([#2708](#2708)) ([e4d7afd](e4d7afd))
* Parse inline data sources ([#3036](#3036)) ([c7ba370](c7ba370))
* Prevent overwriting existing file during `persist` ([#3088](#3088)) ([69af21f](69af21f))
* Register BatchFeatureView in feature repos correctly ([#3092](#3092)) ([b8e39ea](b8e39ea))
* Return an empty infra object from sql registry when it doesn't exist ([#3022](#3022)) ([8ba87d1](8ba87d1))
* Teardown tables for Snowflake Materialization testing ([#3106](#3106)) ([0a0c974](0a0c974))
* UI error when saved dataset is present in registry. ([#3124](#3124)) ([83cf753](83cf753))
* Update sql.py ([#3096](#3096)) ([2646a86](2646a86))
* Updated snowflake template ([#3130](#3130)) ([f0594e1](f0594e1))

### Features

* Add authentication option for snowflake connector ([#3039](#3039)) ([74c75f1](74c75f1))
* Add Cassandra/AstraDB online store contribution ([#2873](#2873)) ([feb6cb8](feb6cb8))
* Add Snowflake materialization engine ([#2948](#2948)) ([f3b522b](f3b522b))
* Adding saved dataset capabilities for Postgres ([#3070](#3070)) ([d3253c3](d3253c3))
* Allow passing repo config path via flag ([#3077](#3077)) ([0d2d951](0d2d951))
* Contrib azure provider with synapse/mssql offline store and Azure registry store ([#3072](#3072)) ([9f7e557](9f7e557))
* Custom Docker image for Bytewax batch materialization ([#3099](#3099)) ([cdd1b07](cdd1b07))
* Feast AWS Athena offline store (again) ([#3044](#3044)) ([989ce08](989ce08))
* Implement spark offline store `offline_write_batch` method ([#3076](#3076)) ([5b0cc87](5b0cc87))
* Initial Bytewax materialization engine ([#2974](#2974)) ([55c61f9](55c61f9))
* Refactor feature server helm charts to allow passing feature_store.yaml in environment variables ([#3113](#3113)) ([85ee789](85ee789))
Development

Successfully merging this pull request may close these issues.

data sources displayed different depending on "inline" definition
5 participants