
fix: Prevent overwriting existing file during persist #3088

Merged
5 commits merged into feast-dev:master on Aug 19, 2022

Conversation

felixwang9817
Collaborator

Signed-off-by: Felix Wang [email protected]

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #3069

Comment on lines 65 to 71
if data_source:
    assert isinstance(data_source, FileSource)
    return SavedDatasetFileStorage(
        path=data_source.path,
        file_format=ParquetFormat(),
        s3_endpoint_override=None,
    )
Member
Is this the main reason we're changing the interface for create_saved_dataset_destination? Can you explain what purpose the passed in data source serves?

Collaborator Author

yeah, so I want to copy an existing destination - however, each saved dataset object has its own way of specifying a destination (e.g. path vs. table vs. schema), so there's no super clean way to modify the create_saved_dataset_destination interface to allow for specifying a destination

the easiest thing to do is to just specify a data source, whose destination can be copied to a saved dataset object
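(Editor's note: a minimal sketch of the approach being discussed, based on the snippet under review above; names and import paths are approximate and this is not the merged code. The file case is the simple one, since a FileSource's destination is just a path; other offline stores identify destinations by table or schema, which is why the mapping does not generalize.)

```python
from typing import Optional

from feast import FileSource
from feast.data_format import ParquetFormat
from feast.infra.offline_stores.file_source import SavedDatasetFileStorage


def destination_from_source(
    data_source: Optional[FileSource],
) -> Optional[SavedDatasetFileStorage]:
    # Copy the destination (a plain file path) of an existing FileSource into a
    # saved dataset storage object, mirroring the snippet under review above.
    if data_source is None:
        return None
    return SavedDatasetFileStorage(
        path=data_source.path,
        file_format=ParquetFormat(),
        s3_endpoint_override=None,
    )
```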

Member

tbh i really dislike the idea of passing in a source as the destination :/ What else do you think we can pass in instead?

Collaborator Author

switched to a different method, lmk what you think @achals
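(Editor's note: the thread does not show the final implementation. As a rough, hypothetical illustration of the behavior the PR title describes, refusing to overwrite an existing file during persist, consider a guard like the following; the error type and call site in the merged code may differ.)

```python
import os


def ensure_destination_is_free(path: str) -> None:
    # Hypothetical guard for the behavior described by #3069: fail loudly
    # instead of silently overwriting a file that already exists at the
    # saved dataset's persist destination.
    if os.path.exists(path):
        raise FileExistsError(
            f"Cannot persist saved dataset: a file already exists at {path}"
        )
```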

Signed-off-by: Felix Wang <[email protected]>
@codecov-commenter

codecov-commenter commented Aug 18, 2022

Codecov Report

Merging #3088 (8a1a6f0) into master (c93b4cc) will increase coverage by 8.74%.
The diff coverage is 93.65%.

@@            Coverage Diff             @@
##           master    #3088      +/-   ##
==========================================
+ Coverage   67.11%   75.86%   +8.74%     
==========================================
  Files         173      208      +35     
  Lines       15110    17126    +2016     
==========================================
+ Hits        10141    12992    +2851     
+ Misses       4969     4134     -835     
| Flag | Coverage Δ | |
|---|---|---|
| integrationtests | 67.04% <95.00%> (-0.08%) | ⬇️ |
| unittests | 58.24% <57.14%> (?) | |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| sdk/python/feast/feature_store.py | 85.39% <ø> (+3.64%) | ⬆️ |
| ...ffline_stores/contrib/spark_offline_store/spark.py | 32.98% <ø> (ø) | |
| ...ffline_stores/contrib/trino_offline_store/trino.py | 8.72% <0.00%> (ø) | |
| ...dk/python/tests/integration/e2e/test_validation.py | 96.85% <ø> (ø) | |
| ...fline_store/test_universal_historical_retrieval.py | 100.00% <ø> (ø) | |
| sdk/python/feast/saved_dataset.py | 78.10% <78.57%> (+1.03%) | ⬆️ |
| sdk/python/feast/errors.py | 70.45% <100.00%> (+2.82%) | ⬆️ |
| sdk/python/feast/infra/offline_stores/bigquery.py | 86.15% <100.00%> (ø) | |
| ...line_stores/contrib/athena_offline_store/athena.py | 38.46% <100.00%> (ø) | |
| ..._stores/contrib/postgres_offline_store/postgres.py | 35.22% <100.00%> (ø) | |
| ... and 115 more | | |


Member

@achals achals left a comment

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, felixwang9817

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
  • OWNERS [achals,felixwang9817]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Felix Wang <[email protected]>
@feast-ci-bot feast-ci-bot removed the lgtm label Aug 18, 2022
@kevjumba
Collaborator

/lgtm

@feast-ci-bot feast-ci-bot merged commit 69af21f into feast-dev:master Aug 19, 2022
kevjumba pushed a commit that referenced this pull request Aug 25, 2022
# [0.24.0](v0.23.0...v0.24.0) (2022-08-25)

### Bug Fixes

* Check if on_demand_feature_views is an empty list rather than None for snowflake provider ([#3046](#3046)) ([9b05e65](9b05e65))
* FeatureStore.apply applies BatchFeatureView correctly ([#3098](#3098)) ([41be511](41be511))
* Fix Feast Java inconsistency with int64 serialization vs python ([#3031](#3031)) ([4bba787](4bba787))
* Fix feature service inference logic ([#3089](#3089)) ([4310ed7](4310ed7))
* Fix field mapping logic during feature inference ([#3067](#3067)) ([cdfa761](cdfa761))
* Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([0702310](0702310))
* Fix Java helm charts to work with refactored logic. Fix FTS image ([#3105](#3105)) ([2b493e0](2b493e0))
* Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([bfae6ac](bfae6ac))
* Fix release workflow to release 0.24.0 ([#3138](#3138)) ([a69aaae](a69aaae))
* Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([b26566d](b26566d))
* Fixing Web UI, which fails for the SQL registry ([#3028](#3028)) ([64603b6](64603b6))
* Force Snowflake Session to Timezone UTC ([#3083](#3083)) ([9f221e6](9f221e6))
* Make infer dummy entity join key idempotent ([#3115](#3115)) ([1f5b1e0](1f5b1e0))
* More explicit error messages ([#2708](#2708)) ([e4d7afd](e4d7afd))
* Parse inline data sources ([#3036](#3036)) ([c7ba370](c7ba370))
* Prevent overwriting existing file during `persist` ([#3088](#3088)) ([69af21f](69af21f))
* Register BatchFeatureView in feature repos correctly ([#3092](#3092)) ([b8e39ea](b8e39ea))
* Return an empty infra object from sql registry when it doesn't exist ([#3022](#3022)) ([8ba87d1](8ba87d1))
* Teardown tables for Snowflake Materialization testing ([#3106](#3106)) ([0a0c974](0a0c974))
* UI error when saved dataset is present in registry. ([#3124](#3124)) ([83cf753](83cf753))
* Update sql.py ([#3096](#3096)) ([2646a86](2646a86))
* Updated snowflake template ([#3130](#3130)) ([f0594e1](f0594e1))

### Features

* Add authentication option for snowflake connector ([#3039](#3039)) ([74c75f1](74c75f1))
* Add Cassandra/AstraDB online store contribution ([#2873](#2873)) ([feb6cb8](feb6cb8))
* Add Snowflake materialization engine ([#2948](#2948)) ([f3b522b](f3b522b))
* Adding saved dataset capabilities for Postgres  ([#3070](#3070)) ([d3253c3](d3253c3))
* Allow passing repo config path via flag ([#3077](#3077)) ([0d2d951](0d2d951))
* Contrib azure provider with synapse/mssql offline store and Azure registry store ([#3072](#3072)) ([9f7e557](9f7e557))
* Custom Docker image for Bytewax batch materialization ([#3099](#3099)) ([cdd1b07](cdd1b07))
* Feast AWS Athena offline store (again) ([#3044](#3044)) ([989ce08](989ce08))
* Implement spark offline store `offline_write_batch` method ([#3076](#3076)) ([5b0cc87](5b0cc87))
* Initial Bytewax materialization engine ([#2974](#2974)) ([55c61f9](55c61f9))
* Refactor feature server helm charts to allow passing feature_store.yaml in environment variables ([#3113](#3113)) ([85ee789](85ee789))
Development

Successfully merging this pull request may close these issues.

Error in FileOfflineStore.get_historical_features.<locals>.evaluate_historical_retrieval()
5 participants