Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support retrieval from multiple feature views with different join keys #2835

Merged
merged 5 commits into from
Jun 30, 2022

Conversation

yongheng
Copy link
Contributor

@yongheng yongheng commented Jun 22, 2022

What this PR does / why we need it:

Currently Java Feature Server doesn't support retrieval from multiple feature views with different join keys. For each gPRC request, OnlineServingServiceV2 calls OnlineRetriever once and only once. In this call the former sends all join keys in the original request to the latter, and the latter simply sorts and concatenates all join keys to make a Redis key.

This PR supports retrieval from multiple feature views with different join keys. For each gPRC request, it groups feature references by join keys and for each group it makes a call to OnlineRetriever.

Which issue(s) this PR fixes:

Fixes #

@yongheng yongheng force-pushed the retrieve-multiple-fvs branch 3 times, most recently from 01c5aff to f913090 Compare June 22, 2022 17:11
@yongheng
Copy link
Contributor Author

/assign pyalex

@yongheng
Copy link
Contributor Author

/assign adchia

@yongheng
Copy link
Contributor Author

/cc @pyalex @adchia

@achals
Copy link
Member

achals commented Jun 22, 2022

/ok-to-test

@codecov-commenter
Copy link

codecov-commenter commented Jun 22, 2022

Codecov Report

Merging #2835 (f913090) into master (7d344b7) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2835      +/-   ##
==========================================
- Coverage   59.63%   59.62%   -0.01%     
==========================================
  Files         174      174              
  Lines       15493    15493              
==========================================
- Hits         9239     9238       -1     
- Misses       6254     6255       +1     
Flag Coverage Δ
unittests 59.62% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/tests/conftest.py 66.18% <0.00%> (-0.72%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d344b7...f913090. Read the comment docs.

@pyalex
Copy link
Collaborator

pyalex commented Jun 22, 2022

Hi @yongheng , can you please add integration test for this flow?


// Group feature references by join keys.
Map<String, List<FeatureReferenceV2>> groupNameToFeatureReferencesMap =
featureReferences.stream()
Copy link
Collaborator

@pyalex pyalex Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To speed up this part we might want to extract distinct feature views from all feature references. And then group feature views instead.

Copy link
Contributor Author

@yongheng yongheng Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC grouping by join keys results in the same or less groups (therefore same or more efficient) than grouping by feature view. The is because different feature views can have the same join keys. In L286, this.registryRepository.getEntitiesList(featureReference) internally gets feature view spec first, then gets entity names of the feature view spec, then we find join keys for the entity names.

Actually, I grouped by feature view at the beginning. Then I switched to grouping by join keys in the second commit of this PR, as an optimization.

@yongheng
Copy link
Contributor Author

Hi @yongheng , can you please add integration test for this flow?

@pyalex Can you show me where are the current integration tests? Then I can add to there.

@yongheng yongheng requested a review from pyalex June 23, 2022 02:47
@kevjumba
Copy link
Collaborator

Hi @yongheng , can you please add integration test for this flow?

@pyalex Can you show me where are the current integration tests? Then I can add to there.

Hey @yongheng the integration tests should be any tests that have a tag for @pytest.mark.integration. For this particular test, just take a look at test_feature_views.py

@achals
Copy link
Member

achals commented Jun 23, 2022

@yongheng You can find the java integration tests here: https://github.com/feast-dev/feast/tree/master/java/serving/src/test/java/feast/serving/it

@yongheng
Copy link
Contributor Author

@achals @pyalex Integration test has been added and CI is green. Please take a look.

@yongheng
Copy link
Contributor Author

/cc achals

@feast-ci-bot feast-ci-bot requested a review from achals June 28, 2022 04:53
@yongheng
Copy link
Contributor Author

@achals @pyalex @kevjumba I've added an integration test. Please take a look. Thanks!

Copy link
Member

@achals achals left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, yongheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 056cfa1 into feast-dev:master Jun 30, 2022
felixwang9817 pushed a commit to felixwang9817/feast that referenced this pull request Jul 1, 2022
…in keys (feast-dev#2835)

* feat: Support retrieving from multiple feature views

Signed-off-by: Yongheng Lin <[email protected]>

* group by join keys instead of feature view

Signed-off-by: Yongheng Lin <[email protected]>

* tolerate insufficient entities

Signed-off-by: Yongheng Lin <[email protected]>

* mock registry.getEntityJoinKey

Signed-off-by: Yongheng Lin <[email protected]>

* add integration test

Signed-off-by: Yongheng Lin <[email protected]>
felixwang9817 pushed a commit that referenced this pull request Aug 2, 2022
# [0.23.0](v0.22.0...v0.23.0) (2022-08-02)

### Bug Fixes

* Add dummy alias to pull_all_from_table_or_query ([#2956](#2956)) ([5e45228](5e45228))
* Bump version of Guava to mitigate cve ([#2896](#2896)) ([51df8be](51df8be))
* Change numpy version on setup.py and upgrade it to resolve dependabot warning ([#2887](#2887)) ([80ea7a9](80ea7a9))
* Change the feature store plan method to public modifier ([#2904](#2904)) ([0ec7d1a](0ec7d1a))
* Deprecate 3.7 wheels and fix verification workflow ([#2934](#2934)) ([040c910](040c910))
* Do not allow same column to be reused in data sources ([#2965](#2965)) ([661c053](661c053))
* Fix build wheels workflow to install apache-arrow correctly ([#2932](#2932)) ([bdeb4ae](bdeb4ae))
* Fix file offline store logic for feature views without ttl ([#2971](#2971)) ([26f6b69](26f6b69))
* Fix grpc and update protobuf ([#2894](#2894)) ([86e9efd](86e9efd))
* Fix night ci syntax error and update readme ([#2935](#2935)) ([b917540](b917540))
* Fix nightly ci again ([#2939](#2939)) ([1603c9e](1603c9e))
* Fix the go build and use CgoArrowAllocator to prevent incorrect garbage collection ([#2919](#2919)) ([130746e](130746e))
* Fix typo in CONTRIBUTING.md ([#2955](#2955)) ([8534f69](8534f69))
* Fixing broken links to feast documentation on java readme and contribution ([#2892](#2892)) ([d044588](d044588))
* Fixing Spark min / max entity df event timestamps range return order ([#2735](#2735)) ([ac55ce2](ac55ce2))
* Move gcp back to 1.47.0 since grpcio-tools 1.48.0 got yanked from pypi ([#2990](#2990)) ([fc447eb](fc447eb))
* Refactor testing and sort out unit and integration tests ([#2975](#2975)) ([2680f7b](2680f7b))
* Remove hard-coded integration test setup for AWS & GCP ([#2970](#2970)) ([e4507ac](e4507ac))
* Resolve small typo in README file ([#2930](#2930)) ([16ae902](16ae902))
* Revert "feat: Add snowflake online store ([#2902](#2902))" ([#2909](#2909)) ([38fd001](38fd001))
* Snowflake_online_read fix ([#2988](#2988)) ([651ce34](651ce34))
* Spark source support table with pattern "db.table" ([#2606](#2606)) ([3ce5139](3ce5139)), closes [#2605](#2605)
* Switch mysql log string to use regex ([#2976](#2976)) ([5edf4b0](5edf4b0))
* Update gopy to point to fork to resolve github annotation errors. ([#2940](#2940)) ([ba2dcf1](ba2dcf1))
* Version entity serialization mechanism and fix issue with int64 vals ([#2944](#2944)) ([d0d27a3](d0d27a3))

### Features

* Add an experimental lambda-based materialization engine ([#2923](#2923)) ([6f79069](6f79069))
* Add column reordering to `write_to_offline_store` ([#2876](#2876)) ([8abc2ef](8abc2ef))
* Add custom JSON table tab w/ formatting ([#2851](#2851)) ([0159f38](0159f38))
* Add CustomSourceOptions to SavedDatasetStorage ([#2958](#2958)) ([23c09c8](23c09c8))
* Add Go option to `feast serve` command ([#2966](#2966)) ([a36a695](a36a695))
* Add interfaces for batch materialization engine ([#2901](#2901)) ([38b28ca](38b28ca))
* Add pages for individual Features to the Feast UI ([#2850](#2850)) ([9b97fca](9b97fca))
* Add snowflake online store ([#2902](#2902)) ([f758f9e](f758f9e)), closes [#2903](#2903)
* Add Snowflake online store (again) ([#2922](#2922)) ([2ef71fc](2ef71fc)), closes [#2903](#2903)
* Add to_remote_storage method to RetrievalJob ([#2916](#2916)) ([109ee9c](109ee9c))
* Support retrieval from multiple feature views with different join keys ([#2835](#2835)) ([056cfa1](056cfa1))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants