Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break up queries for point in time correctness joins #347

Merged
merged 2 commits into from
Dec 9, 2019

Conversation

zhilingc
Copy link
Collaborator

@zhilingc zhilingc commented Dec 4, 2019

Plus some gentle refactoring.
The flow for batch retrieval now looks something like this:

            +-------------+
            | load entity |
            | table to    |
            | BQ          |
            +------+------+
                   |
                   |
            +------v-------+
            | Add UUIDs to |
            | entity table |
            +------+-------+
                   |
        +----------+-----------+
        |                      |
+-------v-------+      +-------v-------+
| Point-in-time |      | Point-in-time |
| correctness   | ...  | correctness   |
| join for FS1  |      | join for FSn  |
+-------+-------+      +-------+-------+
        |                      |
        +----------+-----------+
                   |
                   |
            +------v-------+
            | Left join all|
            | subtables to |
            | entity table |
            +------+-------+
                   |
           +-------v--------+
           | output to file |
           +----------------+

All in all this should result in less resource utilisation and faster queries, since we hold less data in memory.

Also to note: at each feature set p-i-t join step, only the relevant entities are used. This allows users to pass in unpartitionable values (like floats) in their entity table as long as they do not use it as an actual key.

@zhilingc
Copy link
Collaborator Author

zhilingc commented Dec 4, 2019

/retest

@feast-ci-bot
Copy link
Collaborator

@zhilingc: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@woop
Copy link
Member

woop commented Dec 9, 2019

/lgtm

woop
woop previously approved these changes Dec 9, 2019
zhilingc added 2 commits December 9, 2019 17:17
…e refactoring

Cleanup; Remove lombok dependency

Add end to end tests for batch retrieval

Only make necessary functions transactional

Fix typo for e2e test file

Fix sorting for windowing

Add left join so that rows are consistent

Remove UUID
Clean up threads when error is thrown
@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zhilingc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zhilingc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zhilingc
Copy link
Collaborator Author

zhilingc commented Dec 9, 2019

/retest

@davidheryanto
Copy link
Collaborator

/lgtm

@feast-ci-bot feast-ci-bot merged commit 306b146 into feast-dev:master Dec 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants