Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add entity join key and fix entity references #1429

Merged
merged 5 commits into from
Apr 10, 2021

Conversation

jklegar
Copy link
Collaborator

@jklegar jklegar commented Mar 31, 2021

What this PR does / why we need it: Add entity join key and fix FeatureStore methods to use join keys instead of input entity names

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

Add optional join_key parameter to Entity that defaults to the entity name. Previously, the entity name was both the name of the entity and the name of the column it referred to; now one can set join_key to the column name and set name to a more human-readable string.

# create FeatureView
fv = FeatureView(
name="test_bq_table_correctness",
entities=["driver_id"],
entities=["Driver ID"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, we support arbitrary text for the name? I don't think thats a good idea. We should probably just be supporting lowercase alphanumeric with underscore.

@jklegar jklegar changed the title WIP Add entity join key and fix entity references Add entity join key and fix entity references Apr 6, 2021
@jklegar
Copy link
Collaborator Author

jklegar commented Apr 6, 2021

/kind housekeeping

feature_name_columns,
event_timestamp_column,
created_timestamp_column,
) = feature_view.run_reverse_field_mapping(join_keys)
Copy link
Member

@woop woop Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if its a good practice to query the registry here and pass in the join keys. Its almost a half step. We should be pulling the feature views from the registry here, based on their names. Then we should either

  • Pull the entities as you've done it here and have a helper function like run_reverse_field_mapping(feature_view, entities) which is a reversal of this method approach, or
  • When we pull the FeatureViews from the registry they get a reference to the FeatureStore stored inside of them. Then we can add a method like FeatureView.get_entities() which returns its Entity objects. May need to change the previous one to FeatureView.entity_names so that it isnt confusing. Anyway, then when you run FeatureView.run_reverse_field_mapping() it will find its own entities using the reference it has to the FeatureStore class, as opposed to it being passed in.
    or
  • We remove this List[str] as entities and replace it with List[Entity] so that when we do a get_feature_view() we actually get the entity details back in the first place. Its also much cleaner for end users when registering FeatureViews since they dont have to deal with string references.

Given how much time is left, I am leaning towards the first option, but its definitely technical debt.

What do you think @jklegar ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there's definitely tech debt here - IMO the main problem is getting the feature view from the registry doesn't also get the entities from the registry and we should do something like your third option (in a separate diff). Happy to do the first option here though not sure how much it actually gets us since we need join_keys in this method further down anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my concern is just that we are making the code base more complicated (using a method) but we're not gaining anything from that since the method acts like a function.

need join_keys in this method further down anyway

Wouldnt we be able to use this list of entities to get the join keys? I think what I am trying to avoid as much as possible is to have methods with I/O. I want to us to try and follow a more immutable/functional approach as much as we can (which I realize isnt 100% possible). So get_entities/get_feature_views etc should ideally be run only once near the start of one of our methods and then the state should just propagate down. It'll be way easier to reason about and test long term.

Signed-off-by: Jacob Klegar <[email protected]>
Signed-off-by: Jacob Klegar <[email protected]>
Signed-off-by: Jacob Klegar <[email protected]>
Signed-off-by: Willem Pienaar <[email protected]>
Signed-off-by: Willem Pienaar <[email protected]>
@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jklegar, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@woop
Copy link
Member

woop commented Apr 10, 2021

/lgtm

@feast-ci-bot feast-ci-bot merged commit ec9c4fd into feast-dev:master Apr 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants