Add Python SDK support for labels on feature sets #707

Joostrothweiler · 2020-05-14T21:09:39Z

What this PR does / why we need it:

Adds labels to the Python SDK, with the relevant FeatureSet class functions:

set_label(key: str, value: str)
remove_label(key: str)

Which issue(s) this PR fixes:

Fixes #663

Does this PR introduce a user-facing change?:

Users can use the Python SDK to add metadata to feature sets in the form of labels.

feast-ci-bot · 2020-05-14T21:09:55Z

Hi @Joostrothweiler. Thanks for your PR.

I'm waiting for a gojek member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Joostrothweiler · 2020-05-15T05:45:51Z

/assign @zhilingc

ches · 2020-05-15T07:00:20Z

Thanks for the contribution @Joostrothweiler!

I haven't reviewed yet, but something I was just thinking of that I want to remind us all of:

Verify that this enables the Feast CLI to handle labels applied with YAML feature set specifications.

It needs a bit of extra work, I think, so this could be a separate task/issue if you can't get to it. I just want to track it if we decide that, because I think YAML becomes especially helpful when using quite a bit of metadata.

It may raise questions like if "applying" a feature set should nullify existing label fields if labels aren't given every time, which would be a usability annoyance IMO. But this definitely goes into new discussion territory outside this PR.

ches

There is a bug on Field so that is critical.

Empty dict vs. Optional is a design choice that I'll leave the maintainers to weigh in on, I'm not too familiar with the Python SDK and my sensibility of Pythonic is rusty these days.

I do think there is desire to kill Field in Python now as it has been in Java, and to update E2E tests for labels added in #536 to use this new API. These could be follow-ups, again I'll leave to others to weigh in.

ches · 2020-05-16T03:07:30Z

sdk/python/feast/feature_set.py

+        if not self.labels or key not in self.labels.keys():
+            raise ValueError("Could not find label key " + key + ", no action taken")
+        elif key in self.labels.keys():
+            del self.labels[key]


This is consistent with how the drop method behaves for features and entities on the feature set, so the ValueError might be the right thing for the scope of this PR. Separately though, I wonder if KeyError would be more appropriate for both, if not an application-specific error for drop because a feature set with no entities or features is dubious.

I might be inclined to initialize labels as an empty dictionary instead of the type being Optional though—unless there's a clear application-specific semantic reason for distinguishing no value from empty, I think empty is preferable design for collections. Give a stronger contract and spare both us and callers from null checks.

In that case, the method implementation could be reduced to del self.labels[key] with KeyError propagating, unless we want to give it a more descriptive error message.
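A minimal sketch of the simplification suggested above. `LabelHolder` is a hypothetical stand-in, not the real `FeatureSet` class, and it assumes labels are always initialized to a dict so that no null check is needed and `del` can let `KeyError` propagate.

```python
from typing import Dict, Optional


class LabelHolder:
    """Hypothetical stand-in for FeatureSet; only label handling is shown."""

    def __init__(self, labels: Optional[Dict[str, str]] = None):
        # Always hold a dict so methods and callers never need a None check.
        self.labels: Dict[str, str] = dict(labels) if labels else {}

    def set_label(self, key: str, value: str) -> None:
        self.labels[key] = value

    def remove_label(self, key: str) -> None:
        # A missing key raises KeyError, which propagates to the caller.
        del self.labels[key]


holder = LabelHolder()
holder.set_label("team", "matchmaking")
holder.remove_label("team")

try:
    holder.remove_label("team")  # key is already gone
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
```

If a more descriptive message is wanted, the `del` could be wrapped to re-raise with custom text, at the cost of the one-line implementation.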

Joostrothweiler · 2020-05-17

Indeed it makes more sense. Changed it to just use del. Shall I include a change for the drop method to raise KeyError in this PR as well?

woop · 2020-05-17

Yea, makes sense.

ches · 2020-05-21

Do we care much that labels are an OrderedDict? It's not expressed by the type MutableMapping either (this is okay if ordering is used as an implementation detail but isn't part of contract, not sure that distinction exists here though).

After updates we still have this signature:

labels: Optional[MutableMapping[str, str]] = None

which to me would be nicer if it were simpler:

labels: MutableMapping[str, str] = {}

Joostrothweiler · 2020-05-21

> Do we care much that labels are an OrderedDict? It's not expressed by the type MutableMapping either (this is okay if ordering is used as an implementation detail but isn't part of contract, not sure that distinction exists here though).

I don't think we care much. I just aligned this with self._fields. Is there a particular reason why we would want it in this case but not for labels? I can change it to initialize with a simple empty dict if this is preferred.

> After updates we still have this signature:
>
> labels: Optional[MutableMapping[str, str]] = None
>
> which to me would be nicer if it were simpler:
>
> labels: MutableMapping[str, str] = {}

This would set the default value to a mutable object, which PyCharm elegantly suggests not to do:
"Default argument values are evaluated only once at function definition time, which means that modifying the default value of the argument will affect all subsequent calls of the function."

ches · 2020-05-21

Ahh yes I forgot about this quirk in Python, I'm sorry. Now I'm reminded of the idiom for argument types like features: List[Feature] = None and if features is None to initialize. So that pattern should be followed, sans Optional type.
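The quirk and the idiom can be illustrated with two small functions. Both function names are hypothetical and exist only for this sketch. It keeps `Optional` in the annotation for type-checker friendliness; dropping it, as suggested above, does not change the runtime behavior.

```python
from typing import MutableMapping, Optional


def shared_default(
    key: str, labels: MutableMapping[str, str] = {}
) -> MutableMapping[str, str]:
    # The default dict is created once, when the function is defined,
    # and the same object is reused on every call that omits `labels`.
    labels[key] = "x"
    return labels


def fresh_default(
    key: str, labels: Optional[MutableMapping[str, str]] = None
) -> MutableMapping[str, str]:
    # A new dict is created on each call that omits `labels`.
    if labels is None:
        labels = {}
    labels[key] = "x"
    return labels


first_shared = shared_default("a")
second_shared = shared_default("b")  # same object as first_shared

first_fresh = fresh_default("a")
second_fresh = fresh_default("b")  # independent object
```

With the shared default, the second call sees the key written by the first; with the `None` idiom, each call starts from an empty dict.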

Regarding OrderedDict I'll leave it to @woop or others, I'm not sure if there was reason behind it for fields and if it should apply to labels or not.

woop · 2020-06-03

We do not have a special ordered contract with end users. I think when I used OrderedDict in the Python SDK I was trying to have my cake and eat it as well.

For Fields/Features/Entities we should be using Lists (as users see them) and dicts as it's implemented inside the FeatureSet class. We do not have an ordered contract with users. So technically speaking they should be maps/dicts across Feast, but from a user perspective I feel like it's cleaner to expose lists, especially when persisting as YAML/JSON.

For labels I am happy to commit to simply a dict() instead of an OrderedDict.

Other than that the PR looks good @Joostrothweiler
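One practical difference between a plain dict and an OrderedDict shows up in equality checks, which may matter if labels take part in comparing two feature sets. The sketch below uses made-up label values and only standard-library behavior; it is not taken from the Feast code.

```python
from collections import OrderedDict

plain_a = {"team": "ml", "env": "prod"}
plain_b = {"env": "prod", "team": "ml"}

ordered_a = OrderedDict([("team", "ml"), ("env", "prod")])
ordered_b = OrderedDict([("env", "prod"), ("team", "ml")])

# Plain dicts compare by content only.
plain_equal = plain_a == plain_b

# Two OrderedDicts also take insertion order into account.
ordered_equal = ordered_a == ordered_b

# Comparing an OrderedDict with a plain dict ignores order.
mixed_equal = ordered_a == plain_b
```

Since Python 3.7 a plain dict also preserves insertion order, so switching to `dict()` keeps iteration order stable while making equality order-insensitive.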

sdk/python/feast/field.py

woop · 2020-05-16T06:24:58Z

/retest

Joostrothweiler · 2020-05-17T11:35:04Z

> There is a bug on Field so that is critical.

Missed this one.. But indeed this also showed I missed some tests for Feature. Made the change and added some tests.

> Empty dict vs. Optional is a design choice that I'll leave the maintainers to weigh in on, I'm not too familiar with the Python SDK and my sensibility of Pythonic is rusty these days.

I agree that an empty dict would make more sense. Changed it to initialize it to OrderedDict by default, similar to _fields.

> I do think there is desire to kill Field in Python now as it has been in Java, and to update E2E tests for labels added in #536 to use this new API. These could be follow-ups, again I'll leave to others to weigh in.

Have not made any changes regarding this. If it's something that should be part of this PR I'm happy to make the changes.

woop · 2020-05-17T13:26:02Z

> Verify that this enables the Feast CLI to handle labels applied with YAML feature set specifications.

Which raises another good point. We don't have any kind of e2e test coverage for the CLI.

> It may raise questions like if "applying" a feature set should nullify existing label fields if labels aren't given every time, which would be a usability annoyance IMO. But this definitely goes into new discussion territory outside this PR.

I think we should probably just try to model ourselves after Kubernetes here, although I admit I am not sure how this is handled. I do know that there are various ways to update a resource though.

> Have not made any changes regarding this. If it's something that should be part of this PR I'm happy to make the changes.

I think it's out of scope for your PR. The desire is strong though :)

Thanks for the PR @Joostrothweiler

woop · 2020-05-17T13:37:46Z

@Joostrothweiler I do think updating the e2e tests to make sure that labels work is important here. There was some coverage, but not for this new API obviously.

Joostrothweiler · 2020-05-20T06:37:41Z

@woop I updated the existing e2e test cases to take the labels directly from the Feature(Set) class and use the to_proto function.

woop · 2020-05-25T11:23:57Z

@Joostrothweiler Apologies for taking so long to review this. Will try to get to it as soon as possible.

Joostrothweiler · 2020-06-02T19:12:25Z

@woop no problem. Let me know if there's anything I can do

woop · 2020-06-08T07:47:27Z

I will merge this in. We can resolve the issues around Dicts later. Same applies to Field vs Entities/Features.

woop · 2020-06-08T07:48:49Z

/lgtm

Not needed :)

woop · 2020-06-09T09:23:06Z

/lgtm

feast-ci-bot · 2020-06-09T09:24:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Joostrothweiler, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [woop]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Joostrothweiler requested review from davidheryanto, khorshuheng, pradithya, woop and zhilingc as code owners May 14, 2020 21:09

feast-ci-bot added needs-kind needs-ok-to-test size/L labels May 14, 2020

feast-ci-bot assigned zhilingc May 15, 2020

ches previously requested changes May 16, 2020

woop added the kind/feature New feature or request label May 16, 2020

feast-ci-bot removed the needs-kind label May 16, 2020

woop added the ok-to-test label May 16, 2020

feast-ci-bot removed the needs-ok-to-test label May 17, 2020

Joostrothweiler requested a review from ches May 17, 2020 11:32

woop force-pushed the master branch from 68eb265 to 35f8f18 Compare May 24, 2020 06:47

woop requested a review from pyalex as a code owner June 8, 2020 07:47

woop approved these changes Jun 8, 2020

feast-ci-bot added the approved label Jun 8, 2020

woop changed the title ~~Python sdk support labels~~ Add support for labels on feature sets Jun 8, 2020

feast-ci-bot assigned woop Jun 8, 2020

feast-ci-bot added the lgtm label Jun 8, 2020

Joost Rothweiler added 5 commits June 9, 2020 16:06

Update Python SDK to support labels

4aefac5

Format python code

b2467cd

Fix equals comparison FeatureSet

bc62bb8

Fix bug labels returns presence and initialize labels by default as empty dict

e7083fe

Propagate KeyErrors labels and fields and update e2e tests with labels in constructor

b776dff

terryyylim force-pushed the python-sdk-support-labels branch from 196d357 to 117e727 Compare June 9, 2020 08:47

feast-ci-bot removed the lgtm label Jun 9, 2020

terryyylim force-pushed the python-sdk-support-labels branch from 117e727 to b776dff Compare June 9, 2020 09:03

feast-ci-bot added the lgtm label Jun 9, 2020

woop approved these changes Jun 9, 2020

feast-ci-bot merged commit 6fd0874 into feast-dev:master Jun 9, 2020

ches changed the title ~~Add support for labels on feature sets~~ Add Python SDK support for labels on feature sets Jul 4, 2020
