Add Python SDK support for labels on feature sets #707

Merged

Conversation

Joostrothweiler
Contributor

What this PR does / why we need it:

Adds label support to the Python SDK through two new FeatureSet methods:

set_label(key: str, value: str)
remove_label(key: str)
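
A minimal sketch of how these two methods might behave. `FeatureSetSketch` is a hypothetical stand-in, not the actual Feast `FeatureSet` class; the `labels` attribute and the overwrite-on-existing-key behaviour are assumptions made for illustration.

```python
from typing import Dict, Optional


class FeatureSetSketch:
    """Hypothetical stand-in for a feature set; only label handling is modelled."""

    def __init__(self, name: str, labels: Optional[Dict[str, str]] = None):
        self.name = name
        # Copy the input so callers cannot mutate our state through their own dict.
        self.labels: Dict[str, str] = dict(labels) if labels is not None else {}

    def set_label(self, key: str, value: str) -> None:
        """Add a label, or overwrite the value of an existing key (assumed)."""
        self.labels[key] = value

    def remove_label(self, key: str) -> None:
        """Remove a label; a missing key raises KeyError in this sketch."""
        del self.labels[key]


fs = FeatureSetSketch("driver_features")
fs.set_label("team", "ml-platform")
fs.set_label("team", "data-eng")  # same key: the value is overwritten
fs.set_label("env", "staging")
fs.remove_label("env")
```

After these calls only the `team` label remains, holding the most recently set value.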

Which issue(s) this PR fixes:

Fixes #663

Does this PR introduce a user-facing change?:

Users can use the Python SDK to add metadata to feature sets in the form of labels.

@feast-ci-bot
Collaborator

Hi @Joostrothweiler. Thanks for your PR.

I'm waiting for a gojek member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Joostrothweiler
Contributor Author

/assign @zhilingc

@ches
Member

ches commented May 15, 2020

Thanks for the contribution @Joostrothweiler!

I haven't reviewed yet, but something I was just thinking of that I want to remind us all of:

Verify that this enables the Feast CLI to handle labels applied with YAML feature set specifications.

It needs a bit of extra work I think, so this could be a separate task/issue if you can't get to it. I just want to track it if we decide that, because I think the YAML becomes especially helpful when using quite a bit of metadata.

It may raise questions, such as whether "applying" a feature set should nullify existing label fields when labels aren't given every time, which would be a usability annoyance IMO. But this definitely goes into new discussion territory outside this PR.
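
To make the question concrete, here is a hedged sketch of the two possible "apply" semantics. Both function names are hypothetical and neither is claimed to be what Feast does; they only contrast replacing labels wholesale with merging them.

```python
from typing import Dict, Optional


def apply_replace(
    existing: Dict[str, str], incoming: Optional[Dict[str, str]]
) -> Dict[str, str]:
    """Replace semantics: the applied spec is the whole truth.

    Omitting labels on apply wipes the labels that were stored before.
    """
    return dict(incoming or {})


def apply_merge(
    existing: Dict[str, str], incoming: Optional[Dict[str, str]]
) -> Dict[str, str]:
    """Merge semantics: omitted labels stay, given labels are added or overwritten."""
    merged = dict(existing)
    merged.update(incoming or {})
    return merged


stored = {"team": "ml-platform", "env": "prod"}

wiped = apply_replace(stored, None)              # labels are gone
kept = apply_merge(stored, None)                 # labels are preserved
updated = apply_merge(stored, {"env": "staging"})
```

Replace semantics are simpler to reason about but produce the usability annoyance described above; merge semantics avoid it at the cost of needing an explicit way to delete a label.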

ches previously requested changes May 16, 2020
Member

@ches ches left a comment

There is a bug on Field so that is critical.

Empty dict vs. Optional is a design choice that I'll leave the maintainers to weigh in on; I'm not too familiar with the Python SDK, and my sense of what is Pythonic is rusty these days.

I do think there is a desire to kill Field in Python now, as has been done in Java, and to update the E2E tests for labels added in #536 to use this new API. These could be follow-ups; again, I'll leave it to others to weigh in.

Comment on lines 272 to 275
    if not self.labels or key not in self.labels.keys():
        raise ValueError("Could not find label key " + key + ", no action taken")
    elif key in self.labels.keys():
        del self.labels[key]
Member

@ches ches May 16, 2020

This is consistent with how the drop method behaves for features and entities on the feature set, so the ValueError might be the right thing for the scope of this PR. Separately though, I wonder if KeyError would be more appropriate for both, if not an application-specific error for drop because a feature set with no entities or features is dubious.

I might be inclined to initialize labels as an empty dictionary instead of making the type Optional, though. Unless there's a clear application-specific semantic reason for distinguishing no value from empty, I think empty is the preferable design for collections: it gives a stronger contract and spares both us and callers from null checks.

In that case, the method implementation could be reduced to del self.labels[key] with KeyError propagating, unless we want to give it a more descriptive error message.
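
A sketch contrasting the two variants discussed here, written as free functions for brevity. The function names and the `error_name` helper are hypothetical; the guarded variant mirrors the snippet under review, and the simple variant assumes labels is always a dict.

```python
from typing import Callable, Dict, Optional


def remove_label_guarded(labels: Optional[Dict[str, str]], key: str) -> None:
    """Optional labels: needs a null check and raises ValueError."""
    if not labels or key not in labels:
        raise ValueError("Could not find label key " + key + ", no action taken")
    del labels[key]


def remove_label_simple(labels: Dict[str, str], key: str) -> None:
    """Labels always a dict: KeyError propagates straight from del."""
    del labels[key]


def error_name(func: Callable, labels: Optional[Dict[str, str]], key: str):
    """Return the name of the exception raised by func, or None if it succeeds."""
    try:
        func(labels, key)
    except (KeyError, ValueError) as exc:
        return type(exc).__name__
    return None


guarded_error = error_name(remove_label_guarded, None, "team")
simple_error = error_name(remove_label_simple, {}, "team")

labels = {"team": "ml-platform"}
remove_label_simple(labels, "team")
```

The simple variant drops both branches and the null check; the trade-off is a terser `KeyError` message unless it is caught and re-raised with more context.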

Contributor Author

Indeed, it makes more sense. Changed it to just use del. Shall I include a change for the drop method to raise a KeyError in this PR as well?

Member

Yea, makes sense.

Member

@ches ches May 21, 2020

Do we care much that labels are an OrderedDict? It's not expressed by the type MutableMapping either (this is okay if ordering is used as an implementation detail but isn't part of contract, not sure that distinction exists here though).

After updates we still have this signature:

labels: Optional[MutableMapping[str, str]] = None

which to me would be nicer if it were simpler:

labels: MutableMapping[str, str] = {}

Contributor Author

> Do we care much that labels are an OrderedDict? It's not expressed by the type MutableMapping either (this is okay if ordering is used as an implementation detail but isn't part of contract, not sure that distinction exists here though).

I don't think we care much. I just aligned this with self._fields. Is there a particular reason why we would want it in this case but not for labels? I can change it to initialize with a simple empty dict if this is preferred.

> After updates we still have this signature:
>
> labels: Optional[MutableMapping[str, str]] = None
>
> which to me would be nicer if it were simpler:
>
> labels: MutableMapping[str, str] = {}

This would set the default value to a mutable object, which PyCharm elegantly suggests not to do:
"Default argument values are evaluated only once at function definition time, which means that modifying the default value of the argument will affect all subsequent calls of the function."
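
A short demonstration of the pitfall that warning describes. The function name is hypothetical; the behaviour shown is standard Python.

```python
from typing import Dict


def make_labels_shared(labels: Dict[str, str] = {}) -> Dict[str, str]:
    """Anti-pattern: the default dict is created once, at definition time."""
    return labels


first = make_labels_shared()
first["team"] = "ml-platform"

# A later call with no argument receives the very same dict object,
# so it already contains the label added through `first`.
second = make_labels_shared()
```

Here `second` is the same object as `first`, so state leaks between calls that were meant to be independent.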

Member

@ches ches May 21, 2020

Ahh yes, I forgot about this quirk in Python, I'm sorry. Now I'm reminded of the idiom for arguments like features: List[Feature] = None, followed by an if features is None check that initializes the value. So that pattern should be followed, sans Optional type.
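
A minimal sketch of that idiom applied to labels. `LabelHolder` is a hypothetical class used only for illustration. The parameter is annotated `Optional` here because some type checkers reject an implicit `None` default; the stored attribute is always a dict, which is the contract being asked for.

```python
from typing import Dict, Optional


class LabelHolder:
    """Hypothetical class showing the None-default idiom for a dict argument."""

    def __init__(self, labels: Optional[Dict[str, str]] = None):
        if labels is None:
            # A fresh dict per instance, created at call time rather than
            # once at function definition time.
            labels = {}
        self.labels: Dict[str, str] = labels


a = LabelHolder()
b = LabelHolder()
a.labels["team"] = "ml-platform"
```

Each instance gets its own dict, so mutating `a.labels` leaves `b.labels` empty, and callers never need to check the attribute for `None`.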

Regarding OrderedDict I'll leave it to @woop or others, I'm not sure if there was reason behind it for fields and if it should apply to labels or not.

Member

We do not have a special ordered contract with end users. I think when I used OrderedDict in the Python SDK I was trying to have my cake and eat it as well.

For Fields/Features/Entities we should be using lists (as users see them) and dicts as they are implemented inside the FeatureSet class. We do not have an ordered contract with users. So technically speaking they should be maps/dicts across Feast, but from a user perspective I feel like it's cleaner to expose lists, especially when persisting as YAML/JSON.

For labels I am happy to commit to simply a dict() instead of an OrderedDict.
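
For reference, a small sketch of the observable differences between the two types, using only the standard library. The label values are made up for illustration.

```python
from collections import OrderedDict

plain_a = {"team": "ml", "env": "prod"}
plain_b = {"env": "prod", "team": "ml"}

ordered_a = OrderedDict([("team", "ml"), ("env", "prod")])
ordered_b = OrderedDict([("env", "prod"), ("team", "ml")])

# Plain dicts compare equal regardless of insertion order.
plain_equal = plain_a == plain_b

# Two OrderedDicts compare equal only if their order matches too.
ordered_equal = ordered_a == ordered_b

# Comparing a plain dict with an OrderedDict ignores order.
mixed_equal = plain_a == ordered_b

# Plain dicts still preserve insertion order on Python 3.7+.
insertion_order = list(plain_a)
```

So a plain dict keeps a stable iteration order without making order part of equality, which fits a contract where ordering is not promised to users.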

Other than that the PR looks good @Joostrothweiler

sdk/python/feast/field.py (outdated review thread, resolved)
@woop
Member

woop commented May 16, 2020

/retest

@woop woop added the kind/feature (New feature or request) label May 16, 2020
@Joostrothweiler Joostrothweiler requested a review from ches May 17, 2020 11:32
@Joostrothweiler
Contributor Author

> There is a bug on Field so that is critical.

Missed this one. It also showed that I had missed some tests for Feature. I made the change and added some tests.

> Empty dict vs. Optional is a design choice that I'll leave the maintainers to weigh in on; I'm not too familiar with the Python SDK, and my sense of what is Pythonic is rusty these days.

I agree that an empty dict would make more sense. Changed it to initialize it to OrderedDict by default, similar to _fields.

> I do think there is a desire to kill Field in Python now, as has been done in Java, and to update the E2E tests for labels added in #536 to use this new API. These could be follow-ups; again, I'll leave it to others to weigh in.

Have not made any changes regarding this. If it's something that should be part of this PR I'm happy to make the changes.

@woop
Member

woop commented May 17, 2020

> Verify that this enables the Feast CLI to handle labels applied with YAML feature set specifications.

Which raises another good point. We don't have any kind of e2e test coverage for the CLI.

> It may raise questions, such as whether "applying" a feature set should nullify existing label fields when labels aren't given every time, which would be a usability annoyance IMO. But this definitely goes into new discussion territory outside this PR.

I think we should probably just try to model ourselves after Kubernetes here, although I admit I am not sure how this is handled. I do know that there are various ways to update a resource though.

> Have not made any changes regarding this. If it's something that should be part of this PR I'm happy to make the changes.

I think it's out of scope for your PR. The desire is strong though :)

Thanks for the PR @Joostrothweiler

@woop
Member

woop commented May 17, 2020

@Joostrothweiler I do think updating the e2e tests to make sure that labels work is important here. There was some coverage, but not for this new API obviously.

@Joostrothweiler
Contributor Author

@woop I updated the existing e2e test cases to take the labels directly from the Feature(Set) class and to use the to_proto function.

@woop
Member

woop commented May 25, 2020

@Joostrothweiler Apologies for taking so long to review this. Will try to get to it as soon as possible.

@Joostrothweiler
Contributor Author

@woop no problem. Let me know if there's anything I can do

@woop woop requested a review from pyalex as a code owner June 8, 2020 07:47
@woop
Member

woop commented Jun 8, 2020

I will merge this in. We can resolve the issues around Dicts later. Same applies to Field vs Entities/Features.

@woop woop changed the title from "Python sdk support labels" to "Add support for labels on feature sets" Jun 8, 2020
@woop
Member

woop commented Jun 8, 2020

/lgtm

@woop
Member

woop commented Jun 9, 2020

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Joostrothweiler, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 6fd0874 into feast-dev:master Jun 9, 2020
@ches ches changed the title from "Add support for labels on feature sets" to "Add Python SDK support for labels on feature sets" Jul 4, 2020
Development

Successfully merging this pull request may close these issues.

Update Python SDK to support labels
5 participants