-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Python SDK support for labels on feature sets #707
Add Python SDK support for labels on feature sets #707
Conversation
Hi @Joostrothweiler. Thanks for your PR. I'm waiting for a gojek member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @zhilingc |
Thanks for the contribution @Joostrothweiler! I haven't reviewed yet, but something I was just thinking of that I want to remind us all of: Verify that this enables the Feast CLI to handle labels applied with YAML feature set specifications. It needs a bit of extra work I think, so this could be separate task/issue if you can't get to it. Just want to track it if we decide that, because I think the YAML becomes especially helpful if using quite a bit of metadata. It may raise questions like if "applying" a feature set should nullify existing label fields if labels aren't given every time, which would be a usability annoyance IMO. But this definitely goes into new discussion territory outside this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bug on Field
so that is critical.
Empty dict vs. Optional
is a design choice that I'll leave the maintainers to weigh in on, I'm not too familiar with the Python SDK and my sensibility of Pythonic is rusty these days.
I do think there is desire to kill Field
in Python now as it has been in Java, and to update E2E tests for labels added in #536 to use this new API. These could be follow-ups, again I'll leave to others to weigh in.
sdk/python/feast/feature_set.py
Outdated
if not self.labels or key not in self.labels.keys(): | ||
raise ValueError("Could not find label key " + key + ", no action taken") | ||
elif key in self.labels.keys(): | ||
del self.labels[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is consistent with how the drop
method behaves for features and entities on the feature set, so the ValueError
might be the right thing for the scope of this PR. Separately though, I wonder if KeyError
would be more appropriate for both, if not an application-specific error for drop
because a feature set with no entities or features is dubious.
I might be inclined to initialize labels as an empty dictionary instead of the type being Optional
though—unless there's a clear application-specific semantic reason for distinguishing no value from empty, I think empty is preferable design for collections. Give a stronger contract and spare both us and callers from null checks.
In that case, the method implementation could be reduced to del self.labels[key]
with KeyError
propagating, unless we want to give it a more descriptive error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it makes more sense. Changed it to just use del
. Shall I include a change for the drop
method to raise a KeyErrors
in this PR as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care much that labels are an OrderedDict
? It's not expressed by the type MutableMapping
either (this is okay if ordering is used as an implementation detail but isn't part of contract, not sure that distinction exists here though).
After updates we still have this signature:
labels: Optional[MutableMapping[str, str]] = None
which to me would be nice if were simpler:
labels: MutableMapping[str, str] = {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care much that labels are an
OrderedDict
? It's not expressed by the typeMutableMapping
either (this is okay if ordering is used as an implementation detail but isn't part of contract, not sure that distinction exists here though).
I don't think we care much. I just aligned this with self._fields
. Is there a particular reason why we would want it in this case but not for labels? I can change it to initialize with a simple empty dict if this is preferred.
After updates we still have this signature:
labels: Optional[MutableMapping[str, str]] = Nonewhich to me would be nice if were simpler:
labels: MutableMapping[str, str] = {}
This would set the default value to a mutable object, which Pycharm elegantly suggests not to do:
"Default argument values are evaluated only once at function definition time, which means that modifying the default value of the argument will affect all subsequent calls of the function."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh yes I forgot about this quirk in Python, I'm sorry. Now I'm reminded of the idiom for argument types like features: List[Feature] = None
and if features is None
to initialize. So that pattern should be followed, sans Optional
type.
Regarding OrderedDict
I'll leave it to @woop or others, I'm not sure if there was reason behind it for fields and if it should apply to labels or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not have a special ordered contract with end users. I think when I used OrderedDict in the Python SDK I was trying to have my cake and eat it as well.
For Fields/Features/Entities we should be using Lists (as users see them) and dicts as its implemented inside the FeatureSet class. We do not have an ordered contract with users. So technically speaking they should be maps/dicts across Feast, but from a user perspective I feel like its cleaner to expose lists, especially when persisting as YAML/JSON.
For labels I am happy to commit to simply a dict()
instead of an OrderedDict.
Other than that the PR looks good @Joostrothweiler
/retest |
Missed this one.. But indeed this also showed I missed some tests for Feature. Made the change and added some tests.
I agree that an empty dict would make more sense. Changed it to initialize it to
Have not made any changes regarding this. If it's something that should be part of this PR I'm happy to make the changes. |
Which raises another good point. We don't have any kind of e2e test coverage for the CLI.
I think we should probably just try to model ourselves after Kubernetes here, although I admit I am not sure how this is handled. I do know that there are various ways to update a resource though.
I think it's out of scope for your PR. The desire is strong though :) Thanks for the PR @Joostrothweiler |
@Joostrothweiler I do think updating the e2e tests to make sure that labels works is important here. There was some coverage, but not for this new API obviously. |
@woop I updated the existing e2e test cases to take the labels directly from the |
@Joostrothweiler Apologies for taking so long to review this. Will try to get to it as soon as possible. |
@woop no problem. Let me know if there's anything I can do |
I will merge this in. We can resolve the issues around Dicts later. Same applies to Field vs Entities/Features. |
/lgtm |
196d357
to
117e727
Compare
117e727
to
b776dff
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Joostrothweiler, woop The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
Adding labels to Python sdk with relevant FeatureSet class functions:
Which issue(s) this PR fixes:
Fixes #663
Does this PR introduce a user-facing change?: