Extend feature set and/or feature metadata #463

Closed
woop opened this issue Feb 6, 2020 · 13 comments · Fixed by #536
Labels: area/core, good first issue (Good for newcomers), kind/feature (New feature or request)

@woop
Member

woop commented Feb 6, 2020

This issue tracks the addition of new fields to the current feature set specification that allow a user to add metadata to either the feature set or its features. These fields are optional and are intended to give users the flexibility to include feature-set-level or feature-level information.

The current proposal is to add only a single string field called description to FeatureSpec.

woop added the area/core, good first issue, and kind/feature labels Feb 6, 2020
@Yanson
Contributor

Yanson commented Feb 6, 2020

We would like to integrate Feast with a Data Governance Tool (one of these).

It would be helpful to have additional metadata other than "description". E.g.

  • owner
  • team
  • scope
  • deprecated
  • source
  • sensitive
  • pii-level
  • relationships

How about just adding a metadata field of type map<string, string>?

@woop
Member Author

woop commented Feb 7, 2020

> We would like to integrate Feast with a Data Governance Tool (one of these).
>
> It would be helpful to have additional metadata other than "description". E.g.
>
>   • owner
>   • team
>   • scope
>   • deprecated
>   • source
>   • sensitive
>   • pii-level
>   • relationships
>
> How about just adding a metadata field of type map<string, string>?

Hi @Yanson, thanks for the input!

We'd love to support that use case; in fact, @ches and some of our other users have also asked for this. In those discussions the idea came up that we could add a label/annotation/tags field (a string map) to either a feature set or a feature. That would allow users to add any number of properties to their spec, which sounds very similar to what you are describing above.

The challenge there is not so much in capturing the information, but more in how we expose it. In your use case, were you looking for something like this (and perhaps something similar on a UI based search):

client.list_features(meta={"source":"my-db"})

or how would you end up consuming the meta/labels/annotations?
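
To make the question concrete, here is a minimal sketch of how such a filter might behave, assuming each feature carries a plain string-to-string map. The `Feature` dataclass and this standalone `list_features` function are hypothetical stand-ins for illustration only; they are not the Feast client API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Feature:
    """Hypothetical stand-in for a registered feature (not the Feast class)."""
    name: str
    meta: Dict[str, str] = field(default_factory=dict)


def list_features(features: List[Feature],
                  meta: Optional[Dict[str, str]] = None) -> List[Feature]:
    """Return features whose metadata contains every key/value pair in `meta`.

    With no filter, every feature is returned.
    """
    if not meta:
        return list(features)
    return [
        f for f in features
        if all(f.meta.get(k) == v for k, v in meta.items())
    ]


# Made-up registry contents, purely for illustration.
registry = [
    Feature("daily_transactions", {"source": "my-db", "owner": "payments"}),
    Feature("avg_basket_size", {"source": "warehouse"}),
    Feature("customer_age"),
]

matches = list_features(registry, meta={"source": "my-db"})
print([f.name for f in matches])  # prints ['daily_transactions']
```

The sketch uses exact-match semantics on every requested key, which is the simplest reading of the call above; richer matching (set membership, negation) would be a separate design decision.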

@Yanson
Contributor

Yanson commented Feb 7, 2020

Our basic needs would be that you do list_features() with no args and you get everything with associated metadata. We would have a scheduled process that does this and pushes the results to the Data Governance Tool.
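
As a rough sketch of that scheduled process, assuming features come back as simple dicts with an optional metadata map: the record shape and the `list_features` and `push` callables are hypothetical placeholders, since neither the Feast listing call nor the governance tool's API is specified in this thread.

```python
import json
from typing import Callable, Dict, List


def build_payload(features: List[Dict]) -> str:
    """Serialize every feature with its metadata; missing metadata becomes {}."""
    records = [
        {"name": f["name"], "metadata": dict(f.get("metadata", {}))}
        for f in features
    ]
    # Sort for a stable payload so repeated runs are easy to diff.
    records.sort(key=lambda r: r["name"])
    return json.dumps(records, sort_keys=True)


def sync_once(list_features: Callable[[], List[Dict]],
              push: Callable[[str], None]) -> str:
    """One run of the scheduled job: list everything, then push it."""
    payload = build_payload(list_features())
    push(payload)
    return payload


def fake_list_features() -> List[Dict]:
    """Stand-in for a no-argument listing call (made-up data)."""
    return [
        {"name": "customer_age",
         "metadata": {"owner": "growth", "sensitive": "true"}},
        {"name": "avg_basket_size"},
    ]


sent: List[str] = []  # stands in for the governance tool's ingest endpoint
result = sync_once(fake_list_features, sent.append)
print(result)
```

Passing the listing and push functions in as arguments keeps the job independent of any particular client or governance tool, so either side can be swapped out.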

While the ultimate goal is to use the Data Governance Tool for discovery, I have a feeling it will be rather heavyweight so a Feast UI (which itself would require the "search" functionality you describe) would probably provide immediate value to Data Scientists.

One very important reason for using a Feature Store is to share and discover Features, but I don't think search should be the ultimate goal of the Feast project itself. Keep it simple (API only) and create another "app" that can act as a UI, search, rating, discussion tool etc which users can either deploy too or integrate something else as preferred.

@woop
Member Author

woop commented Feb 11, 2020

> Keep it simple (API only) and create another "app" that can act as a UI, search, rating, discussion tool etc which users can either deploy too or integrate something else as preferred.

Yip, this is what we had in mind as well.

The question is just whether the description and metadata should be the same thing or separate. We planned on adding the metadata type field (aka label/annotation/tags) in 0.6, but I am happy to accelerate this if it's needed by folks already.

@Yanson
Contributor

Yanson commented Feb 11, 2020

> The question is just if the description and metadata should be the same thing or separate.

Honestly, I don't have much of an opinion. I wouldn't want to see the description "misused", though, if it's the only field (think: custom CSV or JSON content stuffed in there).

> We planned on adding the metadata type field (aka label/annotation/tags) in 0.6, but I am happy to accelerate this if it's needed by folks already.

Not in a desperate hurry. We can contribute if it's that urgent for us.

@tfurmston

Personally, I would suggest that they should be separate. If you don't provide people the option to add metadata, but only a description, then I expect people will abuse it.

@woop
Member Author

woop commented Feb 12, 2020

> Personally, I would suggest that they should be separate. If you don't provide people the option to add metadata, but only a description, then I expect people will abuse it.

I was thinking that we could start with the reverse: metadata first, with the description field added later. A dedicated description field is only more valuable than metadata when we want to encourage users to set that specific key, and perhaps want to print its contents in a user interface. There is no use case for it right now, but it seems we do have one for metadata.

@tfurmston

Yeah, that makes sense to me.

Thanks!

@ches
Member

ches commented Feb 12, 2020

I'll try to chime in here shortly since, yes, this topic comes up over and over for us. I will go ahead and cross-reference #363 as a thread that I wanted to find to refer back to on this.

A possible elephant in the room too… This issue title is explicitly "feature set metadata" and we may want to keep it limited to that, but we've touched on some potential use cases for feature-level metadata as well (governance tags and descriptions for humans are both relevant to me, at feature level also). Clearly the complexity of registration might explode with that, but perhaps it's essential complexity, and could be optional.

Especially if feature sets might increasingly be downplayed (for similar reasons that consumers don't want to care about feature sets, they are probably less interesting in a registry browsing UI than entities, features, and projects), perhaps it'd be worth bringing feature-level into scope from the outset of metadata discussion.

ches added this to the v0.5.0 milestone Feb 14, 2020
@woop
Member Author

woop commented Feb 15, 2020

> I'll try to chime in here shortly since, yes, this topic comes up over and over for us. I will go ahead and cross-reference #363 as a thread that I wanted to find to refer back to on this.
>
> A possible elephant in the room too… This issue title is explicitly "feature set metadata" and we may want to keep it limited to that, but we've touched on some potential use cases for feature-level metadata as well (governance tags and descriptions for humans are both relevant to me, at feature level also). Clearly the complexity of registration might explode with that, but perhaps it's essential complexity, and could be optional.

Agreed on the title being a bit misleading; I will update it to include both in scope. I think the feature-level discussion is much more relevant right now.

> Especially if feature sets might increasingly be downplayed (for similar reasons that consumers don't want to care about feature sets, they are probably less interesting in a registry browsing UI than entities, features, and projects), perhaps it'd be worth bringing feature-level into scope from the outset of metadata discussion.

I agree on downplaying feature sets here. It seems like we can immediately add value by providing a means of capturing metadata at the feature level.

I want to try and gauge the appetite for including feature level tags/meta in 0.5. @ches @tfurmston @Yanson do we need to spec this out at a higher level with discovery and exploration, or are we comfortable with the addition of a field to feature specs and a means of configuring it, and leaving the higher level APIs to future versions?

So a potential proposal could be as follows:

message FeatureSpec {
    string name = 1;
    feast.types.ValueType.Enum value_type = 2;
    // other fields
    map<string, string> labels = 19;
}

with the Python SDK having a set_label(key, value) method and a remove_label(key) method on the Feature class. list_feature_sets() would print out this information as well, but filtering will be left for a future release.
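
As a minimal sketch of what those two methods might look like, assuming labels are held in an ordinary dict that mirrors the map<string, string> field: this `Feature` class, its constructor arguments, and the choice to treat removal of a missing key as a no-op are all assumptions made for illustration, not the actual SDK implementation.

```python
from typing import Dict


class Feature:
    """Hypothetical sketch of a feature with labels; not the real Feast class."""

    def __init__(self, name: str, value_type: str) -> None:
        self.name = name
        self.value_type = value_type
        self._labels: Dict[str, str] = {}

    @property
    def labels(self) -> Dict[str, str]:
        # Return a copy so callers cannot bypass set_label's validation.
        return dict(self._labels)

    def set_label(self, key: str, value: str) -> None:
        # Mirror map<string, string>: both key and value must be strings.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("label keys and values must be strings")
        if not key:
            raise ValueError("label key must not be empty")
        self._labels[key] = value

    def remove_label(self, key: str) -> None:
        # Removing an absent key is a no-op (an assumption of this sketch).
        self._labels.pop(key, None)


feature = Feature("customer_age", "INT64")
feature.set_label("owner", "growth")
feature.set_label("pii-level", "2")
feature.remove_label("owner")
print(feature.labels)  # prints {'pii-level': '2'}
```

Because the map only holds strings, values such as a PII level or a deprecation flag have to be encoded as text ("2", "true"), and consumers need to agree on that encoding.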

In terms of names, I am open to suggestions. The following have been proposed:

  • meta
  • tags
  • labels
  • annotations

My preference is labels, mostly because it mirrors the way that it has been used in Prometheus and Kubernetes.

woop changed the title from "Extend feature set metadata" to "Extend feature set and/or feature metadata" Feb 15, 2020
@Yanson
Contributor

Yanson commented Feb 17, 2020

I would rule out tags because that doesn't sound like key+value.

Kubernetes has labels and annotations, both of which are set under metadata.

> Labels can be used to select objects and to find collections of objects that satisfy certain conditions. In contrast, annotations are not used to identify and select objects.

Regardless, I am happy with labels with search, filtering, etc left until later.

@tfurmston

Completely agree that feature level makes sense. As an end user, that is likely how I would expect to use it.

From my point of view, I don't think we have a more concrete set of requirements yet, so I'm happy to leave that for later.

@woop
Member Author

woop commented Feb 20, 2020

Unless there are any objections, we will implement labels as a map<string, string> at the feature level for 0.5. We will also add basic getters/setters at the feature set level. We will leave the discovery implementation for future releases.

Please vote with a thumbs down if you want to discuss this further.

4 participants