Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write feature API on Serving service #1461

Closed
RoyTal28 opened this issue Apr 13, 2021 · 13 comments
Closed

Write feature API on Serving service #1461

RoyTal28 opened this issue Apr 13, 2021 · 13 comments
Labels
wontfix This will not be worked on

Comments

@RoyTal28
Copy link

Hey guys :),
According to the initial discussion that my colleague created.
I want to elaborate more about the write features API and share the proto file for initial feedback.

We think that the API should basically push feature rows for specific feature tables and projects to the online store in a similar way the ingestion logic works (RedisSinkRelation.scala).

The suggested change to the serving proto file is as follows:

image

Details explanation:
The PushOnlineFeatureRequest contains a definition of the feature table and the project.
And a key-value map containing the values of each row.

**We assume that the feature table and the project (if empty we use default) already exist and validate it.

The PushOnlineFeatureResponse contains a list of RowStatus that represent the pushing status of each row.

I would love to have your feedback so we can move forward and prepare a pull request soon.

Thanks!

@woop
Copy link
Member

woop commented Apr 13, 2021

Hi @RoyTal28,

Thanks for the proposal!

One question that immediately comes to mind: Are users able to push a subset of feature values for a feature table? Feast currently requires that all features in a feature table be provided during ingestion, but we could in the future make it possible to provide a subset. We'd probably need to read the existing row, mutate it, and then write it back.

Thanks again :)

@RoyTal28
Copy link
Author

Hey @woop,
In my current implementation, the user can push several features and not all the defined features.
Should I enforce that all the defined features on the feature table are part of each row on my input?

I am not sure I understand why we would need to read->mutate->write again the row? can u explain?

@woop
Copy link
Member

woop commented Apr 13, 2021

Hey @woop,
In my current implementation, the user can push several features and not all the defined features.
Should I enforce that all the defined features on the feature table are part of each row on my input?

I am not sure I understand why we would need to read->mutate->write again the row? can u explain?

It depends on the way features are being stored. Some storage implementations will store the whole feature row as a single value. Meaning you can only update it by reading it first, otherwise some feature values may be lost if the user doesn't provide the complete row.

Perhaps for a first cut we can just say that the full feature row should be provided, but add support for partial updates in the future?

@RoyTal28
Copy link
Author

Hey @woop,
In my current implementation, the user can push several features and not all the defined features.
Should I enforce that all the defined features on the feature table are part of each row on my input?
I am not sure I understand why we would need to read->mutate->write again the row? can u explain?

It depends on the way features are being stored. Some storage implementations will store the whole feature row as a single value. Meaning you can only update it by reading it first, otherwise some feature values may be lost if the user doesn't provide the complete row.

Perhaps for a first cut we can just say that the full feature row should be provided, but add support for partial updates in the future?

OK, that sounds good.
Regarding the proto file - Can I proceed with my suggestion?

@rmankevich
Copy link

Hi @RoyTal28,

Thanks for the proposal!

One question that immediately comes to mind: Are users able to push a subset of feature values for a feature table? Feast currently requires that all features in a feature table be provided during ingestion, but we could in the future make it possible to provide a subset. We'd probably need to read the existing row, mutate it, and then write it back.

Thanks again :)

Hi,
I just want to clarify current behaviour. I could see in existing code:
feast/ingestion/validation/RowValidator.scala
def allChecks: Column =
allEntitiesPresent && atLeastOneFeatureNotNull && timestampPresent
Do you mean that null is also a value or there is also another place to guarantee all features are provided?

@woop
Copy link
Member

woop commented Apr 15, 2021

Hey @woop,
In my current implementation, the user can push several features and not all the defined features.
Should I enforce that all the defined features on the feature table are part of each row on my input?
I am not sure I understand why we would need to read->mutate->write again the row? can u explain?

It depends on the way features are being stored. Some storage implementations will store the whole feature row as a single value. Meaning you can only update it by reading it first, otherwise some feature values may be lost if the user doesn't provide the complete row.
Perhaps for a first cut we can just say that the full feature row should be provided, but add support for partial updates in the future?

OK, that sounds good.
Regarding the proto file - Can I proceed with my suggestion?

I think it's necessary to share this proposal more widely. Many folks have asked for a write/push API, so I'd like to make sure that people agree on the design. I think it would be better to create a Google document like one of these and share it to [email protected]. It doesn't have to be very long, but it should just allow people to easily comment on the proposed API and understand the functionality that will be introduced.

Overall the design looks good to me, I just want to be sure we're not missing anything. We'd also be happy with sharing it on the mailing list if you don't have time, but we're occupied with a conference (applyconf.com) over the next week.

@woop
Copy link
Member

woop commented Apr 15, 2021

Hi @RoyTal28,
Thanks for the proposal!
One question that immediately comes to mind: Are users able to push a subset of feature values for a feature table? Feast currently requires that all features in a feature table be provided during ingestion, but we could in the future make it possible to provide a subset. We'd probably need to read the existing row, mutate it, and then write it back.
Thanks again :)

Hi,
I just want to clarify current behaviour. I could see in existing code:
feast/ingestion/validation/RowValidator.scala
def allChecks: Column =
allEntitiesPresent && atLeastOneFeatureNotNull && timestampPresent
Do you mean that null is also a value or there is also another place to guarantee all features are provided?

I believe the current guarantees are simply that nulls will overwrite what exists in the store. So if you don't provide all features then you will reset existing values. EDIT: @pyalex has confirmed that the current Spark ingestion does allow for partial updates, but we haven't communicated this to end users.

@RoyTal28
Copy link
Author

RoyTal28 commented Apr 21, 2021

Hey @woop ,
I have created a design document that describes the proto changes and a component diagram.
I have shared it with [email protected] and I hope to get feedback soon.
Thanks!

@woop
Copy link
Member

woop commented Apr 27, 2021

The document has been in circulation now for a few days. If no objection by the end of the week then we can go ahead.

@woop
Copy link
Member

woop commented May 12, 2021

@RoyTal28 @rmankevich @qiuleiSFDC we've left comments in the document. I think we're close to getting this pushed over the line. Would you mind having a look?

@RoyTal28
Copy link
Author

@woop sorry for the big delay.
Our team couldn't find the resources to continue the Push API contribution work.
Now we are starting a new version in which we are planning to contribute features push API and delete API.
A pull request of the push will be created in a few days by my college.

@woop
Copy link
Member

woop commented Aug 17, 2021

Thanks @RoyTal28

@stale
Copy link

stale bot commented Dec 15, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Dec 15, 2021
@stale stale bot closed this as completed Dec 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

3 participants