Add MultipleTenancy Support to the FeatureTable #1514

qiuleiSFDC · 2021-04-28T06:37:26Z

Is your feature request related to a problem? Please describe.
As a SaaS company, our infrastructure is entirely multi-tenant to ensure the separation of different customers’ data. We authorize API calls and store data on a per-tenant basis. On the surface, we'd need something like the following in Feast:
(1). Ability to define an optional TenantSpec tenant attribute in FeatureTableSpec that defines the name + type of a tenant field for feature table keys (e.g., name = 'TenantId'; type = 'string')
(2). Adding an optional tenant attribute to ingestion and serving APIs (+SDK methods)
(3). Transparently adding the tenant to storage operations when the tenant attribute is set (e.g., including 'TenantId=foo' in row keys)
More details please refer to discussion in feast-dev/feast-java-old#6.

Describe the solution you'd like
In a high level, for (1) above, we would update the FeatureTable proto as below by adding an optional TenantSpec attribute.

message FeatureTableSpec {
    ...
    // Tenant identifier for data in this feature table.
    TenantSpec tenant = 9;
}

message TenantSpec {
    // Identifier name for tenants (e.g., TenantId). Not updatable.
    string name = 1;

    // Value type of the tenant identifier. Not updatable.
    feast.types.ValueType.Enum value_type = 2;
}

For (2) and (3), we would like to update the feature data retriever and writer API + SDK method accordingly. For example, for retriever, we would add an optional tenant attribute to the retriever method GetOnlineFeaturesRequestV2 as below:

message GetOnlineFeaturesRequestV2 {
    ...
    // Optional tenant to retrieve features for
    feast.types.Value tenant = 6;
}

Similarly, for the writer PushOnlineFeaturesRequest proposed in #1461, we would like to also add a tenant attribute feast.types.Value tenant = 4; as below

message PushOnlineFeaturesRequest {
    // Name of the Feature Table to write the features to.
    string feature_table = 1;
    
    // Optional field to specify project name override.
    string project = 2;
    
    // List of entity and feature values to write
    repeated FieldValues field_values = 3;

     // Optional tenant identifier for values
    feast.types.Value tenant =  4;
    
    message FieldValues {
        // Map of feature or entity name to feature/entity values.
        map<string, feast.types.Value> fields = 1;
    }
}

Please share your thoughts. Thank you.

The text was updated successfully, but these errors were encountered:

woop · 2021-04-28T06:45:59Z

@qiuleiSFDC thanks for the detailed write-up!

Would you mind sharing how tenant would be different from our existing project field? They seem very similar to me.

aaraujo · 2021-04-28T14:15:08Z

The project field is more-analogous to use case whereas the tenant field specifies who owns specific rows in a feature table.

We'd like to use project as use case and not change its associated access control, sharing, discovery, etc. feature semantics. Adding tenant simply means optionally defining a feature table as "able to store data for multiple tenants", and encoding a tenant id in each row. Then we can scope tenant specific read/write/delete operations to only a single tenant's data. This ensures, for example, that tenantA's read/write/delete requests only access or modify that tenant's data.

Treating project as tenant would mean that we lose the semantics and features of project, since you flatten the relationship between project and feature table. And since each feature store use case can store data for multiple tenants, we'd end up with up to numberOfUseCases * numberOfTenants projects and feature tables in the system.

woop · 2021-04-28T17:08:57Z

@aaraujo my primary concern is that this API adds complexity to our APIs. We want Feast to be simple for teams to get started with, which includes a data scientist or engineer just deploying it for a single use case. project already complicates the API, and tenant will do the same, even if it's optional.

What if we had separate "advanced" RPCs that contain functionality as above?

@aaraujo are you imagining that a data scientist would add tenant to their feature views/tables during the creation of these objects?

qiuleiSFDC · 2021-04-28T18:23:01Z

are you imagining that a data scientist would add tenant to their feature views/tables during the creation of these objects?

I think it depends. If the data scientist(DS) from a company with multiple tenant/org data, the DS would add tenant to the view/table during the creation. If not, they could forget the tenant, which assumes all the data would belong to same organization.

rohita5l · 2021-04-28T20:57:08Z

Hi @qiuleiSFDC Im a platform engineer at Tecton and was at Salesforce previously :)
What do you think about making the features have entity as the tenantID. If the feature packages have that defined as an entity then all the reads and writes of the features will need to have the entity field set (in your case with tenantID). This would also provide tenant isolation when doing reads and writes. So when you are doing training/inference and want to retreive features you would only get back the tenant's data you wish via passing it as entity. Let me know your thoughts. @woop Please correct me if Im wrong.

woop · 2021-04-28T21:07:45Z

Yea, your understanding is correct. It seems like entity could work. It's not clear whether there are other concerns like isolation around a tenant entity.

aaraujo · 2021-04-28T21:21:12Z

What if we had separate "advanced" RPCs that contain functionality as above?

We'd need to duplicate: ApplyFeatureTable, GetOnlineFeaturesV2, PushOnlineFeatures, DeleteOnlineFeatures (proposal forthcoming), etc. and related SDK plumbing. Would it be possible/preferable to have a "basic" SDK that only exposes a minimum set of functionality, and an "advanced" one that exposes all of it?

are you imagining that a data scientist would add tenant to their feature views/tables during the creation of these objects?

As @qiuleiSFDC mentioned it depends on the entities/data that will be served. If it's customer owned entities/data, yes a data scientist would create a multi-tenant table/view (specify a TenantSpec when defining the object) and ML applications that use the table/view would then need to specify a tenant in serving requests. If it's internal (company owned) data the table/view would not be defined this way.

What do you think about making the features have entity as the tenantID

We proposed this initially to end users and the consensus was that while this could technically work, the notion of tenant and entity are fundamentally different. The latter describes the type of data stored whereas the former describes ownership. So from a usability perspective defining a tenant "entity" for a feature table did not seem right, and neither did encoding a request scope attribute into every row. For example, if a request comes in with data for tenantA and tenantB how would the service know which tenant is authorized to make the request if the request itself does not have that defined as a top level request attribute?

aaraujo · 2021-04-29T14:02:35Z

@woop @qiuleiSFDC one way we might be able to limit the scope of this change is to accept the tenant as a gRPC Metadata key/attribute in serving requests. FeatureTableSpec.TenantSpec then becomes FeatureTableSpec.TenantMetadataSpec:

message TenantMetadataSpec {
    // gRPC Metadata key for tenant (e.g., TenantId). Not updatable.
    string tenant_key = 1;

    // Value type of the tenant identifier. Not updatable.
    feast.types.ValueType.Enum value_type = 2;
}

With that approach we'd only need to change the ApplyFeatureTable API. At a minimum we can simply indicate that a table is multi-tenant by adding a bool multitenant (or similar) property to FeatureTableSpec and get the tenant "by convention" during serving requests (e.g. pull a String typed TenantId from the gRPC metadata). For our purposes we can get by with a hard-coded 'TenantId' attribute that is a String.

woop · 2021-05-06T19:58:40Z

Would it be possible/preferable to have a "basic" SDK that only exposes a minimum set of functionality, and an "advanced" one that exposes all of it?

We'd like to make our APIs more discoverable. You can imagine us implementing a REST API with swagger like documentation. So ideally whatever the user sees there isn't overwhelming. I like the idea of having a simple/advanced SDK, but we'd need to make sure that the underlying implementations are 99% the same.

@woop @qiuleiSFDC one way we might be able to limit the scope of this change is to accept the tenant as a gRPC Metadata key/attribute in serving requests. FeatureTableSpec.TenantSpec then becomes FeatureTableSpec.TenantMetadataSpec

I like the idea of using gRPC metadata for accepting the tenant.

With that approach we'd only need to change the ApplyFeatureTable API. At a minimum we can simply indicate that a table is multi-tenant by adding a bool multitenant (or similar) property to FeatureTableSpec and get the tenant "by convention" during serving requests (e.g. pull a String typed TenantId from the gRPC metadata). For our purposes we can get by with a hard-coded 'TenantId' attribute that is a String.

What if we simplified things further and captured tenant configuration as part of labels https://github.com/feast-dev/feast/blob/master/protos/feast/core/FeatureTable.proto#L54? This is the approach that Gojek followed in order to add data quality monitoring into Feast in a way that didn't change the core API.

aaraujo-sfdc · 2021-05-06T20:57:16Z

I like the idea of using gRPC metadata for accepting the tenant.

Passing tenant as gRPC metadata would significantly reduce the API blast radius for this feature. We'll explore this a bit more.

What if we simplified things further and captured tenant configuration as part of labels https://github.com/feast-dev/feast/blob/master/protos/feast/core/FeatureTable.proto#L54?

This could also work, but requires adding "special labels" that are well documented and immutable once set. Are we okay with adding such labels?

qiuleiSFDC · 2021-05-06T22:37:26Z

This is the approach that Gojek followed in order to add data quality monitoring into Feast in a way that didn't change the core API.

Is there any open source examples that how people take advantage of the label like this data quality monitoring scenario? Specifically, do the engineers in Gojek use the label mainly in the client side to filter/monitor the data quality? In that sense, I guess the server side(feast service) doesn't care about what labels are out there, and the feast service will just store those labels in the DB? Or does Gojek implement their customized feast API?

What if we simplified things further and captured tenant configuration as part of labels https://github.com/feast-dev/feast/blob/master/protos/feast/core/FeatureTable.proto#L54?

I assume this is doable. But in that sense, I image we would need piece of logic like for reach label {do something} for almost every feast API implementation. Not sure this is ok.?

@aaraujo @woop

woop · 2021-05-11T00:37:02Z

This could also work, but requires adding "special labels" that are well documented and immutable once set. Are we okay with adding such labels?

Yes. We're ok with that.

woop · 2021-05-11T00:50:07Z

This is the approach that Gojek followed in order to add data quality monitoring into Feast in a way that didn't change the core API.

Is there any open source examples that how people take advantage of the label like this data quality monitoring scenario? Specifically, do the engineers in Gojek use the label mainly in the client side to filter/monitor the data quality? In that sense, I guess the server side(feast service) doesn't care about what labels are out there, and the feast service will just store those labels in the DB? Or does Gojek implement their customized feast API?

Have a look over here https://github.com/feast-dev/feast-spark/blob/master/python/feast_spark/contrib/validation/ge.py#L189

Basically they use a label to attach validation logic to a feature table. The label is then available to the streaming ingestion job which applies the validation on batches of data. Feast has an awareness of the label (it's hardcoded on the label name within the Spark ingestion job). However, if the label is not found then no logic is applied. Meaning you have to "opt in" to the functionality by setting the label.

I assume this is doable. But in that sense, I image we would need piece of logic like for reach label {do something} for almost every feast API implementation. Not sure this is ok.?
@aaraujo @woop

Wouldn't it be possible to isolate the tenant specific logic to within the storage implementation? We don't have to support tenant ids in all storage implementations from the start. Or did you have something else in mind?

stale · 2021-09-08T05:10:28Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix This will not be worked on label Sep 8, 2021

stale bot closed this as completed Sep 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add MultipleTenancy Support to the FeatureTable #1514

Add MultipleTenancy Support to the FeatureTable #1514

qiuleiSFDC commented Apr 28, 2021 •

edited

Loading

woop commented Apr 28, 2021

aaraujo commented Apr 28, 2021

woop commented Apr 28, 2021 •

edited

Loading

qiuleiSFDC commented Apr 28, 2021

rohita5l commented Apr 28, 2021 •

edited

Loading

woop commented Apr 28, 2021 •

edited

Loading

aaraujo commented Apr 28, 2021

aaraujo commented Apr 29, 2021

woop commented May 6, 2021

aaraujo-sfdc commented May 6, 2021

qiuleiSFDC commented May 6, 2021 •

edited

Loading

woop commented May 11, 2021

woop commented May 11, 2021

stale bot commented Sep 8, 2021

Add MultipleTenancy Support to the FeatureTable #1514

Add MultipleTenancy Support to the FeatureTable #1514

Comments

qiuleiSFDC commented Apr 28, 2021 • edited Loading

woop commented Apr 28, 2021

aaraujo commented Apr 28, 2021

woop commented Apr 28, 2021 • edited Loading

qiuleiSFDC commented Apr 28, 2021

rohita5l commented Apr 28, 2021 • edited Loading

woop commented Apr 28, 2021 • edited Loading

aaraujo commented Apr 28, 2021

aaraujo commented Apr 29, 2021

woop commented May 6, 2021

aaraujo-sfdc commented May 6, 2021

qiuleiSFDC commented May 6, 2021 • edited Loading

woop commented May 11, 2021

woop commented May 11, 2021

stale bot commented Sep 8, 2021

qiuleiSFDC commented Apr 28, 2021 •

edited

Loading

woop commented Apr 28, 2021 •

edited

Loading

rohita5l commented Apr 28, 2021 •

edited

Loading

woop commented Apr 28, 2021 •

edited

Loading

qiuleiSFDC commented May 6, 2021 •

edited

Loading