Security Model #4198

Closed
tokoko opened this issue May 13, 2024 · 58 comments · Fixed by #4380
Labels
kind/feature New feature or request

Comments

@tokoko
Collaborator

tokoko commented May 13, 2024

Is your feature request related to a problem? Please describe.
This is an offshoot ticket from #4032. Feast is slowly approaching a state in which all major components can be deployed as remote servers. This enables us to start thinking about a comprehensive security model for access to each of them (registry, online store, offline store).

Describe the solution you'd like
Here's a high-level overview of what I'm expecting from the security model:

  1. I'd avoid incorporating user management into Feast as much as possible. We should probably have a pluggable authentication module (LDAP, OIDC, etc...) that takes a user/password (or token), validates it, and returns the roles that have been assigned to that particular user. Each server will have to integrate with this module separately: the HTTP feature server will get user/pass from basic auth, while gRPC and Flight will get them according to their own standard conventions; each passes the credentials to the module to get the list of assigned roles. (Somewhat inspired by the Airflow security model.)

  2. (Option 1) We enrich the Feature Registry to also contain information about the roles available in the system, and each Feast object is annotated with permissions. In other words, the user would run feast apply with something like this:

admin_role = FeastUserRole(name='admin')
reader_role = FeastUserRole(name='reader')

FeatureView(
	name=...,
	schema=...,
	...
	permissions={
		'read': [reader_role],
		'write': [admin_role]
	}
)
  2. (Option 2) Another option is to try to mimic AWS IAM and brush up on our regexes. In this case, instead of annotating objects with permissions, you annotate roles with policies.
risk_role = FeastUserRole(
	name='team_risk_role',
	permissions=[
		FeastPermission(
			action='read',  # read_online, read_offline, write_online, write_offline
			conditions=[{
				'name': 'very_important_*',
				'team': 'risk'
			}]
		)
	]
)

FeatureView(
	name='very_important_fv',
	schema=...
	...
	tags={
		'team': 'risk'
	}
)

The upside of the second approach is that it's a lot less invasive than the first one. You could potentially end up with a setup where permissions and objects are managed with some level of separation between them. I think I'm more in favor of this.

  3. Once a server gets ahold of user roles and permission information from the registry, all components will apply the same "rules engine" to authorize the requested action.
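For illustration only, here's a minimal sketch of what such a shared rules engine could look like under Option 2 (the function and data shapes are hypothetical, not actual Feast APIs):

```python
import fnmatch
from typing import Dict, List


def is_authorized(user_roles: List[str],
                  role_permissions: Dict[str, List[dict]],
                  action: str,
                  resource: dict) -> bool:
    """Check whether any of the user's roles grants `action` on `resource`.

    `role_permissions` maps a role name to permission dicts like
    {'action': 'read', 'conditions': {'name': 'very_important_*', 'team': 'risk'}}.
    Condition values are matched as glob patterns against the resource's
    name and tags, mirroring the IAM-style matching from Option 2.
    """
    for role in user_roles:
        for perm in role_permissions.get(role, []):
            if perm['action'] != action:
                continue
            conditions = perm.get('conditions', {})
            # All conditions must match the resource's attributes
            if all(fnmatch.fnmatch(str(resource.get(key, '')), pattern)
                   for key, pattern in conditions.items()):
                return True
    return False


# Example mirroring the risk_role / very_important_fv setup above
perms = {'team_risk_role': [{'action': 'read',
                             'conditions': {'name': 'very_important_*',
                                            'team': 'risk'}}]}
fv = {'name': 'very_important_fv', 'team': 'risk'}
print(is_authorized(['team_risk_role'], perms, 'read', fv))  # True
print(is_authorized(['other_role'], perms, 'read', fv))      # False
```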
@tokoko tokoko added the kind/feature New feature or request label May 13, 2024
@dmartinol
Contributor

@tokoko we started thinking about a possible solution, that we can share after completing the internal review, but first we'd like to ask a few questions to verify our understanding of the initial requirements (*):

  • We assume that the protected resources are all instances of FeatureView (including OnDemandFeatureView), FeatureService and the FeatureStore, correct?
  • Is the intention to also prevent unauthorized access from clients using the Feast SDK, rather than through the servers? The reason for asking is that we are also thinking of adding a similar requirement to prevent undesired updates to the data stores by non-admin personnel. However, we are aware that this may be a corner-case scenario in most production deployments, as the feature store definition (e.g. the feature_repo.yaml used to feast apply) should not be accessible to all users (e.g., in a RHOAI setup, it should live in a separate "admin notebook" or in a well-protected git repo and branch), and maybe it's not a real concern at all.
  • On the implementation side, are you thinking of leveraging an existing library like PyCasbin (which may look like overkill for now), or a simplified in-house solution implementing the security model? (Personally, we'd avoid the external library, to avoid any vendor lock-in.)
  • Should we also add a feast feast-permissions list [--verbose] command to list the existing permissions, together with the matching resources (in verbose mode)? (And maybe list the unprotected resources, which can help when troubleshooting permission errors.)

(*) We also like option 2 for the reasons you mentioned above. Additionally, we can share a reviewed definition to better align with the usual RBAC permission models coming from our previous work with Keycloak permission features.

@tokoko
Collaborator Author

tokoko commented May 15, 2024

  • We assume that the protected resources are all instances of FeatureView (including OnDemandFeatureView), FeatureService and the FeatureStore, correct?

I'm not sure I'll be able to list everything comprehensively here, but I think there are two major sets of permissions (and, as a result, protected resources) to consider. The first part is CRUD-like permissions for manipulating objects in the Feast registry: Entity, DataSource, FeatureView, StreamFeatureView, OnDemandFeatureView, FeatureService, SavedDataset, ValidationReference. (Take a look at the registry server proto for reference.) These will probably need can_read and can_edit actions (but note that read here refers to reading the object definition, not the data).

Another aspect is managing access to the underlying data. FeatureService and all variations of FeatureViews should probably have can_query action. DataSource and SavedDataset will require can_query and can_write actions and so on. I'm sure I'll miss something here.

  • Is the intention to also prevent unauthorized access from clients using the Feast SDK, rather than through the servers? The reason for asking is that we are also thinking of adding a similar requirement to prevent undesired updates to the data stores by non-admin personnel. However, we are aware that this may be a corner-case scenario in most production deployments, as the feature store definition (e.g. the feature_repo.yaml used to feast apply) should not be accessible to all users (e.g., in a RHOAI setup, it should live in a separate "admin notebook" or in a well-protected git repo and branch), and maybe it's not a real concern at all.

You mean SDK usage without setting individual components as remote, right? No, I don't think that's the intention, simply because it would be way too hard to accomplish. In such a case a client application needs direct access to the underlying resources, so we would have to somehow inject ourselves into that, also considering that different implementations of components will have completely different permissions. tbh, I don't think that's even possible. I think we should enforce permissions only on the servers. If someone has access to the underlying resources themselves and configures feature_store.yaml with the necessary credentials, they will be able to circumvent the security model, and I think that might be completely fine for some use cases; for example, materialization may be something that's orchestrated only by a central ML Platform team that doesn't really need to care about permissions.

  • On the implementation side, are you thinking of leveraging an existing library like PyCasbin (which may look like overkill for now), or a simplified in-house solution implementing the security model? (Personally, we'd avoid the external library, to avoid any vendor lock-in.)

I'm with you on that one. I think we should start by agreeing on some sort of FeastSecurityManager abstract interface (a class with a method that takes user/pass and outputs a list of roles for example) w/o any external dependency. We could then use some off-the-shelf functionality for each auth method implementation.
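For illustration, a sketch of how small that interface could be (FeastSecurityManager is the name floated above; the method name and the toy implementation are hypothetical):

```python
from abc import ABC, abstractmethod
from typing import List


class FeastSecurityManager(ABC):
    """Pluggable authentication backend: credentials in, assigned roles out."""

    @abstractmethod
    def get_roles(self, user: str, password: str) -> List[str]:
        """Validate the credentials and return the roles assigned to the user.

        Returns an empty list when authentication fails.
        """
        raise NotImplementedError


class StaticSecurityManager(FeastSecurityManager):
    """Toy implementation backed by an in-memory table, for illustration only."""

    def __init__(self, table):
        self._table = table  # maps (user, password) -> list of roles

    def get_roles(self, user: str, password: str) -> List[str]:
        return list(self._table.get((user, password), []))


sm = StaticSecurityManager({('alice', 's3cret'): ['reader']})
print(sm.get_roles('alice', 's3cret'))  # ['reader']
print(sm.get_roles('alice', 'wrong'))   # []
```

A real LDAP or OIDC implementation would subclass the same interface, keeping the servers unaware of the concrete auth provider.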

  • Should we also add a feast feast-permissions list [--verbose] command to list the existing permissions, together with the matching resources (in verbose mode)? (And maybe list the unprotected resources, which can help when troubleshooting permission errors.)

Maybe, I'm not sure what that would look like though. Do you mean listing defined roles or permissions that can be specified in those roles?

(*) We also like option 2 for the reasons you mentioned above. Additionally, we can share a reviewed definition to better align with the usual RBAC permission models coming from our previous work with Keycloak permission features.

Cool, we can start there then.

@tokoko
Collaborator Author

tokoko commented May 15, 2024

@dmartinol Sorry, I just took a look at PyCasbin. I guess it's a rules engine, so disregard my answer above. It looks promising, but I'm fine with a home-grown option as well; it depends on how complicated our version will be to maintain, I guess.

@dmartinol
Contributor

Maybe, I'm not sure what that would look like though. Do you mean listing defined roles or permissions that can be specified in those roles?

Yep, something like

feast feast-permissions list --verbose
permissions
├── feast-admin [feast-admin]
│   └── FeatureStore
├── read-online-stores [role1, role2]
│   ├── FeatureService:fs1
│   ├── FeatureView:fv1
│   └── FeatureView:fv2
└── write-offline-stores [writer]
    └── FeatureView:fv3

@dmartinol
Contributor

...manipulating objects in feast registry: Entity, DataSource, FeatureView, StreamFeatureView, OnDemandFeatureView, FeatureService, SavedDataset, ValidationReference. (TAL at registry server proto for reference)

So you mean the FeastObjectType enum (to be extended to also map the SavedDataset and ValidationReference types).

BTW: what about the Registry type instead? e.g., how can we model the permissions to execute feast apply otherwise?

@tokoko
Collaborator Author

tokoko commented May 15, 2024

Yes, that sounds about right. Not sure what you mean about the Registry type, can you elaborate? When a user runs feast apply, it almost exclusively boils down to CRUD operations on the above-mentioned list of resources applied to the registry. So those are the protected resources; the Registry type is just an interface from which CRUD on these objects is applied. Maybe I'm missing something?

@dmartinol
Contributor

Yes, that sounds about right. Not sure what you mean about the Registry type, can you elaborate? When a user runs feast apply, it almost exclusively boils down to CRUD operations on the above-mentioned list of resources applied to the registry. So those are the protected resources; the Registry type is just an interface from which CRUD on these objects is applied. Maybe I'm missing something?

🤔 yes, seeing it from this perspective, this is fine. So, for completeness we probably need all the CRUD actions like:
create, read, update, delete plus query, query_online, query_offline and write, write_online, write_offline
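For illustration, that action set sketched as an enum (AuthzedAction is the name used elsewhere in this thread; the exact members here are illustrative):

```python
from enum import Enum


class AuthzedAction(Enum):
    # CRUD on registry object definitions
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"
    # data-plane actions on the underlying stores
    QUERY_ONLINE = "query_online"
    QUERY_OFFLINE = "query_offline"
    WRITE_ONLINE = "write_online"
    WRITE_OFFLINE = "write_offline"


print([a.value for a in AuthzedAction])
```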

@tokoko
Collaborator Author

tokoko commented May 15, 2024

I'm undecided between having create, update, delete vs a single edit action.

@dmartinol
Contributor

dmartinol commented May 16, 2024

@tokoko on the implementation side, do you think a programmatic solution is mandatory (e.g., like the PyCasbin enforcer), or can we avoid changing the code and instead use decorators to enforce permission policies?
BTW, in my opinion, we cannot use decorators because some affected functions manipulate multiple protected resources (e.g., FeatureViews) at the same time. Additionally, the code may not be structured to support such granular configuration at the individual API level. However, I'd like to hear your feedback on this.
+@tmihalac who raised the question

@tokoko
Collaborator Author

tokoko commented May 16, 2024

@dmartinol @tmihalac I think the most logical points where permission enforcement should happen are the methods of the major Feast abstract classes (OfflineStore, OnlineStore, BaseRegistry). I think for the registry, where the CRUD-like operations live, granularity shouldn't be a problem because of how BaseRegistry is designed. For OnlineStore and OfflineStore, yes, sometimes you might get a request for multiple feature views at once, but I don't really see why that would be a problem for a decorator, tbh. Decorators are just function preprocessors, right? Maybe I'm missing something, but I don't really see the difference between those options, other than that decorators will probably look better...
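To illustrate the "function preprocessor" point, here is a toy decorator that checks permissions on an argument holding one or many resources before the call runs (all names hypothetical; is_allowed stands in for the shared rules engine):

```python
import functools


def require_permission(arg_name, action, is_allowed):
    """Check `action` on the resource(s) passed via keyword `arg_name`
    before the wrapped function executes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            resources = kwargs.get(arg_name, [])
            if not isinstance(resources, list):
                resources = [resources]
            for resource in resources:  # handles one or many feature views
                if not is_allowed(resource, action):
                    raise PermissionError(f"{action} denied on {resource}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


ALLOWED = {('fv1', 'query_online'), ('fv2', 'query_online')}


@require_permission('feature_views', 'query_online',
                    lambda r, a: (r, a) in ALLOWED)
def get_online_features(feature_views=None):
    return f"served {len(feature_views)} feature view(s)"


print(get_online_features(feature_views=['fv1', 'fv2']))  # served 2 feature view(s)
```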

@dmartinol
Contributor

@tokoko we want to share with you a gist describing a proposal to implement this functionality.

The modelling part follows your initial requirement but tries to adapt it to some standards that we've found in Keycloak.
Apart from that, we also propose a possible architecture for the security management solution and some examples of usage in the Feast code (both the programmatic and decoration options).

@tokoko
Collaborator Author

tokoko commented May 17, 2024

@dmartinol Can you clarify what RoleBasedPolicy is, exactly? Looks like it's a list of roles (extracted from Keycloak, for example) that have this permission assigned to them. If that's the case:

  • I'm not sure I like the name 😄 Can't this just be a roles parameter that takes a list of strings?
  • Is it a good idea to have a list of roles scattered around with permission objects? Wouldn't it be better to have another FeastRole(name: str, permissions:List[FeastPermission]) resource? The downside is introducing another resource type that needs to be managed, of course.. just a suggestion. wdyt?

@tokoko
Collaborator Author

tokoko commented May 17, 2024

Also, what's add_roles_for_user method in RoleManager? Does that mean there should be a way to assign roles to the user from feast itself or maybe I'm misunderstanding the class? If we have auth backends like LDAP or OIDC, I was thinking we could delegate role management to them fully, so that assigning roles to the user would happen in AD, Keycloak or somewhere like that.

To me something like this makes more sense instead of RoleManager:

class AuthManager:
    """auth management"""

    def authenticate(self, user: str, password: str) -> List[str]:
        """
        Returns a list of roles if authentication is successful, empty if auth failed.
        """
        return []

And then we would have concrete classes like LdapAuthManager, OidcAuthManager and so on.

@dmartinol
Contributor

@dmartinol Can you clarify what's RoleBasedPolicy exactly for me? Looks like it's a list of roles (extracted from keycloack for example) that have this permission assigned to them. If that's the case:

  • I'm not sure I like the name 😄 Can't this just be a roles parameter that takes a list of strings?

Quoting the Keycloak docs, policies define the conditions that must be satisfied before granting access to an object, and we could have policies based on different criteria, e.g. (also speaking Keycloak-ish):

  • User-based policy: match the configured user against the user in the client request
  • Role-based policy: match the configured roles against the roles of the user in the client request
  • Attribute-based policy: ...

So, the reason for having RoleBasedPolicy was to make room for introducing a Policy interface that can be added later, with a type field (one of role, user) or even with an enforce method to apply the policy verification.
These policy entities can be shared by multiple permissions, if that is the (likely) case:

read_policy = RoleBasedPolicy(['reader', 'viewer'])
admin_policy = RoleBasedPolicy(['admin'])

permission1 = FeastPermission(..., policies=[read_policy, admin_policy], ...)
permission2 = FeastPermission(..., policies=[read_policy], ...)
permission3 = FeastPermission(..., policies=[admin_policy], ...)

  • Is it a good idea to have a list of roles scattered around with permission objects?

It is not a list of roles scattered around with permission objects, but a list of policies to be applied to allow the execution of the protected actions 😉.

Wouldn't it be better to have another FeastRole(name: str, permissions:List[FeastPermission]) resource? The downside is introducing another resource type that needs to be managed, of course.. just a suggestion. wdyt?

The model that you are proposing describes the permissions allowed for any given role instead of the roles/policies allowed for any given permission. Since the cardinality is N-N, it probably doesn't change much, but please consider that roles are not policies, and in the future we could extend the concept and manage policies that are not based on roles.

@jeremyary do you have any modelling preferences here?

@dmartinol
Contributor

Also, what's add_roles_for_user method in RoleManager? Does that mean there should be a way to assign roles to the user from feast itself or maybe I'm misunderstanding the class? If we have auth backends like LDAP or OIDC, I was thinking we could delegate role management to them fully, so that assigning roles to the user would happen in AD, Keycloak or somewhere like that.
...
And then we would have concrete classes like LdapAuthManager, OidcAuthManager and so on.

The idea here is that another entity, like the AuthManager that you defined, takes care of handling authentication & authorization and populates the roles in the RoleManager, so the policy enforcement can use this information w/o being impacted by the actual auth provider.
But yes, this flow can probably be reviewed a bit, and surely a POC can help clarify the design.

Maybe LdapAuthManager and OidcAuthManager are not so relevant in the K8s context, where there is service-to-service interaction and there are no external users to be authenticated? Of course, if we define an open interface, anyone can contribute their own implementation.

@tokoko
Collaborator Author

tokoko commented May 17, 2024

@dmartinol thanks, that makes sense, and it looks like Keycloak uses those terms, so it shouldn't be confusing for anyone. I'm probably influenced by AWS terms, where policies and permissions aren't really separated in any way; policies are more or less just a collection of permissions. My only concern is that if we end up sticking with just RoleBasedPolicy, it will just be a slightly complicated way of providing a list of strings, but we can worry about that later...

@tokoko
Collaborator Author

tokoko commented May 17, 2024

Maybe LdapAuthManager, OidcAuthManager are not so relevant in the K8s context, where there is a service-2-service interaction and there are no external users to be authenticated? Of course, if we define an open interface, anyone can contribute his own implementation.

I may be the one missing some context here, sure. The way I think about it, when a user authenticates in some way, the auth provider is also usually the one that can provide additional info regarding roles and attributes. mlflow has something similar, where the authenticate call, if successful, returns an Authorization object containing all the relevant information. For LDAP and OIDC, a separate role manager seems odd, but I'm not sure how some k8s service mesh magic changes this flow. What's the source of user-related information in that case? Is it possible to make RoleManager an internal implementation detail of K8sAuthProvider(?) instead of an external interface? Just an idea...

@franciscojavierarceo
Member

I'm undecided between having create, update, delete vs a single edit action.

@tokoko I would suggest each, as the flexibility will be great for users

@franciscojavierarceo
Member

I am also pro Option 2 at the moment

@dmartinol
Contributor

@tokoko we prepared a POC to prove our initial design.
Please take a look and share your feedback 👍

@franciscojavierarceo
Member

@tokoko we prepared a POC to prove our initial design. Please take a look and share your feedback 👍

Very nice!

@tokoko
Collaborator Author

tokoko commented May 29, 2024

@dmartinol Looks really good overall, thanks. Obviously still going through it, but I'll leave remarks here along the way:

  • KeycloakAuthManager should probably be called OidcAuthManager, right? Is there anything specific to Keycloak in the implementation?
  • I'm not sure if something like the require_permissions decorator is applicable in our scenario. If I'm reading it correctly, methods need to be attached to some Resource type to be marked with it. Is that possible in Feast? Our resources (FeatureView, for example) don't have protected methods associated with them directly; in other words, there are no query_online or query_offline methods in the FeatureView class. Instead, policy enforcement should happen at the start of methods like get_historical_features, which aren't directly attached to protected resources.

@dmartinol
Contributor

  • KeycloakAuthManager should probably be called OidcAuthManager, right? Is there anything specific to Keycloak in the implementation?

Agree, there are no Keycloak-specific details, and the required URLs can be retrieved from a discovery URL like http://localhost:8080/realms/poc/.well-known/openid-configuration, which we can add to the external configuration.

@dmartinol
Contributor

  • I'm not sure if something like the require_permissions decorator is applicable in our scenario. If I'm reading it correctly, methods need to be attached to some Resource type to be marked with it. Is that possible in Feast? Our resources (FeatureView, for example) don't have protected methods associated with them directly; in other words, there are no query_online or query_offline methods in the FeatureView class. Instead, policy enforcement should happen at the start of methods like get_historical_features, which aren't directly attached to protected resources.

This was also my initial concern (see previous comment), and you suggested using decorators, which instead may be difficult to apply to

the start of methods like get_historical_features

because the decorator, IIUC, needs to identify the input arguments containing the protected resources (e.g. features in FeatureStore.get_historical_features) in order to hide the authorization process and apply the permission rules.

For this reason, we can adopt a programmatic approach (mentioned in the POC) at all identified entry points, allowing us to apply precise validation, such as:

   a: ResourceA = ...
   _get_security_manager().assert_permissions(a, AuthzedAction.EDIT)

I've updated the POC with an Orchestrator class that runs the permission checks via the API; please have a look.

BTW, this approach works if we agree on the above assumption that we:

enforce permissions only on the servers

We are aware that client code can bypass these start methods and directly access the inner objects; however, this is outside the scope of our investigation.

@tokoko
Collaborator Author

tokoko commented May 30, 2024

I guess I was referring to a more complicated decorator, something that would know how to inspect the values passed in the features param of get_historical_features, for example, but you're right, we should probably abandon that approach. The Orchestrator looks good to me. We will probably also need some sort of logic in all the places where permission enforcement happens, so that it's skipped when a user hasn't configured auth (usually on the client side).

@dmartinol
Contributor

We will probably also need some sort of logic in all the places where permission enforcement happens, so that it's skipped when a user hasn't configured auth (usually on the client side).

There's a DefaultSecurityManager for that, with:

    def assert_permissions(
        self,
        resource: Resource,
        actions: Union[AuthzedAction, List[AuthzedAction]],
    ):
        return True

Of course, it's a POC, and we can elaborate on it further

@tokoko
Collaborator Author

tokoko commented May 30, 2024

Thanks, overlooked that part. Another question: I understand that the role manager and policy enforcer should be global objects, but isn't SecurityManager being global a problem when it contains user-context-specific info (the current user)?

@dmartinol
Contributor

On a somewhat related note, I think we have to decide how to handle the case in an offline store get_historical_features call when the user passes entity_df as a sql string. It's unlikely we would be able to extend security model to support passing arbitrary sql strings like that. We should either deprecate the whole thing (as it was planned before #1688) or disallow that use case in remote offline server when security is enabled. Not an urgent matter, just something that popped into my head...

Agree, we'll write a test to verify that this use case is denied for secured deployments

@tokoko
Collaborator Author

tokoko commented Jun 24, 2024

@dmartinol I'm fine with either approach. If you think some of the design needs more work, we can start there and integrate later.

In the meantime, another late bright idea from me... 😄 What do you think about putting the policy enforcement logic behind the registry as well? We could implement it under a new check_permissions method in the BaseRegistry class and add similar endpoints in RegistryServer as well. It feels a little out of place, but the main upside would be that prospective non-Python servers wouldn't need to reimplement the logic.

@dmartinol
Contributor

@tokoko we're starting to create an initial PR with the agreed model and an implementation for the existing servers.

wrt moving the validation API to the BaseServer, exposed by a related endpoint: I agree with having an endpoint to expose the managed permissions (list_permissions), but, given the usual proxy pattern adopted by the client application, the validation endpoint is probably not needed; it should work transparently, as usual.
We can discuss this point once we share the PR. I don't think it is really critical, but maybe we'll find it useful.
I imagine the security manager will be initialized as follows, so it should not matter whether the registry is local or remote:

security_manager.permissions = registry.list_permissions() # proxy for remote endpoint, if needed

@dmartinol
Contributor

dmartinol commented Jun 24, 2024

@tokoko,
@tmihalac raised a great point today, i.e. the 🐔 and 🥚 concern: what permission do we need to access and manage Permission instances?

Assuming that the store admin creates the project with feast apply from a notebook/CLI, i.e. outside of the managed authorization context (every operation is allowed to any user at that time!), the data scientists will be provided with new CRUD endpoints in the registry server to manage the permissions, which instead need to be validated against the user credentials and roles.
I believe we need to add the Permission type as one of the managed data types in FeastObject, WDYT?
(BTW: we'll be using FeastObject as the type to identify secured resources; we've reviewed and simplified the initial design proposed some weeks ago)

@redhatHameed

@dmartinol
Contributor

@tokoko a couple of points to clarify with you that were raised during our internal discussions:
1- We initially proposed a design with multiple policies associated with a given Permission, which is a very flexible solution (emulating the Keycloak permission model), but maybe it isn't addressing any real use case. Do you think multiple policies are actually needed?
Maybe the following:

    Permission(
        name="read-from-any-A",
        resources=[FeatureView, FeatureService],
        policies=[RoleBasedPolicy(roles=["a-reader"]), RoleBasedPolicy(roles=["any-reader"])],
        actions=[AuthzedAction.READ],
    )

could be simplified as:

    Permission(
        name="read-any",
        resources=[FeatureView, FeatureService],
        policy=RoleBasedPolicy(roles=["a-reader", "any-reader"]),
        actions=[AuthzedAction.READ],
    )

2- Second, and also related to the previous point: do you think the store admins want to have reusable Policies shared amongst different Permissions? e.g., do we need independent policies with their own (unique) names, managed as independent Feast resources?

    p = RoleBasedPolicy(roles=["a-reader", "any-reader"])
    Permission(
        name="read-any",
        resources=[FeatureView, FeatureService],
        policy=p,
        actions=[AuthzedAction.READ],
    )
    Permission(
        name="read-data",
        resources=[DataSource],
        policy=p,
        actions=[AuthzedAction.READ],
    )

BTW: my personal answers are 1: a single policy, and 2: no reuse, but embed the policy in the Permission model

@tmihalac @redhatHameed

@tokoko
Collaborator Author

tokoko commented Jun 24, 2024

Assuming that the store admin creates the project with feast apply from a notebook/CLI, i.e. outside of the managed authorization context (every operation is allowed to any user at that time!), the data scientists will be provided with new CRUD endpoints in the registry server to manage the permissions, which instead need to be validated against the user credentials and roles. I believe we need to add the Permission type as one of the managed data types in FeastObject, WDYT? (BTW: we'll be using FeastObject as the type to identify secured resources; we've reviewed and simplified the initial design proposed some weeks ago)

Yeah, since Permissions will be managed similarly to other Feast objects, permissions for permissions will have to be handled accordingly. Some sort of admin permission will be the escape hatch for the chicken-and-egg problem.
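Using the Permission/RoleBasedPolicy shapes sketched earlier in this thread, the escape hatch could look something like this (stub classes for illustration only, not the actual Feast types):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoleBasedPolicy:
    roles: List[str]

    def allows(self, user_roles: List[str]) -> bool:
        # Grant access if the user holds any of the configured roles
        return any(r in self.roles for r in user_roles)


@dataclass
class Permission:
    name: str
    resources: list
    policy: RoleBasedPolicy
    actions: List[str]


# A Permission guarding the Permission type itself: only feast-admin
# may create/update/delete permission definitions, breaking the
# chicken-and-egg cycle.
manage_permissions = Permission(
    name="manage-permissions",
    resources=[Permission],
    policy=RoleBasedPolicy(roles=["feast-admin"]),
    actions=["create", "update", "delete"],
)

print(manage_permissions.policy.allows(["feast-admin"]))  # True
print(manage_permissions.policy.allows(["reader"]))       # False
```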

do you think multiple policies are actually needed?

I think I suggested turning it into a roles: list[str] param before, so I'm on board with this one; a single policy is enough. 😄

do we need to have independent policies with their own (unique) name and manage them as independent Feast resources?

The only upside is protection against typos here, so +1 for embedded as well; one Feast resource (Permission) to manage is simpler.

I imagine the security manager will be initialized as follows, so it should not matter whether the registry is local or remote:

My point is that if the permission check logic is part of the security manager rather than the registry, other feature servers (go/java) will have to reimplement it, as the security manager class won't be reusable across languages. Still, it's not yet critically important, I agree. Let's implement it first and reassess how much of a problem that will be afterwards.

@dmartinol
Contributor

@tokoko about the actual validation logic, you wrote

I think the most logical points where permission enforcement should happen is in the methods of the major feast abstract classes (OfflineStore, OnlineStore, BaseRegistry)

Since most of these methods (e.g. in BaseRegistry) are abstract, do you mean to apply some refactoring, like renaming the abstract methods and applying the validation logic in all the external APIs, either before or after the execution of the abstract methods (*)?

[BaseRegistry]
    # No longer an abstract method
    def apply_entity(self, entity: Entity, project: str, commit: bool = True):
        _assert_permissions(resource=entity, actions=[AuthzedAction.CREATE, AuthzedAction.UPDATE, AuthzedAction.DELETE])
        return self._apply_entity(entity, project=project, commit=commit)

    # Renamed abstract method
    @abstractmethod
    def _apply_entity(self, entity: Entity, project: str, commit: bool = True):
        raise NotImplementedError

(*) In case it happens after, we must guarantee that the execution had no unauthorized side effects on the data stores.

As an alternative, we could invent a new decorator where you can specify when to run (BEFORE or AFTER the actual method) and which parameter to apply it to, but even this solution needs a new abstract method to be introduced, as both decorators cannot coexist AFAIK (and tested 😞):

[BaseRegistry]
    # 'what' is the input arg to validate against permissions
    @require_permission(when=BEFORE, what="entity", actions=[AuthzedAction.CREATE, AuthzedAction.UPDATE, AuthzedAction.DELETE])
    def apply_entity(self, entity: Entity, project: str, commit: bool = True):
        return self._apply_entity(entity, project, commit=commit)

    @abstractmethod
    def _apply_entity(self, entity: Entity, project: str, commit: bool = True):
        raise NotImplementedError

    # 'what' is not needed for validation AFTER the execution; it applies to the returned value
    @require_permission(when=AFTER, actions=[AuthzedAction.CREATE, AuthzedAction.UPDATE, AuthzedAction.DELETE])
    def get_entity(self, name: str, project: str, allow_cache: bool = False) -> Entity:
        return self._get_entity(name=name, project=project, allow_cache=allow_cache)

    @abstractmethod
    def _get_entity(self, name: str, project: str, allow_cache: bool = False) -> Entity:
        raise NotImplementedError

In case the decorator cannot apply, we can use the previous programmatic option.

CC: @redhatHameed
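For reference, a minimal runnable sketch of the BEFORE/AFTER decorator idea discussed above. The names (AuthzedAction, _assert_permissions, require_permission) mirror the snippets in this thread, but the permission lookup itself is a placeholder, not the real implementation:

```python
import functools
from enum import Enum
from types import SimpleNamespace

class AuthzedAction(Enum):
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"

BEFORE, AFTER = "before", "after"

def _assert_permissions(resource, actions):
    # Placeholder check: a real implementation would match the current
    # user's roles against the Permission objects for `resource`.
    if getattr(resource, "restricted", False):
        raise PermissionError(f"unauthorized actions {actions} on {resource.name}")

def require_permission(when=BEFORE, what=None, actions=()):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            if when == BEFORE:
                # Validate the named input argument before execution
                resource = kwargs.get(what, args[0] if args else None)
                _assert_permissions(resource, actions)
                return func(self, *args, **kwargs)
            # AFTER: validate the returned value
            result = func(self, *args, **kwargs)
            _assert_permissions(result, actions)
            return result
        return wrapper
    return decorator

class Registry:
    @require_permission(when=BEFORE, what="entity", actions=[AuthzedAction.UPDATE])
    def apply_entity(self, entity, project="default"):
        return f"applied {entity.name}"

    @require_permission(when=AFTER, actions=[AuthzedAction.READ])
    def get_entity(self, name):
        # Stand-in for a registry fetch
        return SimpleNamespace(name=name, restricted=(name == "secret"))
```

With this, `apply_entity` is checked on its input before running, while `get_entity` runs first and is checked on its return value, which is exactly the BEFORE/AFTER distinction above.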

@tokoko
Collaborator Author

tokoko commented Jun 25, 2024

I did something similar for CachingRegistry class previously. Didn't use decorators there, but had to introduce new abstract methods. The only reason I did that in a new class was to avoid changes in the abstract one. Unfortunately, I don't think we can do the same here once again as two wrapper classes won't really get along with one another. Maybe we can do it in BaseRegistry this time and eventually fold CachingRegistry in there as well? wdyt?

I like the decorator design here, but keep in mind that simple BEFORE/AFTER approach might not cut it for OnlineStore/OfflineStore classes. Over there you might have to get FeatureViews first, check tags, etc and decide whether to keep processing the request somewhere in the middle.

@dmartinol
Contributor

I did something similar for CachingRegistry class previously. Didn't use decorators there, but had to introduce new abstract methods. The only reason I did that in a new class was to avoid changes in the abstract one. Unfortunately, I don't think we can do the same here once again as two wrapper classes won't really get along with one another. Maybe we can do it in BaseRegistry this time and eventually fold CachingRegistry in there as well? wdyt?

IIUC, we can move all the abstract methods from CachingRegistry down to BaseRegistry, decorate the methods in the abstract class (if we decide on decorators; it should not change with the programmatic approach), and let the caching class invoke the super method instead:

return super().get_data_source(name, project)

This should allow executing the pipeline of caching logic -> validation check -> actual method.
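A structural sketch of that pipeline, with hypothetical class and method names and the permission check stubbed out:

```python
class BaseRegistry:
    def get_data_source(self, name: str, project: str):
        # Validation check before delegating to the actual implementation
        self._assert_read_allowed(name, project)
        return self._get_data_source(name, project)

    def _assert_read_allowed(self, name, project):
        pass  # permission enforcement placeholder

    def _get_data_source(self, name, project):
        raise NotImplementedError  # the renamed abstract method

class CachingRegistry(BaseRegistry):
    def __init__(self):
        self._cache = {}

    def get_data_source(self, name, project):
        # Caching logic first; super() then runs validation + actual fetch
        key = (project, name)
        if key not in self._cache:
            self._cache[key] = super().get_data_source(name, project)
        return self._cache[key]

class SqlRegistry(CachingRegistry):
    def _get_data_source(self, name, project):
        return {"name": name, "project": project}  # stand-in for a DB read
```

Note that in this shape a cache hit short-circuits the validation step, which is something a real implementation would need to decide on deliberately.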

@dmartinol
Contributor

I like the decorator design here, but keep in mind that simple BEFORE/AFTER approach might not cut it for OnlineStore/OfflineStore classes. Over there you might have to get FeatureViews first, check tags, etc and decide whether to keep processing the request somewhere in the middle.

Decorators can only implement validation checks before or after the actual implementation, so on method arguments or return values: if the validation must happen in the middle, the programmatic approach is the only solution I can see. As you said, that's probably the case for most methods of the FeatureStore.

@tokoko
Collaborator Author

tokoko commented Jun 25, 2024

Let me do a complete 180 for a moment. Maybe the best place for enforcement is the server endpoints after all. Other than the simplicity of the implementation, I'm thinking of a scenario where a user has query permissions on FeatureService A, but doesn't have read permission on FeatureView B, which is part of FeatureService A. If we try to enforce permissions on every single Registry/Online/Offline call made on the server, we'll end up disallowing the server to read the feature view on behalf of the client, even though the feature view itself isn't what the client is asking for. If we instead enforce permissions at the server endpoints (grpc registry server, http/grpc feature servers, flight server), the server would be free to complete the request.

@dmartinol
Contributor

1. About your proposal: isn't this the same as saying we need to restrict permissions at the methods exposed by the server endpoints and validate only the input parameters (or those directly derived from them, coming from the server request)? The advantage is applying it once for all the servers, for each exposed API, and avoiding changes to the server code in case they're not holding a reference to the protected resource.

2. Let me do a 180 myself (it's viral 😉): is the requirement to secure individual instances with different permissions coming from the field, or is it a dev's proposal?
The solution we're designing offers fine-grained configuration, but may also lead to a complex setup where admins need to ensure that every resource is protected by permissions for all the managed operations. The security configuration would imply a consistent review of the feature store configuration (*.py), and there would be no security enforced OOTB: each team has to define its own permissions.

If we're looking for simplicity, wouldn't it be much easier to protect the endpoints with system-defined roles instead?

    @roles_required('offline_store_query', message="cannot query the offline store")
    def get_historical_features(
...

The only configuration needed from the admins would be assigning roles to users (maybe the "system roles" could be customized further to isolate server instances with different roles), with the drawback of not configuring security on individual instances.

@tokoko
Collaborator Author

tokoko commented Jun 25, 2024

the advantage is to apply it once for all the servers

I'm not sure this is much of an advantage in reality, though. Most of our servers mirror the abstract class methods one-to-one anyway (in order to provide remote implementations). We'd essentially have to write almost exactly the same amount of code.

is the requirement to secure individual instances with different permissions coming from the field or is it a dev's proposal?

It is a dev's proposal 😄 but I'd like to think it's an informed one. I'm basing the need for granular control on the fact that, very often, a feast deployment 1) will be centralized across teams and 2) feast feature views will contain sensitive information. From a security perspective, that makes the feature store equivalent to a data warehouse of sorts. Having an all-or-nothing approach there seems inadequate.

@dmartinol
Contributor

@tokoko FYI @redhatHameed is validating the approach to enforce permissions on the servers, as per the updated requirement.

Thanks for clarifying your point of view, we'll probably share an initial PR with model changes and an implementation on a selected server in the next few days/early next week.

@tokoko
Collaborator Author

tokoko commented Jun 26, 2024

@dmartinol @redhatHameed sounds good. btw, to continue the above discussion regarding sometimes having to check permissions in the middle of execution rather than before/after. I think in practice it would make sense to try to fully decouple that process from the execution even if it means that some of the objects will need to be fetched twice from the registry (once during permission check by server code and again during execution by provider code). In most scenarios the whole registry object will be cached on the server so I think performance penalty incurred should be negligible.

@dmartinol
Contributor

I think in practice it would make sense to try to fully decouple that process from the execution even if it means that some of the objects will need to be fetched twice from the registry

This is exactly the way we are experimenting with it right now.

@dmartinol
Contributor

@tokoko one more doubt about how to validate the user roles.

If a permission is defined with roles "reader" and "writer", does it mean that any of them is sufficient for the current user, or are both required instead?
I was thinking of using "any" rather than "all", which is also the default for Keycloak (which additionally has the concept of a "mandatory" role, where all the mandatory roles must be associated with the user). Any thoughts?

@tokoko
Collaborator Author

tokoko commented Jun 26, 2024

I assumed "any" as well, seems more practical.
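The "any" semantics can be captured in a tiny helper (a sketch; the function name is hypothetical):

```python
def roles_match(user_roles, permission_roles):
    """'any' semantics: the user matches if they hold at least one of the
    permission's roles; an empty role list means unrestricted."""
    if not permission_roles:
        return True
    return not set(user_roles).isdisjoint(permission_roles)
```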

@dmartinol
Contributor

(oops, this thread never ends)
@tokoko to simplify the client's life and adopt the usual proxy pattern for remote clients, shouldn't we also extend the remote clients' configuration to specify the authorization server used to fetch the access token?
Otherwise, there would be no way to connect to a secured server from a remote client; devs would have to invoke the REST/grpc services with their own utility code.

Here are two possible examples, for the k8s and oidc authorization servers:

offline_store:
    type: remote
    host: localhost
    port: 8815
    auth:
        type: kubernetes
offline_store:
    type: remote
    host: localhost
    port: 8815
    auth:
        type: oidc
        server: 'http://0.0.0.0:8080'
        realm: 'poc'
        client-id: 'app'
        client-secret: 'mqAzX7zDalQ1a3BZRWs7Pi5JRqCq7h4z'
        username: 'username'
        password: 'password'

@redhatHameed @tmihalac

@tokoko
Collaborator Author

tokoko commented Jun 26, 2024

The more the merrier. Yeah, once the servers are secured, we will have to enable the remote implementations to talk to secured servers and consequently add whatever configurations are necessary. A first naive implementation might mean the user has to configure auth info separately for each component.

P.S. This probably wasn't the point, but I'm not sure I follow your oidc example, most of these configs seem like stuff that should be configured for the server, not the client. I'm not sure what's the best practice for setting up a non-interactive programmatic oidc session, maybe I'm missing something.

@dmartinol
Contributor

The first naive implementation might mean the user will have to configure auth info separately for each component.

What I mean is that the client app has to pass the authentication token to the server in the HTTP header (also for grpc servers), and header management is not exposed by the REST or grpc endpoints directly.
The workaround exists, but it’s a bit expensive 😬

@dmartinol
Contributor

P.S. This probably wasn't the point, but I'm not sure I follow your oidc example, most of these configs seem like stuff that should be configured for the server, not the client

We’re not implementing authentication in Feast, right? So the client has to request the token from the OIDC server, and it needs all these settings for that. The server, in turn, needs some of these settings to extract the user roles from the token.
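As a sketch of that client-side token fetch: with the example config above, the client could perform an OAuth2 resource-owner password grant against a Keycloak-style token endpoint. The URL layout and the use of the password grant are assumptions based on the example, not a committed Feast API:

```python
def build_token_request(auth_cfg: dict):
    # Keycloak-style token endpoint layout (an assumption)
    url = (f"{auth_cfg['server']}/realms/{auth_cfg['realm']}"
           "/protocol/openid-connect/token")
    # Standard OAuth2 resource-owner password grant fields
    data = {
        "grant_type": "password",
        "client_id": auth_cfg["client-id"],
        "client_secret": auth_cfg["client-secret"],
        "username": auth_cfg["username"],
        "password": auth_cfg["password"],
    }
    return url, data

# A client would then POST this, e.g. with requests:
#   token = requests.post(url, data=data).json()["access_token"]
# and send it as "Authorization: Bearer <token>" on each call.
```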

@redhatHameed
Contributor

@dmartinol @redhatHameed sounds good. btw, to continue the above discussion regarding sometimes having to check permissions in the middle of execution rather than before/after. I think in practice it would make sense to try to fully decouple that process from the execution even if it means that some of the objects will need to be fetched twice from the registry (once during permission check by server code and again during execution by provider code). In most scenarios the whole registry object will be cached on the server so I think performance penalty incurred should be negligible.

@tokoko I opened a PR that asserts/checks permissions on remote servers (offline, online, registry) by utilizing the permission/security model framework (still a work in progress). Just sharing this with you to get initial feedback and to ensure we are moving in the right direction.

@tokoko
Collaborator Author

tokoko commented Jun 28, 2024

We’re not implementing authentication in Feast, right? So the client has to request the token to the Oidc server and it needs all these settings for that. The server, in turn, needs some of these settings to extract the user roles from the token.

@dmartinol Sure, never mind. I thought the way to implement it would be for the client to ask the server for the OIDC server URL, sort of mimicking the browser flow where it's the job of the server to redirect.

Just sharing this with you to get initial feedback and to ensure we are moving in the right direction.

@redhatHameed Thanks, looks good so far. I'll leave the comments in the PR and let's continue there.

@dmartinol
Contributor

dmartinol commented Jul 2, 2024

@tokoko we're defining the integration with authorization servers for both Feast servers and clients, and I believe we could have a common auth section on top of the store definitions that can be used according to the runtime mode (e.g., server or client of a specific feature), like:

auth:
    type: kubernetes

or:

auth:
    type: oidc
    server: 'http://0.0.0.0:8080'
    realm: 'poc'
    client-id: 'app'
    client-secret: 'mqAzX7zDalQ1a3BZRWs7Pi5JRqCq7h4z'
    username: 'username' # only for client mode
    password: 'password' # only for client mode

This way we can reuse the same configuration/Config class whatever application we are running and avoid replicating the Auth section in all of the interfaces; it will just be fs.get_auth() whatever the runtime. Thoughts?

adding @lokeshrangineni who's looking at the client side
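The shared auth section above could be consumed roughly like this. A sketch only; AuthConfig and get_auth are hypothetical names, and the repo config is assumed to be the already-parsed feature_store.yaml dict:

```python
from dataclasses import dataclass, field

@dataclass
class AuthConfig:
    type: str = "no_auth"
    options: dict = field(default_factory=dict)

def get_auth(repo_config: dict) -> AuthConfig:
    """Read the shared top-level `auth` section of the (already parsed)
    feature_store.yaml, defaulting to no authentication."""
    raw = dict(repo_config.get("auth") or {})
    return AuthConfig(type=raw.pop("type", "no_auth"), options=raw)
```

Both server and client code paths would then call the same accessor, keeping the oidc/kubernetes specifics in one place.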
