Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(auth): platform instance ownership in policies #8396

Conversation

amanda-her
Copy link
Contributor

@amanda-her amanda-her commented Jul 11, 2023

This PR introduces two fields for DataHubPolicyInfo: platformInstanceOwners and platformInstanceOwnersTypes. With this change authorization policies can be restricted to platform instance owners in case a Metadata asset belongs to a particular platform instance. By default platformInstanceOwners is set to false. When this flag is set to True, it is possible to specify which particular types (i.e. TECHNICAL_OWNER, BUSINESS_OWNER) of ownership will be considered for the policy with the platformInstanceOwnersTypes field. As a result, for example, the option of granting rights to change tags of datasets to only the owners of the platform instance they belong to is available. In addition, the type of this ownership can be specified.

Beside changes in the PolicyEngine and other parts of backend, also UI was amended to allow for new capability to be set as show below:
Captura de pantalla 2023-07-11 a las 15 24 08

Checklist

@github-actions github-actions bot added product PR or Issue related to the DataHub UI/UX devops PR or Issue related to DataHub backend & deployment labels Jul 11, 2023
@amanda-her amanda-her changed the title eat(auth): platform instance ownership in policies feat(auth): platform instance ownership in policies Jul 11, 2023
@@ -371,6 +365,29 @@ private boolean isRoleMatch(final Urn actor, final DataHubActorFilter actorFilte
.anyMatch(actorRoles::contains);
}

private Set<String> resolveEntityOwnersForType(Urn entityUrn, List<Urn> ownershipTypes) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is the same as getOwnersForType but renamed to align with the other ones (resolveRoles or resolveGroups) and the the first parameter becomes a Urn instead of a ResourceSpec for reusability.

@anshbansal anshbansal added the community-contribution PR or Issue raised by member(s) of DataHub Community label Jul 17, 2023
@jjoyce0510
Copy link
Collaborator

jjoyce0510 commented Jul 17, 2023

Would really urge you to consider the following approach.

Containers + Platform instances are containers, just like Glossary Term Groups.

Glossary Term Groups support a privilege called "Manage Children" which grants all access to the children groups + terms. During evaluation time, we check whether there is a parent term group for which the user has "Manage Children" permissions. If yes, we authorize actions against the child groups or terms.

I'd recommend a similar approach here to retain consistency. This will also make the default setup experience much simpler for the majority of users, most of which do not actively leverage the platform instance concept today. This will also reduce the scope of this PR.

If we continue down the path suggested by this PR, I can see this final panel exploding to support "Container" and "Container owner" use cases and "Domain" and "Domain owner" use cases as well, but I do not feel that this is the right place to declare controls for managing such groups of assets.

What do you think?

Cheers
John

@sgomezvillamor
Copy link
Contributor

Hi @jjoyce0510 for the feedback and for taking the time to review. I've been discussing with my team the comments here and in our private discussion. Also we have been analysing the code about glossary terms to compare the approach there with the one in this PR.

We have some concerns on the approach you are suggesting, considering the information we have at hand and our understanding. So please correct us if we are wrong. The concerns are:

  • "Manage Children" approach is too broad for Platform Instance (and Containers).

    Such a grant enables owners to manage all entities and aspects below in the hierarchy. While this may be valid for Glossary Node/Term hierarchy, this results too broad when considering the Platform Instance (and Containers) hierarchy. Actually, it's a little bit surprising that we have fine-grained capabilities for resource owners, such as "Add Terms", or "Edit Properties" to name a couple, whereas for Platform Instance (and Containers) owners it's all or nothing; as soon as I'm owner of the Platform Instance I can update any aspect or entities below in the hierarchy. Instead, we think that fine-grained metadata privileges should be supported for Platform Instance (and Containers) owners, which is the approach we follow in the PR.

  • Hard to scale for Containers.

    While we fully agree that the approach should be generalized as much as possible to accommodate other existing (such as Containers) and future (such as Domains, Data Product, ..) hierarchies, this may require to redesign how these hierarchies are indexed if we don't want to introduce some performance penalty in the authorization layer. The concern on performance was firstly mentioned in the review of our previous contrib: ownership type update on the Resource owners based policy feat(auth): Fine grained ownership policies #7499. So, when we faced this new PR we explicitly and exclusively support Platform Instance while keeping out Containers. Fetching Platform Instance from a Resource is a fixed and single extra hop that we need to resolve; so still the performance penalty introduced can be assumed. However, extending the feature to Containers would require to iteratively raise a request to go 1-level up in the hierarchy; so one request per level. Same issue can be found when traversing the Glossary hierarchy

    public static Urn getParentUrn(@Nonnull Urn urn, @Nonnull QueryContext context, @Nonnull EntityClient entityClient) {
    This approach is hard to scale if many levels and I'm not sure if we can make assumptions on the number of levels neither in the Glossary hierarchy nor the Container hierarchy. Proper indexation of this hierarchies may be required if we want to enable features based on such a hierarchy.

  • Concerns on the current implementation of Glossary Terms

    Also, and after having a look to the auth code in Glossary Terms we have found some concerns on the design. Note conclusions here are based on a quick code analysis and so may be wrong, please correct us if so. You know code much better than us.

    • First one is the performance penalty because of the resolution of the glossary hierarchy. Anyway, we may ignore this and assume number of levels is going to be low. 🤞
    • Biggest concern is on the authorization logic for Glossary Terms is mixed with the grapql resolvers. This can be noted here
      if (GlossaryUtils.canManageChildrenEntities(context, parentNode, _entityClient)) {
      where the GlossaryUtils.canManageChildrenEntities is called. This means that the authorization validation must be explicitly invoked from all glossary mutations. Following the same approach for Platform Instance and Containers is basically not possible (or very time-consuming and impacting many places), since the number of possible use cases (mutations) is much more bigger than in the case of Glossary Nodes/Terms.
    • The authorization logic for Glossary Nodes/Terms is placed in the GlossaryUtils.java, which is in the graphql package. Does this mean that authorization is only enabled for the GraphQL endpoints? What about ingestion pipeline? Is it skipping authorization for Glossary Nodes/Terms 🤔 Instead, PolicyEngine is the place (in metadata-service package) where all authorization logic should be so it's valid for all interfaces. Is this conclusion correct?

Sorry for the long comment 😅 Just trying to share our point of view. Hopefully we can reach a point that satisfies everybody.

The way we see this in the long term future is: Datasets Owner check button becomes Resource Owner check button, so this is valid for any entity and there is an additional check button for Enable below in the hierarchy (or whatever text you find more descriptive) so it applies for any children. However, this would require proper indexation of the hierarchies. That's why we skipped Containers and the Enable below in the hierarchy became Platform Instance owners in our PR. Instead we may set Enable below in the hierarchy and warn the user that, for the moment, only Platform Instance is supported.

Please, let us know if you find a way to unblock this. Looking forward to your response.

@jjoyce0510
Copy link
Collaborator

jjoyce0510 commented Jul 19, 2023

Responding here.

"Manage Children" approach is too broad for Platform Instance (and Containers).
Such a grant enables owners to manage all entities and aspects below in the hierarchy. While this may be valid for Glossary >Node/Term hierarchy, this results too broad when considering the Platform Instance (and Containers) hierarchy. Actually, >it's a little bit surprising that we have fine-grained capabilities for resource owners, such as "Add Terms", or "Edit >Properties" to name a couple, whereas for Platform Instance (and Containers) owners it's all or nothing; as soon as I'm >owner of the Platform Instance I can update any aspect or entities below in the hierarchy. Instead, we think that fine->grained metadata privileges should be supported for Platform Instance (and Containers) owners, which is the approach we >follow in the PR.

So the idea we took for Glossary Term Groups is to have BOTH a broad "Manage Children" permission, along with more granular permissions: Edit Children Tags, Edit Children Glossary Terms.... Etc, you can imagine listing out all the core entity-level permissions.

While we fully agree that the approach should be generalized as much as possible to accommodate other existing (such as >Containers) and future (such as Domains, Data Product, ..) hierarchies, this may require to redesign how these hierarchies >are indexed if we don't want to introduce some performance penalty in the authorization layer. The concern on >performance was firstly mentioned in the review of our previous contrib: ownership type update on the Resource owners >based policy #7499. So, when we faced this new PR we explicitly and >exclusively support Platform Instance while keeping out Containers. Fetching Platform Instance from a Resource is a fixed >and single extra hop that we need to resolve; so still the performance penalty introduced can be assumed. However, >extending the feature to Containers would require to iteratively raise a request to go 1-level up in the hierarchy; so one >request per level. Same issue can be found when traversing the Glossary hierarchy

Performance cost IS REAL concern. I think that we can of course improve how we do things by fetching container hierarchies, or using the new Browse V2 path that was recently added to make efficient fetching of all parent containers much easier. But I think we'll have to face this cost. Usually real-world container paths are luckily not that deep.

Final concern:

where the GlossaryUtils.canManageChildrenEntities is called. This means that the authorization validation must be >explicitly invoked from all glossary mutations. Following the same approach for Platform Instance and Containers is >basically not possible (or very time-consuming and impacting many places), since the number of possible use cases
(mutations) is much more bigger than in the case of Glossary Nodes/Terms.

We'll indeed need to update some resolvers to basically check whether the current actor has the ability to perform the action based on permissions associated with a parent container.. For example, in the addTags mutation we'd need to add a condition that checks all containers for whether there is the correct permission. I can provide a set of all the places, and help to implement this as well so that we don't miss any.

We've actually heard this use case around Containers, Platforms, Platform Instance management before.. and want to make sure that we make it as generic as possible.

The authorization logic for Glossary Nodes/Terms is placed in the GlossaryUtils.java, which is in the graphql package. Does > this mean that authorization is only enabled for the GraphQL endpoints? What about ingestion pipeline? Is it skipping
authorization for Glossary Nodes/Terms 🤔 Instead, PolicyEngine is the place (in metadata-service package) where all
authorization logic should be so it's valid for all interfaces. Is this conclusion correct?

This is CORRECT. GraphQL layer handles app-layer authorization. The current assumption is that the Ingestion Clients have full access to change the metadata graph.

Does this address the concerns? If I were implementing the feature today, this is how I'd build it!

@sgomezvillamor
Copy link
Contributor

So the idea we took for Glossary Term Groups is to have BOTH a broad "Manage Children" permission, along with more granular permissions: Edit Children Tags, Edit Children Glossary Terms.... Etc, you can imagine listing out all the core entity-level permissions.

I find this very redundant... from the point of view of the user as well as the implementation such approach requires.

Performance cost IS REAL concern. [...] Usually real-world container paths are luckily not that deep.

I'm ok to rely on such assumption.

This is CORRECT. GraphQL layer handles app-layer authorization. The current assumption is that the Ingestion Clients have full access to change the metadata graph.

TBH this was new for us and shocking 😅 We need to investigate the capabilities the new REST_API_AUTHORIZATION_ENABLED_ENV env var enables.

We've actually heard this use case around Containers, Platforms, Platform Instance management before.. and want to make sure that we make it as generic as possible.

We talk about the "Manage Children" permission, but actually, this hierarchy is more semantic than properly structured in the entity graph. Eg. Platform Instance is 1:1 relationship with Dataset and Container... but is not reachable from the Container hierarchy itself. The same for Platform. The same for Domain... is domain above or below a Platform or a Platform Instance? And we also have the new Data Products. Unless I'm missing something, the overall hierarchy is not so clear, much more closer to a graph and then we can have loops 😅

@sgomezvillamor
Copy link
Contributor

@jjoyce0510 As discussed yesterday:

  • while there is alignment on the utility of Platform Instance owner policies
  • we understand the concerns on still missing an implementation approach that can be easily scalable to other containers different from Platform Instance
  • so we have planned to unblock this with the use of Authorizers plugins instead

Of course, in parallel we can have discussions on how to eventually fix this in the upstream project 👍

@amanda-her You can close this PR once you are back 😄

@david-leifker
Copy link
Collaborator

Closing per previous comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community devops PR or Issue related to DataHub backend & deployment product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants