feat(aws-docs): Add section on attaching policies to the datahub-actions pod #4334

Merged
58 changes: 56 additions & 2 deletions docs/deploy/aws.md
@@ -292,7 +292,7 @@ in datahub to point to the specific ES instance -
quickstart files located [here](../../docker/quickstart/).
1. Once you have modified the quickstart recipes you can run the quickstart command using a specific docker compose
file. Sample command for that is
- `datahub docker quickstart --quickstart-compose-file docker/quickstart/docker-compose-without-neo4j.quickstart.yml`
2. If you are not using quickstart recipes, you can modify environment variable in GMS to point to the ES instance. The
env files for datahub-gms are located [here](../../docker/datahub-gms/env/).

@@ -413,4 +413,58 @@ Run `helm upgrade --install datahub datahub/datahub --values values.yaml` to app
Note: you will see the log message "Schema Version Id is null. Trying to register the schema" on every request. This
message is misleading and can be ignored. Schemas are cached, so a new version is not registered on every request and
there is no performance impact. This has been fixed by [this PR](https://github.com/awslabs/aws-glue-schema-registry/pull/64), but
the fix has not been released yet. We will update the version once a new release is out.

### IAM policies for UI-based ingestion

This section details how to attach policies to the acryl-datahub-actions pod that powers UI-based ingestion. For some
ingestion recipes, you specify login credentials in the recipe itself, which makes it easy to set up authentication to
pull metadata from the data source. For AWS resources, however, the recommendation is to use IAM roles and policies to
gate access to metadata on these resources.

To do this, let's follow
this [guide](https://docs.aws.amazon.com/eks/latest/userguide/create-service-account-iam-policy-and-role.html) to
associate a Kubernetes service account with an IAM role. Then we can attach this IAM role to the acryl-datahub-actions
pod, via the service account, to let the pod assume the specified role.

First, you must create an IAM policy with all the permissions needed to run ingestion. This is specific to each
connector and the set of metadata you are trying to pull. For example, profiling requires more permissions, since it
needs access to the data, not just the metadata. Let's assume the ARN of that policy
is `arn:aws:iam::<<account-id>>:policy/policy1`.
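
As a rough illustration of this step, the sketch below writes a policy document and shows the AWS CLI call that would create it. The actions listed (`glue:GetDatabases`, `glue:GetTables`) and the file name `policy1.json` are assumptions chosen for illustration only; the permissions you actually need depend on your connector. The `DRY_RUN` switch is also an invention of this sketch, so that nothing is sent to AWS unless you opt in.

```shell
# Hypothetical policy document: read-only access to Glue metadata.
# Replace the actions with whatever your connector actually requires.
cat > policy1.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["glue:GetDatabases", "glue:GetTables"],
      "Resource": "*"
    }
  ]
}
EOF

# Command that creates the policy and prints its ARN.
# Running it requires AWS credentials that are allowed to call iam:CreatePolicy.
create_policy_cmd="aws iam create-policy --policy-name policy1 --policy-document file://policy1.json --query Policy.Arn --output text"

# By default only print the command; set DRY_RUN=0 to actually run it.
if [ "${DRY_RUN:-1}" = "0" ]; then
  $create_policy_cmd
else
  echo "$create_policy_cmd"
fi
```

The ARN printed by the real call is the value to pass as the policy ARN in the next step.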

Then, create a service account with the policy attached. One way to do so is to use [eksctl](https://eksctl.io/), by
running the following command.

```shell
eksctl create iamserviceaccount \
    --name <<service-account-name>> \
    --namespace <<namespace>> \
    --cluster <<eks-cluster-name>> \
    --attach-policy-arn <<policy-ARN>> \
    --approve \
    --override-existing-serviceaccounts
```

For example, running the following will create a service account "acryl-datahub-actions" in the datahub namespace of
the datahub EKS cluster with `arn:aws:iam::<<account-id>>:policy/policy1` attached.

```shell
eksctl create iamserviceaccount \
    --name acryl-datahub-actions \
    --namespace datahub \
    --cluster datahub \
    --attach-policy-arn arn:aws:iam::<<account-id>>:policy/policy1 \
    --approve \
    --override-existing-serviceaccounts
```
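
To check that the service account was created with a role attached, you can look at its annotations. The sketch below assumes that eksctl records the role on the service account under the `eks.amazonaws.com/role-arn` annotation, and that `kubectl` is configured for the cluster. The helper name `sa_role_arn` is hypothetical and only used for illustration.

```shell
# Print the IAM role ARN that a service account is annotated with.
# Usage: sa_role_arn <service-account-name> <namespace>
sa_role_arn() {
  kubectl get serviceaccount "$1" --namespace "$2" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Example (needs access to the cluster):
#   sa_role_arn acryl-datahub-actions datahub
```

An empty result would suggest that the annotation is missing and that the pod will not be able to assume the role.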

Lastly, in the helm values.yaml, you can add the following under the acryl-datahub-actions key to attach the service
account to the acryl-datahub-actions pod.

```yaml
acryl-datahub-actions:
  enabled: true
  serviceAccount:
    name: <<service-account-name>>
  ...
```
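
One possible way to apply this setting is sketched below. It assumes the release and chart names used by the `helm upgrade` command earlier in this guide (`datahub` and `datahub/datahub`) and the service account name from the eksctl example. The override file name `actions-sa-values.yaml` and the `DRY_RUN` switch are hypothetical conveniences of this sketch.

```shell
# Hypothetical override file that holds only the service account setting.
cat > actions-sa-values.yaml <<'EOF'
acryl-datahub-actions:
  enabled: true
  serviceAccount:
    name: acryl-datahub-actions
EOF

# Values files given later on the command line take precedence over earlier ones.
helm_cmd="helm upgrade --install datahub datahub/datahub --values values.yaml --values actions-sa-values.yaml"

# By default only print the command; set DRY_RUN=0 to actually run it.
if [ "${DRY_RUN:-1}" = "0" ]; then
  $helm_cmd
else
  echo "$helm_cmd"
fi
```

Keeping the setting in a separate file is only a design choice for this sketch; adding the same keys directly to values.yaml works the same way.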
3 changes: 3 additions & 0 deletions docs/ui-ingestion.md
@@ -228,6 +228,9 @@ There are valid cases for ingesting metadata without the UI-based ingestion sche
- Your ingestion source requires context from a local filesystem (e.g. input files, environment variables, etc)
- You want to distribute metadata ingestion among multiple producers / environments

### How do I attach policies to the actions pod to give it permissions to pull metadata from various sources?

This varies by underlying platform. For AWS, please refer to this [guide](./deploy/aws.md#iam-policies-for-ui-based-ingestion).

## Demo
