Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(azure): add support for workload identity using azidentity #3906

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
134 changes: 133 additions & 1 deletion docs/tutorials/azure.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ The following fields are used:
* `aadClientID` and `aadClientSecret` are associated with the Service Principal. This is only used with Service Principal method documented in the next section.
* `useManagedIdentityExtension` - this is set to `true` if you use either AKS Kubelet Identity or AAD Pod Identities methods documented in the next section.
* `userAssignedIdentityID` - this contains the client id from the Managed identitty when using the AAD Pod Identities method documented in the next setion.
* `useWorkloadIdentityExtension` - this is set to `true` if you use Workload Identity method documented in the next section.

The Azure DNS provider expects, by default, that the configuration file is at `/etc/kubernetes/azure.json`. This can be overridden with the `--azure-config-file` option when starting ExternalDNS.

Expand All @@ -63,6 +64,7 @@ ExternalDNS needs permissions to make changes to the Azure DNS zone. There are t
- [Service Principal](#service-principal)
- [Managed Identity Using AKS Kubelet Identity](#managed-identity-using-aks-kubelet-identity)
- [Managed Identity Using AAD Pod Identities](#managed-identity-using-aad-pod-identities)
- [Managed Identity Using Workload Identity](#managed-identity-using-workload-identity)

### Service Principal

Expand Down Expand Up @@ -319,6 +321,136 @@ kubectl patch deployment external-dns --namespace "default" --patch \
'{"spec": {"template": {"metadata": {"labels": {"aadpodidbinding": "external-dns"}}}}}'
```

### Managed identity using Workload Identity

For this process, we will create a [managed identity](https://docs.microsoft.com//azure/active-directory/managed-identities-azure-resources/overview) that will be explicitly used by the ExternalDNS container. This process is somewhat similar to Pod Identity except that this managed identity is associated with a kubernetes service account.

#### Deploy OIDC issuer and Workload Identity services

Update your cluster to install [OIDC Issuer](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer) and [Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster):

```bash
$ AZURE_AKS_RESOURCE_GROUP="my-aks-cluster-group" # name of resource group where aks cluster was created
$ AZURE_AKS_CLUSTER_NAME="my-aks-cluster" # name of aks cluster previously created

$ az aks update --resource-group ${AZURE_AKS_RESOURCE_GROUP} --name ${AZURE_AKS_CLUSTER_NAME} --enable-oidc-issuer --enable-workload-identity
```

#### Create a managed identity

Create a managed identity:

```bash
$ IDENTITY_RESOURCE_GROUP=$AZURE_AKS_RESOURCE_GROUP # custom group or reuse AKS group
$ IDENTITY_NAME="example-com-identity"

# create a managed identity
$ az identity create --resource-group "${IDENTITY_RESOURCE_GROUP}" --name "${IDENTITY_NAME}"
```

#### Assign a role to the managed identity

Grant access to Azure DNS zone for the managed identity:

```bash
$ AZURE_DNS_ZONE_RESOURCE_GROUP="MyDnsResourceGroup" # name of resource group where dns zone is hosted
$ AZURE_DNS_ZONE="example.com" # DNS zone name like example.com or sub.example.com

# fetch identity client id from managed identity created earlier
$ IDENTITY_CLIENT_ID=$(az identity show --resource-group "${IDENTITY_RESOURCE_GROUP}" \
--name "${IDENTITY_NAME}" --query "clientId" --output tsv)
# fetch DNS id used to grant access to the managed identity
$ DNS_ID=$(az network dns zone show --name "${AZURE_DNS_ZONE}" \
--resource-group "${AZURE_DNS_ZONE_RESOURCE_GROUP}" --query "id" --output tsv)
$ RESOURCE_GROUP_ID=$(az group show --name "${AZURE_DNS_ZONE_RESOURCE_GROUP}" --query "id" --output tsv)

$ az role assignment create --role "DNS Zone Contributor" \
--assignee "${IDENTITY_CLIENT_ID}" --scope "${DNS_ID}"
$ az role assignment create --role "Reader" \
--assignee "${IDENTITY_CLIENT_ID}" --scope "${RESOURCE_GROUP_ID}"
```

#### Create a federated identity credential

A binding between the managed identity and the ExternalDNS service account needs to be setup by creating a federated identity resource:

```bash
$ OIDC_ISSUER_URL="$(az aks show -n myAKSCluster -g myResourceGroup --query "oidcIssuerProfile.issuerUrl" -otsv)"

$ az identity federated-credential create --name ${IDENTITY_NAME} --identity-name ${IDENTITY_NAME} --resource-group $AZURE_AKS_RESOURCE_GROUP} --issuer "$OIDC_ISSUER_URL" --subject "system:serviceaccount:default:external-dns"
```

NOTE: make sure federated credential refers to correct namespace and service account (`system:serviceaccount:<NAMESPACE>:<SERVICE_ACCOUNT>`)

#### helm

When deploying external-dns with helm, here are the parameters you need to pass:

```yaml
fullnameOverride: external-dns

serviceAccount:
annotations:
azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>

podLabels:
azure.workload.identity/use: "true"

provider: azure

secretConfiguration:
enabled: true
mountPath: "/etc/kubernetes/"
data:
azure.json: |
{
"subscriptionId": "<SUBSCRIPTION_ID>",
"resourceGroup": "<AZURE_DNS_ZONE_RESOURCE_GROUP>",
"useWorkloadIdentityExtension": true
}
```

NOTE: make sure the pod is restarted whenever you make a configuration change.

#### kubectl (alternative)

##### Create a configuration file for the managed identity

Create the file `azure.json` with the values from previous steps:

```bash
cat <<-EOF > /local/path/to/azure.json
{
"subscriptionId": "$(az account show --query id -o tsv)",
"resourceGroup": "$AZURE_DNS_ZONE_RESOURCE_GROUP",
"useWorkloadIdentityExtension": true
}
Comment on lines +423 to +427
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a pro at Azure, but wonder if it is possible to avoid this mount..

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the PR creator, but as long as Azure-workload-identity is installed in the Cluster, the Mutating Admission Webhook will take care of the projection based on just the podLabels & the service Account annotations.

So, perhaps the secretConfiguration section in the helm values can be done away with.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, this is what I am thinking about

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the mutating webhook inserts environment variables that help to get an access token:

image

Whereas Subscription ID (subscriptionId) and Resource Group (resourceGroup) are needed for Azure provider (in-tree plugin in external-dns) to understand where a DNS zone is maintained. It's used to construct proper API calls against ARM.

As for the last option called useWorkloadIdentityExtension, it's needed to instruct provider which authentication method to use. Strictly speaking, azidentity offers DefaultAzureCredential function that can automatically detect authentication method:

image

https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#section-readme
But I don't think it should get enabled under this PR. So, this option is also required, at least for now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think DefaultAzureCredential should always be preferred and if it had been in most projects azwi would've been easier to get going. But I don't know enough about this project to judge if that change should or should not be made in this pr or as it's own change so if you think that's not appropriate you probably know better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weisdd I don't have a strong opinion here as I haven't used DefaultAzureCredential in the past and I'm not familiar with the tradeoffs that come with it. Can you please clarify pros and cons of the two approaches?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Raffo Here's what I can think of:

  • Pros:
    • Easy-to-use;
    • No need to explicitly maintain configuration settings related to any of the authentication methods (e.g. when I looked into how to add support for Workload Identity to FluxCD, it was a matter of bumping deps in those controllers that relied on DefaultAzureCredential);
  • Cons:
    • If your environment satisfies multiple authentication methods, you can't directly override the order of preference;
    • You can't use it in a complex multi-tenant scenario when a token is not mounted, but referenced (multiple MSIs delegate access to different k8s service-accounts, so the app needs to have a more complex flow for getting tokens. It's used it external-secrets operator, docs).

I think none of the cons are applicable to external-dns at this point:

  • A container environment is unlikely to satisfy multiple authentication methods (why would anyone configure all of them?);
  • external-dns currently works with just one resource group (so, you don't have complex multi-tenant setups in the same instance).

Order of preference

DefaultAzureCredential has this order of preference when it come to authentication methods:

  • Environment - DefaultAzureCredential will read account information specified via environment variables and use it to authenticate.
  • Workload Identity - If the app is deployed on Kubernetes with environment variables set by the workload identity webhook, DefaultAzureCredential will authenticate the configured identity.
  • Managed Identity - If the app is deployed to an Azure host with managed identity enabled, DefaultAzureCredential will authenticate with it.
  • Azure CLI - If a user or service principal has authenticated via the Azure CLI az login command, DefaultAzureCredential will authenticate that identity.

Source: https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#section-readme

I think this order is well thought-out and is unlikely to cause any issues:

  • Managed Identity will soon be gone (or is already gone) in most clusters;
  • Azure CLI is quite heavy for a container image and will only be pre-installed in local development environments. Devs can make the library use an SPN through environment variables if there's any need for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the context, this is very helpful. If the switch to DefaultAzureCredential is trivial, then I wouldn't be opposed in having this in this PR. It would align us with other controllers and make this feature easier to adopt. At the end of the day, we are adding a new functionality, so it is not bad to make now the change that we need to make.
The only real consideration that I want to make is that I want to cut a release soon and I'm waiting for this one to merge. It's fine to do so, but if adding DefaultAzureCredential means waiting a lot of time or extensive testing due to lack of confidence in the change, then I'd prefer having this one in and handle the change in a follow up PR. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Raffo At a first glance, code-wise, migration to DefaultAzureCredentialt would be simple, something along the lines:

func getCredentials(cfg config) (azcore.TokenCredential, error) {
	cloudCfg, err := getCloudConfiguration(cfg.Cloud)
	if err != nil {
		return nil, fmt.Errorf("failed to get cloud configuration: %w", err)
	}

	opts := azidentity.DefaultAzureCredentialOptions{
		ClientOptions: azcore.ClientOptions{
			Cloud: cloudCfg,
		},
	}

	cred, err := azidentity.NewDefaultAzureCredential(&opts)
	if err != nil {
		return nil, fmt.Errorf("failed to obtain a token: %w", err)
	}

	return cred, nil
}

(+ remove old configuration fields here and there)

Testing is unlikely to be complex.

When it comes to documentation, it'll be more time-consuming:

  • We'll need to add some references to the docs for the new authentication logic and highlight important pieces there;
  • I think there are plenty of things to be simplified in the docs (including the part that I prepared in this PR), and we should probably share some code snippets (e.g. for terraform).

So, I'd prefer to go with the second option you suggested - to merge the current PR and then handle other changes in a follow-up PR. If I don't get busy at work, I'll do it this week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me, thank you for your careful explanation and to be mindful to separate work appropriately, that is very appreciated.

EOF
```

Use the `azure.json` file to create a Kubernetes secret:

```bash
$ kubectl create secret generic azure-config-file --namespace "default" --from-file /local/path/to/azure.json
```

##### Update labels and annotations on ExternalDNS service account

To instruct Workload Identity webhook to inject a projected token into the ExternalDNS pod, the pod needs to have a label `azure.workload.identity/use: "true"` (before Workload Identity 1.0.0, this label was supposed to be set on the service account instead). Also, the service account needs to have an annotation `azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>`:

To patch the existing serviceaccount and deployment, use the following command:

```bash
$ kubectl patch serviceaccount external-dns --namespace "default" --patch \
"{\"metadata\": {\"annotations\": {\"azure.workload.identity/client-id\": \"${IDENTITY_CLIENT_ID}\"}}}"
$ kubectl patch deployment external-dns --namespace "default" --patch \
'{"spec": {"template": {"metadata": {"labels": {\"azure.workload.identity/use\": \"true\"}}}}}'
```

NOTE: it's also possible to specify (or override) ClientID through `UserAssignedIdentityID` field in `azure.json`.
Copy link

@Sturgelose Sturgelose Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: according this line, I kinda understand that "if I do not set the ENV variables right through annotations, I can override/specify the clientId through the value in azure.json"

However, here, it feels it is the opposite: if the azure.json does not have the values, we default to the ENV variables.

I tried setting all the values via the azure.json and not using annotations but, the clientId is not fetching properly. May there be a problem in the config somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the code, it seems the property to set is ClientID instead of UserAssignedIdentityID:

ClientID: cfg.ClientID,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my bad, we should correct either docs or code. Sorry for that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reply, I think this provides quite some insights on why it didn't work. So... which of both options do you prefer? Updating docs or the code?

I don't mind opening a PR, but I don't want to make a decision without an agreement on what we want.

From my side I wouldn't mind opening a PR to update the docs. Also I can test that ClientID and TenantID work as expected before reflecting that in the docs.

Copy link

@janpfischer janpfischer Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do encounter the same issue. For me it seems like the default way with annotation of client-id and pod label is not working. I get the error message that the client id is not present.
To set the clientId in the json file you'll have to set it in the key aadClientId and not in UserAssignedIdentityID like in the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sturgelose sorry for the delay in replies, intense period at work and some personal matters.
To be honest, not sure what's the best approach here. On one hand, changing code to UserAssignedIdentityID would make more sense in terms of namings. I think I used it in the initial implementation on the old library, but when preparing the new PR, somehow lost the fact that there are two ways to pass client ID in external-dns.
On the other, aadClientId is an "easier" change as it doesn't change behaviour.
At the same time, as the documentation is not fully accurate, it's unlikely that many people used overrides. By the way, could you expand on why you need to use explicit override through config instead of k8s annotations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view, it's better to stick with clientID as it's the environment variable the mutating webhook will present it in the pod through AZURE_CLIENT_ID as described in https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html#mutating-admission-webhook
More other, UserAssignedIdentityID was just used for the old pod identity feature wich is deprecated so once it will be removed, it will make a weird sitution where UserAssignedIdentityID is used to populate AZURE_CLIENT_ID. I think it makes more sense to use the same vocabulary everywhere.


NOTE: make sure the pod is restarted whenever you make a configuration change.

## Ingress used with ExternalDNS

This deployment assumes that you will be using nginx-ingress. When using nginx-ingress do not deploy it as a Daemon Set. This causes nginx-ingress to write the Cluster IP of the backend pods in the ingress status.loadbalancer.ip property which then has external-dns write the Cluster IP(s) in DNS vs. the nginx-ingress service external IP.
Expand Down Expand Up @@ -651,6 +783,6 @@ $ az group delete --name "MyDnsResourceGroup"

## More tutorials

A video explanantion is available here: https://www.youtube.com/watch?v=VSn6DPKIhM8&list=PLpbcUe4chE79sB7Jg7B4z3HytqUUEwcNE
A video explanantion is available here: https://www.youtube.com/watch?v=VSn6DPKIhM8&list=PLpbcUe4chE79sB7Jg7B4z3HytqUUEwcNE

![image](https://user-images.githubusercontent.com/6548359/235437721-87611869-75f2-4f32-bb35-9da585e46299.png)
43 changes: 34 additions & 9 deletions provider/azure/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,15 +30,16 @@ import (

// config represents common config items for Azure DNS and Azure Private DNS
type config struct {
Cloud string `json:"cloud" yaml:"cloud"`
TenantID string `json:"tenantId" yaml:"tenantId"`
SubscriptionID string `json:"subscriptionId" yaml:"subscriptionId"`
ResourceGroup string `json:"resourceGroup" yaml:"resourceGroup"`
Location string `json:"location" yaml:"location"`
ClientID string `json:"aadClientId" yaml:"aadClientId"`
ClientSecret string `json:"aadClientSecret" yaml:"aadClientSecret"`
UseManagedIdentityExtension bool `json:"useManagedIdentityExtension" yaml:"useManagedIdentityExtension"`
UserAssignedIdentityID string `json:"userAssignedIdentityID" yaml:"userAssignedIdentityID"`
Cloud string `json:"cloud" yaml:"cloud"`
TenantID string `json:"tenantId" yaml:"tenantId"`
SubscriptionID string `json:"subscriptionId" yaml:"subscriptionId"`
ResourceGroup string `json:"resourceGroup" yaml:"resourceGroup"`
Location string `json:"location" yaml:"location"`
ClientID string `json:"aadClientId" yaml:"aadClientId"`
ClientSecret string `json:"aadClientSecret" yaml:"aadClientSecret"`
UseManagedIdentityExtension bool `json:"useManagedIdentityExtension" yaml:"useManagedIdentityExtension"`
UseWorkloadIdentityExtension bool `json:"useWorkloadIdentityExtension" yaml:"useWorkloadIdentityExtension"`
UserAssignedIdentityID string `json:"userAssignedIdentityID" yaml:"userAssignedIdentityID"`
}

func getConfig(configFile, resourceGroup, userAssignedIdentityClientID string) (*config, error) {
Expand Down Expand Up @@ -93,6 +94,30 @@ func getCredentials(cfg config) (azcore.TokenCredential, error) {
return cred, nil
}

// Try to retrieve token with Workload Identity.
if cfg.UseWorkloadIdentityExtension {
log.Info("Using workload identity extension to retrieve access token for Azure API.")

wiOpt := azidentity.WorkloadIdentityCredentialOptions{
ClientOptions: azcore.ClientOptions{
Cloud: cloudCfg,
},
// In a standard scenario, Client ID and Tenant ID are expected to be read from environment variables.
// Though, in certain cases, it might be important to have an option to override those (e.g. when AZURE_TENANT_ID is not set
// through a webhook or azure.workload.identity/client-id service account annotation is absent). When any of those values are
// empty in our config, they will automatically be read from environment variables by azidentity
TenantID: cfg.TenantID,
ClientID: cfg.ClientID,
}

cred, err := azidentity.NewWorkloadIdentityCredential(&wiOpt)
if err != nil {
return nil, fmt.Errorf("failed to create a workload identity token: %w", err)
}

return cred, nil
}

// Try to retrieve token with MSI.
if cfg.UseManagedIdentityExtension {
log.Info("Using managed identity extension to retrieve access token for Azure API.")
Expand Down
Loading