Using kubelet identity to access ACR OCI charts #1071

Closed
gldraphael opened this issue Apr 12, 2023 · 21 comments · Fixed by fluxcd/pkg#560 or #1105

@gldraphael

I created a test cluster exp-aks-02 with the following configuration:

Kubernetes Version: 1.25.6
Authentication and Authorization: Azure AD authentication with Kubernetes RBAC
Network Plugin: Azure CNI

(The cluster does not use the ACR integration.)

I then bootstrapped Flux and assigned the AcrPull and Reader roles on an ACR instance to the user-assigned managed identity exp-aks-02-agentpool.

At this point I expected it to just work, but flux get sources showed this error:

unknown build error: failed to get credential from azure: DefaultAzureCredential: failed to acquire a token.
Attempted credentials:
        EnvironmentCredential: missing environment variable AZURE_TENANT_ID
        ManagedIdentityCredential: no default identity is assigned to this resource
        AzureCLICredential: Azure CLI not found on path

Ideas?


Other Observations

Fetching token by specifying the UAI to use

I followed the thread at #898 and concluded that this happens because I have two UAIs (user-assigned managed identities) attached to this cluster (exp-aks-02-agentpool and aciconnectorlinux-exp-aks-02).

So I tried patching the flux-system kustomization to add the AZURE_CLIENT_ID environment variable:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
labels:
  - pairs:
      toolkit.fluxcd.io/tenant: sre-team
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=20
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --requeue-dependency=5s
    target:
      kind: Deployment
      name: "(kustomize-controller|helm-controller|source-controller)"
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: AZURE_CLIENT_ID
          value: --client-id--
    target:
      kind: Deployment
      name: "(helm-controller|source-controller)"

But I now see this error (which almost feels like a bug):

unknown build error: failed to get credential from azure: error exchanging token: failed to decode the response: invalid character '<' looking for beginning of value

However hitting the token API directly works as long as I include the client_id parameter:

$ kubectl exec -it source-controller-59b5c97495-htrtb -n flux-system -- /bin/sh
$ wget -q -O - "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/&client_id=$AZURE_CLIENT_ID" --header "Metadata: true"
{"access_token":"--redacted--","client_id":"--client-id--","expires_in":"84928","expires_on":"1681412609","ext_expires_in":"86399","not_before":"1681325909","resource":"https://management.azure.com/","token_type":"Bearer"}

akv2k8s works ok

I am able to consume secrets from azure keyvault using the akv2k8s project which appears to use the userAssignedIdentityID value from /etc/kubernetes/azure.json:

apiVersion: v1
kind: Namespace
metadata:
  name: akv2k8s
  labels:
    toolkit.fluxcd.io/tenant: sre-team
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: akv2k8s
  namespace: akv2k8s
spec:
  interval: 60m0s
  url: https://charts.spvapi.no
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: akv2k8s
  namespace: akv2k8s
spec:
  interval: 60m
  chart:
    spec:
      chart: akv2k8s
      version: "2.*"
      sourceRef:
        kind: HelmRepository
        name: akv2k8s
      interval: 12h
  values:
    global:
      metrics:
        enabled: true
---
apiVersion: spv.no/v1alpha1
kind: AzureKeyVaultSecret
metadata:
  name: test-credentials
  namespace: monitoring
spec:
  vault:
    name: vault-name
    object:
      type: multi-key-value-secret
      name: test-credentials
      contentType: application/x-json
  output:
    secret:
      name: test-credentials
@somtochiama
Member

Which version of Flux are you on? You can run flux version to check.
I want to see if I can reproduce this error on my end, so any more details on how you set up the cluster would be great. We have e2e tests for kubelet identity, but that cluster uses a system-assigned identity.

@gldraphael
Author

gldraphael commented Apr 13, 2023

Running flux version returns this:

~ ❯  flux version
flux: v0.41.2
helm-controller: v0.31.2
kustomize-controller: v0.35.1
notification-controller: v0.33.0
source-controller: v0.36.1

I created the cluster from the azure portal but I'm happy to put together a terraform script if that will help.

Edit:

I think reproducing this requires a cluster that uses Azure AD and has more than one user-assigned managed identity. I'm validating this assumption right now.

@gldraphael
Author

gldraphael commented Apr 13, 2023

I created a new cluster with a single User Assigned Managed Identity (UAI):

Node pools
Node pools: 1
Enable virtual nodes: Disabled

Access
Resource identity: System-assigned managed identity
Local accounts: Disabled
Authentication and Authorization: Azure AD authentication with Kubernetes RBAC
Cluster admin group: Cluster Admin
Encryption type: (Default) Encryption at-rest with a platform-managed key

Networking
Network configuration: Kubenet
Load balancer: Standard
Private cluster: Disabled
Authorized IP ranges: Disabled
Network policy: None

Integrations
Container registry: None
Microsoft Defender for Cloud: Free
Enable Container Logs: Disabled
Alerts: Not enabled
Azure Policy: Disabled

And I see the following error (which is similar to what I saw when I set AZURE_CLIENT_ID in the previous cluster with more than one UAI):

failed to get credential from azure: error exchanging token: failed to decode the response: invalid character '<' looking for beginning of value

Seems like this truly is a bug. Let me know if you have trouble reproducing this.
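The text invalid character '<' looking for beginning of value is the syntax error Go's encoding/json package returns when the body it is asked to decode starts with '<', which suggests the token-exchange step received a non-JSON (for example HTML) response instead of a token. A minimal reproduction, where the tokenResponse type and its field are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenResponse is a hypothetical shape for a JSON token-exchange reply.
type tokenResponse struct {
	RefreshToken string `json:"refresh_token"`
}

// decode parses a token-exchange response body.
func decode(body []byte) (tokenResponse, error) {
	var r tokenResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// An HTML page instead of JSON produces the same message as in the error above.
	_, err := decode([]byte("<html><body>Not Found</body></html>"))
	fmt.Println(err) // invalid character '<' looking for beginning of value

	// A well-formed JSON body decodes normally.
	r, err := decode([]byte(`{"refresh_token":"abc"}`))
	fmt.Println(r.RefreshToken, err) // abc <nil>
}
```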

@somtochiama
Member

somtochiama commented May 11, 2023

Hey, sorry for the long wait. I just tested this on the latest version and it worked:

fleet-infra git:(main) flux -v
flux version 2.0.0-rc.2

I created an AKS cluster with the following properties (as stated in the previous comment):
[Screenshot: AKS cluster properties]

I assigned an AcrPull role to the cluster's managed identity and it reconciled successfully.
Next, I added a second managed identity to the cluster and it failed to reconcile (which is expected):

► annotating OCIRepository podinfo in flux-system namespace
✔ OCIRepository annotated
◎ waiting for OCIRepository reconciliation
✗ OCIRepository reconciliation failed: 'failed to get credential from azure: DefaultAzureCredential: failed to acquire a token.
Attempted credentials:
        EnvironmentCredential: missing environment variable AZURE_TENANT_ID
        WorkloadIdentityCredential: missing environment variables for workload identity. Check webhook and pod configuration
        ManagedIdentityCredential: no default identity is assigned to this resource
        AzureCLICredential: Azure CLI not found on path

Then I added the AZURE_CLIENT_ID env variable to the source-controller pod and it reconciled successfully.

Can you try upgrading to 2.0.0-rc.2?

@gldraphael
Author

Thanks for testing this out, @somtochiama.

I just tested it with v2.0.0-rc.3 but unfortunately still see the same error:

failed to get credential from azure: error exchanging token: failed to decode the response: invalid character '<' looking for beginning of value

I will try again on Monday just to be certain.

@gldraphael
Author

I am still seeing the same error. I see it when I add the following source:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: myacr
  namespace: experiments
spec:
  type: oci
  provider: azure
  url: oci://myacr.azurecr.io
  interval: 5m

Are you able to reproduce this?

@somtochiama
Member

I was testing with an OCIRepository rather than a HelmRepository. I will try again today.

@somtochiama
Member

Hey @gldraphael,

I have been able to reproduce this. Can you try specifying the repository in the URL, i.e.:

spec:
  type: oci
  provider: azure
  url: oci://myacr.azurecr.io/<repo-name>

@gldraphael
Author

Well, that kinda works, but not quite. My chart is at oci://myacr.azurecr.io/clippy, not at oci://myacr.azurecr.io/charts/clippy.

Earlier, I tried:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: myacr
  namespace: experiments
spec:
  type: oci
  provider: azure
  url: oci://myacr.azurecr.io
  interval: 5m
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: clippy
  namespace: experiments
spec:
  releaseName: clippy
  chart:
    spec:
      chart: clippy
      sourceRef:
        kind: HelmRepository
        name: myacr
      version: 1.0.1
  interval: 50m
  install:
    remediation:
      retries: 3
  values: {}

And that shows the error I reported earlier.

Now, I tried the following:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: clippy
  namespace: experiments
spec:
  type: oci
  provider: azure
  url: oci://myacr.azurecr.io/clippy
  interval: 5m
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: clippy
  namespace: experiments
spec:
  releaseName: clippy
  chart:
    spec:
      chart: clippy
      sourceRef:
        kind: HelmRepository
        name: clippy
        namespace: experiments
      version: 1.0.1
  interval: 50m
  install:
    remediation:
      retries: 3
  values: {}

I see no errors on the HelmRepository anymore, but the HelmChart shows the following error:

 ~/flux/flux get source chart experiments-clippy -n experiments
NAME                    REVISION        SUSPENDED       READY   MESSAGE
experiments-clippy                      False           False   chart pull error: failed to download chart for remote reference: failed to get 'oci://myacr.azurecr.io/clippy/clippy:1.0.1': myacr.azurecr.io/clippy/clippy:1.0.1: not found

It appears to be trying to get the chart from the wrong place: myacr.azurecr.io/clippy/clippy:1.0.1 instead of myacr.azurecr.io/clippy:1.0.1

I think a possible workaround may be to move my chart to myacr.azurecr.io/charts/clippy:1.0.1.

What I do not understand is why I no longer see any error on the HelmRepository when I use oci://myacr.azurecr.io/clippy as opposed to oci://myacr.azurecr.io. Does that URL always expect a base path after the origin?
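The doubled path in the HelmChart error is consistent with the chart reference being composed as repository URL, then chart name, then version. The helper below is a hypothetical model of that composition (not source-controller's code), showing why oci://myacr.azurecr.io/clippy plus chart clippy yields clippy/clippy, and what the charts/ workaround would produce:

```go
package main

import (
	"fmt"
	"strings"
)

// chartRef is a hypothetical helper that joins a HelmRepository URL, a chart
// name and a version into <url>/<chart>:<version>.
func chartRef(repoURL, chart, version string) string {
	return strings.TrimSuffix(repoURL, "/") + "/" + chart + ":" + version
}

func main() {
	fmt.Println(chartRef("oci://myacr.azurecr.io/clippy", "clippy", "1.0.1")) // oci://myacr.azurecr.io/clippy/clippy:1.0.1
	fmt.Println(chartRef("oci://myacr.azurecr.io", "clippy", "1.0.1"))        // oci://myacr.azurecr.io/clippy:1.0.1
	fmt.Println(chartRef("oci://myacr.azurecr.io/charts", "clippy", "1.0.1")) // oci://myacr.azurecr.io/charts/clippy:1.0.1
}
```

Under this model the HelmRepository URL should point at the prefix that contains the chart, not at the chart itself.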

@somtochiama
Member

somtochiama commented May 16, 2023

> I think a possible workaround may be to move my chart to myacr.azurecr.io/charts/clippy:1.0.1.

Yes, you would have to use this as a workaround while I get this fixed.

The HelmRepository should work with the repository root address, but right now there's a bug that prevents it from doing so. When exchanging the token, it makes a request to index.docker.io due to some defaulting in a library we use.
Thanks for reporting this!
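Falling back to index.docker.io is typical of Docker-style image-reference parsing, where a name with no / at all, or whose first segment does not look like a hostname, is assumed to live on Docker Hub. A simplified sketch of that rule follows; splitRegistry is hypothetical and is not the code of the library Flux uses, which the thread does not name:

```go
package main

import (
	"fmt"
	"strings"
)

// defaultRegistry is the Docker Hub host that unqualified names fall back to.
const defaultRegistry = "index.docker.io"

// splitRegistry splits an OCI reference into registry host and repository path.
// Simplified Docker-style rule: the first segment is treated as a host only if
// a "/" follows it and it contains "." or ":" or equals "localhost".
func splitRegistry(ref string) (registry, repository string) {
	ref = strings.TrimPrefix(ref, "oci://")
	i := strings.IndexRune(ref, '/')
	if i == -1 || (!strings.ContainsAny(ref[:i], ".:") && ref[:i] != "localhost") {
		return defaultRegistry, ref
	}
	return ref[:i], ref[i+1:]
}

func main() {
	fmt.Println(splitRegistry("oci://myacr.azurecr.io"))        // index.docker.io myacr.azurecr.io
	fmt.Println(splitRegistry("oci://myacr.azurecr.io/clippy")) // myacr.azurecr.io clippy
}
```

If the library behaves like this sketch, it would explain why adding a repository path to the URL (oci://myacr.azurecr.io/<repo-name>) avoided the failing token exchange.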

@gldraphael
Author

Ah! Feel free to let me know if you'd like me to test anything.
Appreciate your patience here!

@somtochiama
Member

somtochiama commented May 26, 2023

@gldraphael This issue will be fixed in the latest release of Flux.

@gldraphael
Author

@somtochiama - I tested this out, it works! Thanks!

@joshuadmatthews

@gldraphael any advice for this when using the Flux extension? I can't get this working either without setting the ClientID somehow, but because I'm using the Flux extension there doesn't seem to be a clean way to patch the source-controller manifests.

@gldraphael
Author

@joshuadmatthews - I have never used the Azure Flux extensions. I think the best thing to do would be to ask Azure Support if you haven't already. Their extensions should be covered, if I'm not mistaken. Let us know what they say here!

But since you asked for my advice, I'd say avoid the extensions as far as you can!

@joshuadmatthews

I was able to get it working by deploying a patch with kubectl, which lets me target a live resource rather than a manifest. It would be nice if Flux had a way to apply patches directly rather than having to patch a YAML file that is also in source control.

@stefanprodan
Member

@joshuadmatthews Flux can patch existing objects in-cluster, but being a GitOps tool, the patch must be specified in source control. Here is an example: https://fluxcd.io/flux/faq/#how-to-patch-coredns-and-other-pre-installed-addons

Also, please note that we don't offer support for Azure extensions; you need to raise the ACR auth issue with Microsoft support. When installing Flux using flux bootstrap, here is how you can set the ClientID: https://fluxcd.io/flux/installation/configuration/workload-identity/#azure-workload-identity

@joshuadmatthews

Thanks @stefanprodan, good to know there is a way to patch resources that weren’t originally added by Flux.

With the Azure extensions, I did eventually find a document that described how to configure the extensions to set up workload identity.

@gxy12280421

@joshuadmatthews Did you get it working with the Azure flux-extension? Can you share the document about configuring the extension to set up workload identity?

I am using the Azure flux-extension and am having trouble authenticating to ACR with the kubelet identity.

"error":"failed to get credential from 'azure': DefaultAzureCredential: failed to acquire a token.\nAttempted credentials:\n\tEnvironmentCredential: missing environment variable AZURE_TENANT_ID\n\tWorkloadIdentityCredential: no client ID specified. Check pod configuration or set ClientID in the options\n\tManagedIdentityCredential: failed to authenticate a system assigned identity. The endpoint responded with {"error":"invalid_request","error_description":"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request"}\n\tAzureCLICredential: Azure CLI not found on path\n\tAzureDeveloperCLICredential: Azure Developer CLI not found on path"

@joshuadmatthews

@gxy12280421 see the Workload Identity section here

https://learn.microsoft.com/en-us/azure/azure-arc/kubernetes/tutorial-use-gitops-flux2?tabs=azure-cli

az k8s-extension create --resource-group <resource_group_name> --cluster-name <aks_cluster_name> --cluster-type managedClusters --name flux --extension-type microsoft.flux --config workloadIdentity.enable=true workloadIdentity.azureClientId=<user_assigned_client_id>

You can run update instead of create if you have already installed Flux with Bicep/ARM.

@gxy12280421

@joshuadmatthews Thank you very much for the quick info, which pointed me in the right direction. I got it working by adding useKubeletIdentity = "true" to the Azure flux extension, since I had assigned the AcrPull permission to the kubelet identity.
