
TriggerAuthentication with Azure Key Vault and azure-workload causes Panic/crash after upgrade to 2.9.0 #4010

Closed
pinkfloydx33 opened this issue Dec 13, 2022 · 8 comments · Fixed by #4017
Labels
bug Something isn't working

Comments

@pinkfloydx33 commented Dec 13, 2022

Report

We are using Azure Workload Identity as our podIdentity provider. We have a TriggerAuthentication that uses podIdentity and Azure Key Vault. Prior to upgrading to 2.9.0 everything worked fine. Now the KEDA operator crashes with the panic below.

It seems the upgrade has stopped KEDA from recognizing the spec.podIdentity settings of a TriggerAuthentication when communicating with Azure Key Vault, despite this having worked before. KEDA 2.9.0 introduced a new spec.azureKeyVault.podIdentity field (I don't quite understand why) and the operator attempts to use it even when it is nil. At the very least I'd expect the operator to fall back to the pre-existing settings in spec.podIdentity.

2022-12-13T14:45:56Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"administration-worker-queue-trigger","namespace":"dev-eastus"}, "namespace": "dev-eastus", "name": "administration-worker-queue-trigger", "reconcileID": "72537d7d-4612-4286-9677-8df8338f1ad9"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x2dcb1ff]

goroutine 451 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x32e88e0, 0x6026750})
        /usr/local/go/src/runtime/panic.go:838 +0x207
github.com/kedacore/keda/v2/pkg/scaling/resolver.(*AzureKeyVaultHandler).getAuthConfig(0xc0010ba758?, {0x40ba360?, 0xc00102e120?}, {0x40c80b0?, 0xc000114550?}, {{0x40c0b20?, 0xc000554990?}, 0x0?}, {0xc00100fe10, 0xa}, ...)
        /workspace/pkg/scaling/resolver/azure_keyvault_handler.go:112 +0x13f
github.com/kedacore/keda/v2/pkg/scaling/resolver.(*AzureKeyVaultHandler).Initialize(0xc0010ba758, {0x40ba360, 0xc00102e120}, {0x40c80b0, 0xc000114550}, {{0x40c0b20?, 0xc000554990?}, 0xa?}, {0xc00100fe10, 0xa}, ...)
        /workspace/pkg/scaling/resolver/azure_keyvault_handler.go:52 +0x115
github.com/kedacore/keda/v2/pkg/scaling/resolver.resolveAuthRef({0x40ba360, 0xc00102e120}, {0x40c80b0, 0xc000114550}, {{0x40c0b20?, 0xc000554990?}, 0xc0010ba900?}, 0xc000354860, 0xc0010da3e8, {0xc00100fe10, ...}, ...)
        /workspace/pkg/scaling/resolver/scale_resolvers.go:238 +0x4da
github.com/kedacore/keda/v2/pkg/scaling/resolver.ResolveAuthRefAndPodIdentity({0x40ba360, 0xc00102e120}, {0x40c80b0?, 0xc000114550?}, {{0x40c0b20?, 0xc000554990?}, 0x40c0b20?}, 0xc000554990?, 0xc0010da300, {0xc00100fe10, ...}, ...)
        /workspace/pkg/scaling/resolver/scale_resolvers.go:155 +0xc9
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).buildScalers.func1()
        /workspace/pkg/scaling/scale_handler.go:526 +0x3e5
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).buildScalers(0xc000df9dd0, {0x40ba360?, 0xc00102e120}, 0xc000fdcdc0, 0xc0010da300, {0x0, 0x0})
        /workspace/pkg/scaling/scale_handler.go:536 +0x5c5
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).performGetScalersCache(0xc000df9dd0, {0x40ba360, 0xc00102e120}, {0xc000f28e40, 0x3b}, {0x386ec80, 0xc00102a400}, 0x3, {0x0, 0x0}, ...)
        /workspace/pkg/scaling/scale_handler.go:266 +0x757
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScalersCache(0xc000e24f98?, {0x40ba360, 0xc00102e120}, {0x386ec80, 0xc00102a400})
        /workspace/pkg/scaling/scale_handler.go:190 +0xe6
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).getScaledObjectMetricSpecs(0xc000de6660, {0x40ba360, 0xc00102e120}, {{0x40c0b20?, 0xc00102e150?}, 0xc000fa0930?}, 0xc00102a400)
        /workspace/controllers/keda/hpa.go:200 +0x8c
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).newHPAForScaledObject(0xc000de6660, {0x40ba360?, 0xc00102e120?}, {{0x40c0b20?, 0xc00102e150?}, 0x38134c0?}, 0xc00102a400, 0xc0010bb608)
        /workspace/controllers/keda/hpa.go:74 +0x66
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).updateHPAIfNeeded(0xc000de6660, {0x40ba360, 0xc00102e120}, {{0x40c0b20?, 0xc00102e150?}, 0xc00102e120?}, 0xc00102a400, 0xc000f7efc0, 0xc000fa0990?)
        /workspace/controllers/keda/hpa.go:152 +0x7b
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).ensureHPAForScaledObjectExists(0xc000de6660, {0x40ba360, 0xc00102e120}, {{0x40c0b20?, 0xc00102e150?}, 0x40c0b20?}, 0xc00102a400, 0x0?)
        /workspace/controllers/keda/scaledobject_controller.go:427 +0x238
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).reconcileScaledObject(0xc000de6660?, {0x40ba360, 0xc00102e120}, {{0x40c0b20?, 0xc00102e150?}, 0xc000f36f00?}, 0xc00102a400)
        /workspace/controllers/keda/scaledobject_controller.go:230 +0x1c9
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile(0xc000de6660, {0x40ba360, 0xc00102e120}, {{{0xc000a0a450?, 0x10?}, {0xc000f36f00?, 0x40d787?}}})
        /workspace/controllers/keda/scaledobject_controller.go:176 +0x526
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x40ba2b8?, {0x40ba360?, 0xc00102e120?}, {{{0xc000a0a450?, 0x370a2e0?}, {0xc000f36f00?, 0xc000d1d5d0?}}})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000aa92c0, {0x40ba2b8, 0xc000137640}, {0x342e3c0?, 0xc000429140?})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000aa92c0, {0x40ba2b8, 0xc000137640})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:230 +0x325

Expected Behavior

TriggerAuthentication with Azure Key Vault and the azure-workload pod-identity provider works without issue (especially following an upgrade). The TriggerAuthentication specification now contains two podIdentity properties; at the very least the operator should fall back to the spec.podIdentity field when the vault-level one is absent.
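
For illustration, the fallback I have in mind is roughly the following. This is a hypothetical sketch (the type is a minimal stand-in and the helper does not exist in KEDA), not the actual operator code:

package main

import "fmt"

// Minimal stand-in for KEDA's AuthPodIdentity, for illustration only.
type AuthPodIdentity struct {
    Provider   string
    IdentityID string
}

// resolvePodIdentity is a hypothetical helper: prefer the vault-level
// podIdentity when present, otherwise fall back to spec.podIdentity.
func resolvePodIdentity(vaultLevel, specLevel *AuthPodIdentity) *AuthPodIdentity {
    if vaultLevel != nil {
        return vaultLevel
    }
    return specLevel // may still be nil; callers must check before dereferencing
}

func main() {
    spec := &AuthPodIdentity{Provider: "azure-workload", IdentityID: "our-identity-clientid-here"}
    resolved := resolvePodIdentity(nil, spec) // no vault-level setting: fall back to spec
    fmt.Println(resolved.Provider)            // prints: azure-workload
}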

Actual Behavior

The KEDA operator crashes with the panic above.

Steps to Reproduce the Problem

This requires Azure Workload Identity, a Key Vault with secrets, an identity with access to the vault, and a federated credential created on the identity for the keda/keda-operator service account. This example uses a Redis scaler and pulls values from Key Vault.

Trigger Authentication:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth
  namespace: my-namespace
spec:
  podIdentity:
    identityId: 'our-identity-clientid-here'
    provider: azure-workload
  azureKeyVault:
    secrets:
      - name: dev-redis-server-fqdn
        parameter: host
      - name: dev-redis-password
        parameter: password
    vaultUri: https://our-vault.vault.azure.net/

Scaler:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-scaled-object
  namespace: my-namespace
spec:
  cooldownPeriod: 150
  maxReplicaCount: 10
  minReplicaCount: 0
  pollingInterval: 30
  scaleTargetRef:
    name: my-deployment
  triggers:
    - authenticationRef:
        name: trigger-auth
      metadata:
        databaseIndex: '0'
        enableTLS: 'true'
        listLength: '5'
        listName: q:ListA
        port: '6380'
      type: redis
    - authenticationRef:
        name: trigger-auth
      metadata:
        databaseIndex: '0'
        enableTLS: 'true'
        listLength: '10'
        listName: q:ListB
        port: '6380'
      type: redis

KEDA Version

2.9.0

Kubernetes Version

1.24

Platform

Microsoft Azure

Scaler Details

Redis, among others

Anything else?

This appears to have been introduced here: #3814

The CRDs/configuration were updated and now contain two podIdentity properties. The pre-existing one is a top-level property of the spec (spec.podIdentity), while the new one is a child of the Azure Key Vault settings (spec.azureKeyVault.podIdentity). If I move the podIdentity configuration from the top level to under azureKeyVault, everything now works:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth
  namespace: my-namespace
spec:
  # podIdentity: # was here!! NO LONGER WORKS!
    # identityId: '...'
    # provider: azure-workload
  azureKeyVault:
    podIdentity: # now a sub-property. THIS WORKS?
      identityId: 'our-identity-clientid-here'
      provider: azure-workload
    secrets:
      - name: dev-redis-server-fqdn
        parameter: host
      - name: dev-redis-password
        parameter: password
    vaultUri: https://our-vault.vault.azure.net/

The spec.podIdentity property has existed forever... it received updates (for azure-workload) as part of 2.8.0 (#2907), though there were some issues with the chart-bundled CRDs and docs being misaligned and needing backporting.

I checked the changelog/README and didn't see anything about this being a breaking change. I still don't quite understand why there are two different properties serving the same purpose... at the very least it seems one should fall back to the other. If this is in fact intentional, can someone explain which property you are meant to use under what circumstances? I suppose it might make sense if you needed one identity for Key Vault and another for connecting to the target resource; being able to specify them separately could be useful in that scenario. But the operator should fall back to the "nearest" settings that exist, or this should be documented as a breaking change!

You can see it still exists at the top level:

// TriggerAuthenticationSpec defines the various ways to authenticate
type TriggerAuthenticationSpec struct {
    // +optional
    PodIdentity *AuthPodIdentity `json:"podIdentity,omitempty"`
    // ... other fields omitted
}

But it also exists as part of the AzureKeyVault settings:

type AzureKeyVault struct {
    VaultURI string                `json:"vaultUri"`
    Secrets  []AzureKeyVaultSecret `json:"secrets"`
    // +optional
    Credentials *AzureKeyVaultCredentials `json:"credentials"`
    // +optional
    PodIdentity *AuthPodIdentity `json:"podIdentity"`
}

And both appear in the CRDs as well (screenshot omitted).

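Just to illustrate the crash itself: with podIdentity only at the spec level, the vault-level pointer is nil, and dereferencing it is what blows up. A guard along these lines (purely hypothetical, not the actual code in azure_keyvault_handler.go) would turn the crash into an error:

package main

import (
    "errors"
    "fmt"
)

// Minimal stand-ins for the fields quoted above; illustrative only.
type AuthPodIdentity struct{ Provider string }
type AzureKeyVault struct{ PodIdentity *AuthPodIdentity }

// providerOrError checks the pointer before reading it, so a missing
// azureKeyVault.podIdentity yields an error instead of a nil-pointer panic.
func providerOrError(kv AzureKeyVault) (string, error) {
    if kv.PodIdentity == nil {
        return "", errors.New("no pod identity configured for Azure Key Vault")
    }
    return kv.PodIdentity.Provider, nil
}

func main() {
    _, err := providerOrError(AzureKeyVault{}) // podIdentity omitted, as in my original spec
    fmt.Println(err)                           // prints the error instead of crashing with SIGSEGV
}
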
pinkfloydx33 added the bug label on Dec 13, 2022
@v-shenoy (Contributor)

#3813

The reason for the change is that there's no problem with the Key Vault itself using the spec.podIdentity field. However, there was a race condition over whether a scaler uses pod identity to access the event source or the secrets pulled from the Key Vault. This is not an issue for non-Azure scalers, as they will only be using the Key Vault secrets, but for Azure scalers it is.

@JorTurFer

@JorTurFer (Member) commented Dec 14, 2022

As @v-shenoy said, there was a race condition in Azure scalers and we moved the section as you have said. I guess the problem is that you were using the Azure Key Vault secret provider with pod identity, right?
If that is the case, first I'd like to apologize for the issues, but I'd like to clarify that the breaking change wasn't documented because there wasn't any breaking change, not officially at least. The official documentation for the previous version clearly says that pod identity providers aren't supported for that secret provider:
(screenshot of the documentation note omitted)

The support was added to the code but was never officially released; you saw it by browsing the code or just by luck, not in the documentation. Once again, sorry for the inconvenience, but a change to something not officially documented and released isn't a breaking change, at least IMO.

@JorTurFer (Member)

In v2.9 the support for pod identities has been added, documented, and released, and we will now maintain it as part of our deprecation policy.

@zroubalik (Member)

We should fix the code anyway, to not panic.

@pinkfloydx33 (Author) commented Dec 14, 2022

Hrmm... I don't remember ever seeing the line about it not being supported. I did browse the code at the time, so perhaps I picked it up from there (?). There were also a few issues with the docs at the time; the current versions weren't in sync with the updates and I think were outdated in general, so it's possible that warning just wasn't yet there in the right spot.

So to be clear:

  • The top-level property is for scalers only
  • The vault-level property is for the vault only
  • In my case I'm not using an Azure scaler, so I should be putting it on the vault property only

If someone is using both, it seems a little weird to require both properties to be set, especially if they are the same thing. I've no problem moving my settings, but is there any reason it can't fall back to the spec-level property if the vault-specific one is not specified?

And yes--at the very least it shouldn't panic :-)

Technically a breaking change or not, it did work in some circumstances. A little note about the fact that it now definitely won't couldn't hurt...

@JorTurFer (Member)

> We should fix the code anyway, to not panic.

yes, we should

@JorTurFer (Member)

> If someone is using both, it seems a little weird to require both properties to be set, especially if they are the same thing. I've no problem moving my settings, but is there any reason it can't fall back to the spec-level property if the vault-specific one is not specified?

The main reason IMO is that the podIdentity here is the mechanism to authenticate to the Key Vault, not the scaler; that's why it makes sense to me. Just as we have a credentials section inside the azureKeyVault section because it's an authentication method, we have another section inside it for pod identity. But I'm not the owner of the truth, WDYT @kedacore/keda-core-contributors?

@pinkfloydx33 (Author)

I can agree there, but I feel like it might be a common case to use the same identity to authenticate in both scenarios.

Perhaps specifying the vault-level pod-identity property without the clientId could fall back to the top/scaler-level clientId (if specified) and otherwise to the SA-level one? Then you're at least saying "hey, I want podIdentity for Key Vault" but not repeating the clientId if they're going to be the same.
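
Just to illustrate the precedence I'm imagining (these helpers don't exist in KEDA; this is purely a sketch):

package main

import "fmt"

// resolveClientID sketches the proposed fallback chain: the vault-level
// identityId wins, then the scaler/spec-level one, then whatever client ID
// the workload-identity service account is annotated with.
func resolveClientID(vaultID, scalerID, serviceAccountID string) string {
    if vaultID != "" {
        return vaultID
    }
    if scalerID != "" {
        return scalerID
    }
    return serviceAccountID
}

func main() {
    // Vault-level podIdentity present but without an identityId:
    // fall back to the scaler-level client ID.
    fmt.Println(resolveClientID("", "our-identity-clientid-here", "sa-annotation-client-id"))
}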

Though...I suppose that falls apart if you want it to fall back to the SA's clientId and not the scaler's. OK, so I guess I can see the need to be explicit.

/shrug... Sorry, thinking out loud here
