Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSAL and OIDC support #2320

Merged
merged 20 commits into from
Mar 28, 2023
Merged

MSAL and OIDC support #2320

merged 20 commits into from
Mar 28, 2023

Conversation

thomas11
Copy link
Contributor

@thomas11 thomas11 commented Mar 22, 2023

About

This PR contains two related pieces:

  1. Switch the provider from the deprecated ADAL authentication method to the current MSAL method. Due to an issue in our dependencies, one of the authentication paths, the az cli, doesn't work properly with MSAL. We detect the unsupported cli case dynamically and fall back to ADAL in that case.
  2. Enable OIDC for authentication. The actual OIDC implementation is in our dependencies but enabling it was gated behind MSAL. This PR exposes the relevant configuration settings and adds a dedicated test.

What's MSAL?

The provider currently uses an authentication method called ADAL that’s out of support in Azure. ADAL, also called auth v1, is replaced by MSAL (auth v2).

OIDC only works on MSAL.

These protocols are implemented in our dependencies go-azure-helpers and hamilton so we don't need to care about the details, except for two things that surface in our code:

  • MSAL has a new concept of scopes for more targeted authentication scopes. You can't just get a token for everything in an Azure account or subscription, you can only get one for ResourceManager, or KeyVault, etc. Since Pulumi deals with infrastructure, we can get away with ResourceManager for most operations, but we do have some custom resources that touch the data plane and need a different scope.
  • The dependencies don't implement MSAL correctly when the az cli is used for authentication. This PR falls back to ADAL when that case is detected.

What's OIDC?

It stands for OpenID Connect and is an authentication protocol on top of OAuth. The idea is to allow trusted parties to verify a user's identity. For a practical example, let's look at how I set up OIDC authentication between Github Actions and Azure, which is used by the e2e test of this PR. We tell Azure that GitHub can be trusted:

  1. On an Azure Active Directory (AD) App, add a new credential with issuer GitHub.
  2. In GitHub, configure the Action with the id of the AD App.

Now, when the action runs, GitHub can use the Azure AD endpoint to exchange the GitHub token for an Azure token.

Note that there is no secret involved anywhere that would need to be safely stored, rotated, etc. This is one of the biggest advantages of OIDC.

Details for the interested here (GH) and here (Azure), but not required reading for reviewing this PR.

Testing

I picked an existing e2e test, keyvault. In general, any e2e test will do since we only want to see that authentication works. However, keyvault has more authentication test coverage than the other tests because it also creates a key vault secret, which is a data plane operation that requires a different token.

I used Go build tags to make this test as oidc. This required moving it into its own file. In the OIDC test run, we run only this test, unsetting the ARM client secret to make sure we really test OIDC.

Reviewing this PR

I added comments with leading numbers like (1). I suggest following those in order. They show on the main page in the correct order, but without a link to the actual file, so that's kinda annoying.

Next steps

To fully release OIDC support we need to document it in our user-facing documentation.

I should also document the particular OIDC integration used in testing in our wiki (AD principal, roles, permissions etc.).

This not only limits the security blast radius, but also unblocks
schema-tools from adding PR comments. Previously, it had the write
permission by default, but adding the id-token permission setting
globally reset all others: "If you specify the access for any of these
scopes, all of those that are not specified are set to none."
@github-actions
Copy link

Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

@@ -0,0 +1,293 @@
// Copyright 2016-2022, Pulumi Corporation.
Copy link
Contributor Author

@thomas11 thomas11 Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) Since the authentication code grew to about 300 lines, I moved it to its own file. Much of it is unchanged.

envName = "public"
}

useMsi := k.getConfig("useMsi", "ARM_USE_MSI") == "true"
Copy link
Contributor Author

@thomas11 thomas11 Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(2) Read the OIDC config from the environment. Lines 47-65, 80-85 are new. It's standard configuration handling, except that there are two environment variables that we check for token and URL. The ones starting with ARM_ are ours for the user to set, the ones with ACTIONS_ are pre-set when running in GHA, so we check them directly instead of making users get the values and put them into the ARM_ variables.

SupportsAuxiliaryTenants: len(auxTenants) > 0,
}

needCli := needAzCli(builder)
Copy link
Contributor Author

@thomas11 thomas11 Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(3) Check whether we'll use the az cli for authentication. That isn't new but I extracted it into the needAzCli function and added the OIDC case. Both OIDC and MSI auth are not supported via cli.

builder.ClientCertPath == ""
}

func getAzVersion() (*goversion.Version, error) {
Copy link
Contributor Author

@thomas11 thomas11 Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(4) The following code is all about cli versions and is just moved over from provider.go. It gets interesting again in line 190.

return nil
}

type AuthorizerFactory func(api environments.Api) (autorest.Authorizer, error)
Copy link
Contributor Author

@thomas11 thomas11 Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(5) Autorest, the library we use to talk to the Azure REST API, takes an autorest.Authorizer interface to get auth tokens from. With ADAL, we could just make an Authorizer and use it everywhere. With MSAL, due to the added scopes, we need a factory to create Authorizers on demand for different scopes. Note how the token factory is a func(api environments.Api) but in the ADAL case, the api parameter is unused.


endpoint := k.environment.TokenAudience

if authConfig.useCli {
Copy link
Contributor Author

@thomas11 thomas11 Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(6) As mentioned above, due to this issue we fall back to ADAL if we're using the cli.

The rest of auth.go is largely unchanged.

// need the authorizers to function across requests.
authCtx := context.Background()

tokenAuth, bearerAuth, err := k.makeAuthorizerFactories(authCtx, authConfig)
Copy link
Contributor Author

@thomas11 thomas11 Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(7) In the provider's Configure, we use the new authorizer factories to make authorizers for ResourceManager and for KeyVault. The RM one is our standard authorizer because we generally only do resource management: k.client.Authorizer = resourceManagerAuth. The Key Vault one is passed to the custom resources builder so we can handle secrets there.

The rest of provider.go is just code moved to auth.go.

if: ${{ matrix.language == 'nodejs' }}
env:
ARM_USE_OIDC: true
ARM_CLIENT_SECRET: ""
Copy link
Contributor Author

@thomas11 thomas11 Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(10) We unset ARM_CLIENT_SECRET that the other tests use for authentication to disable that path so we know we're testing what we think we're testing.

Copy link
Member

@danielrbradley danielrbradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks loads better - much happier it's not complicating the build process!

Good to go as-is but have added a comment for moving where the hard-coded value sits.

examples/examples_nodejs_test.go Show resolved Hide resolved
examples/examples_nodejs_keyvault_test.go Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impact/no-changelog-required This issue doesn't require a CHANGELOG update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants