
CLI hangs at sts:GetCallerIdentity when temporary AWS credentials exist but are expired #814

Closed
austinbutler opened this issue Dec 3, 2019 · 12 comments
Labels: kind/bug (Some behavior is incorrect or out of spec), resolution/fixed (This issue was fixed)

@austinbutler

When there are no AWS access keys in my shell environment, running pulumi up errors out almost immediately with error: unable to discover AWS AccessKeyID and/or SecretAccessKey. But when AWS access keys exist but have expired, it just hangs (for at least several minutes) while trying sts:GetCallerIdentity, as far as I can tell from the debug output (which I have since lost 😨); without debug on it just looks like it's doing normal planning forever. Ideally it would quickly determine that the credentials are invalid and inform the user.

Pulumi: v1.6.1
Pulumi AWS: v1.13.0
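
For comparison, the fast-failure behaviour being asked for is roughly what the AWS CLI itself does when a session token has expired (a sketch; the exact error wording can vary by CLI/SDK version):

$ aws sts get-caller-identity

An error occurred (ExpiredToken) when calling the GetCallerIdentity operation: The security token included in the request is expired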

@austinbutler

I let the token expire overnight to capture the output:

> $ pulumi preview -d
Enter your passphrase to unlock config/secrets
    (set PULUMI_CONFIG_PASSPHRASE to remember):
Previewing update (staging):

     Type                 Name                Plan       Info
 +   pulumi:pulumi:Stack  pulumi-staging  create     17 debugs
     └─ aws:s3:Bucket     newbucket              1 error

System Messages
  ^C received; cancelling. If you would like to terminate immediately, press ^C again.
  ^C received; terminating

Diagnostics:
  aws:s3:Bucket (newbucket):
    error: transport is closing

  pulumi:pulumi:Stack (pulumi-staging):
    debug: Registering resource: t=pulumi:pulumi:Stack, name=pulumi-staging, custom=false
    debug: RegisterResource RPC prepared: t=pulumi:pulumi:Stack, name=pulumi-staging
    debug: RegisterResource RPC finished: resource:pulumi-staging[pulumi:pulumi:Stack]; err: null, resp: urn:pulumi:staging::pulumi::pulumi:pulumi:Stack::pulumi-staging,,,,
    debug: Running program '/Users/abutler/Documents/pulumi' in pwd '/Users/abutler/Documents/pulumi' w/ args:
    debug: Registering resource: t=aws:s3/bucket:Bucket, name=newbucket, custom=true
    debug: RegisterResourceOutputs RPC prepared: urn=urn:pulumi:staging::pulumi::pulumi:pulumi:Stack::pulumi-staging
    debug: RegisterResource RPC prepared: t=aws:s3/bucket:Bucket, name=newbucket
    debug: RegisterResourceOutputs RPC finished: urn=urn:pulumi:staging::pulumi::pulumi:pulumi:Stack::pulumi-staging; err: null, resp:
    debug: Setting AWS metadata API timeout to 100ms
    debug: Ignoring AWS metadata API endpoint at default location as it doesn't return any instance-id
    debug: AWS Auth provider used: "EnvProvider"
    debug: No assume_role block read from configuration
    debug: Building AWS auth structure
    debug: Setting AWS metadata API timeout to 100ms
    debug: Ignoring AWS metadata API endpoint at default location as it doesn't return any instance-id
    debug: AWS Auth provider used: "EnvProvider"
    debug: Trying to get account information via sts:GetCallerIdentity

error: an error occurred while advancing the preview
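
For anyone trying to reproduce: the "EnvProvider" lines above mean the credentials were picked up from environment variables. In my case they were temporary session credentials exported along these lines (values illustrative) and left to expire overnight:

export AWS_ACCESS_KEY_ID=ASIAXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=exampleSecretAccessKey
export AWS_SESSION_TOKEN=exampleSessionToken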

@pgavlin

pgavlin commented Dec 4, 2019

Yikes! Thanks for the report. @stack72 can you take a look?

@lukehoban

I just hit something very similar myself. A stack that I have deployed many times before, and came back to after a while, can no longer be updated - pulumi up hangs at this:

I0121 21:15:04.384759   56334 eventsink.go:60] AWS Auth provider used: "SharedCredentialsProvider"
I0121 21:15:04.385134   56334 eventsink.go:60] Attempting to AssumeRole arn:aws:iam::058607598222:role/OrganizationAccountAccessRole (SessionName: "", ExternalId: "", Policy: "")
I0121 21:15:04.972779   56334 eventsink.go:60] Trying to get account information via sts:GetCallerIdentity

I am using:

const awsProvider = new aws.Provider("testing", {
    region: region,
    assumeRole: {
        roleArn: "arn:aws:iam::058607598222:role/OrganizationAccountAccessRole",
    },
});

From outside Pulumi, the following runs and returns successfully right away:

aws sts assume-role --role-arn arn:aws:iam::058607598222:role/OrganizationAccountAccessRole --role-session-name something
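
For completeness, here is roughly how that provider gets attached to a resource; any resource created through it triggers the AssumeRole and sts:GetCallerIdentity calls seen in the log above (the region value and the bucket are illustrative, not from my actual stack):

import * as aws from "@pulumi/aws";

const awsProvider = new aws.Provider("testing", {
    region: "us-west-2", // illustrative region
    assumeRole: {
        roleArn: "arn:aws:iam::058607598222:role/OrganizationAccountAccessRole",
    },
});

// The provider is configured (and the hang occurs) when resources that use it are previewed.
const bucket = new aws.s3.Bucket("repro-bucket", {}, { provider: awsProvider });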

@borisbsu

Is there any workaround for this issue?

@lukehoban

@borisbsu Do you have details on your repro case for this? I believe some issues which have symptoms similar to this may actually be unrelated to sts:GetCallerIdentity itself. What is the case where you are seeing an issue?

@julienvincent

julienvincent commented Mar 10, 2020

@borisbsu After playing around a bit the workaround I found was to:

  • downgrade @pulumi/aws to 1.21.0
  • export current stack to file
  • remove all secrets (access key, secret key, token) from aws provider states
  • import updated stack file

This allowed me to continue (rough commands sketched below).
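
A rough sketch of the export/edit/import steps, assuming a local stack.json file name (I stripped the secrets by hand in an editor):

$ pulumi stack export --file stack.json
# edit stack.json: delete the accessKey / secretKey / token entries from the
# aws provider resources' inputs and outputs
$ pulumi stack import --file stack.json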

Makes me think this is related to #890

@lukehoban lukehoban added the kind/bug and priority/P1 labels Mar 10, 2020
@lukehoban

lukehoban commented Mar 11, 2020

Across the notes here, in pulumi/pulumi#3604, and in #873, it appears there are a few related but independent things going on:

  1. The upstream AWS provider hangs (retrying rather than failing fast) when credentials are present but expired - upstream treats this as "by design".
  2. pulumi/pulumi#3604: stale provider configuration could be reused when the provider diff returned DiffUnknown.
  3. #890: settings and credentials picked up from environment variables were being baked into the checkpoint as aws.Provider defaults (a side effect of #874).

Together, these three compound. We can fix the latter two, but the core initial issue is an upstream bug that appears to be considered "by design". We will need to look into whether we can/should change the default behaviour for that in the Pulumi provider.

lukehoban added a commit to pulumi/pulumi that referenced this issue Mar 11, 2020
The changes in #4004 caused old provider configuration to be used even when a provider was different between inputs and outputs, in the case that the diff returned DiffUnknown.

To better handle that case, we compute a more accurate (but still conservative) DiffNone or DiffSome so that we can ensure we conservatively update to a new provider when needed, but retain the performance benefit of not creating and configuring a new provider as much as possible.

Part of pulumi/pulumi-aws#814.
lukehoban added a commit that referenced this issue Mar 11, 2020
In #874 we added config defaults from environment variables for four new configuration settings. These config defaults are used in two places: (1) `aws.config` and (2) the defaults for `aws.Provider`. For (1) these changes were a good thing, but for (2) they led to values from the environment getting baked into checkpoints that should not be.

It's not clear to me that we should be doing (2) at all - that is - I don't think `region` or `profile` should be picked up from the environment and baked into the checkpoint file either.  But for now we'll just revert the more recent change here which has led to the more significant immediate issue.

Part of #814.

Fixes #890.
@lukehoban

The latter two issues above have been fixed, which will reduce the likelihood of hitting this for unexpected reasons.

The core behaviour of hanging on expired credentials is due to upstream provider behavior - as tracked in hashicorp/terraform-provider-aws#1351, hashicorp/terraform-provider-aws#4502, hashicorp/terraform-provider-aws#9601 and hashicorp/terraform-provider-aws#12023.

We are considering diverging on some defaults in #873, which may ultimately impact this. I'll close this issue out for now; further improvements will be tracked in the upstream provider issues and in #873.

@lukehoban lukehoban reopened this Apr 15, 2021
@lukehoban

Re-opening as this is still an issue that Pulumi users hit somewhat regularly, and we will likely want to find a way to work around the upstream issues here.

@farvour

farvour commented May 26, 2021

FWIW, we are using the aws-okta-processor tool, and this issue occurs when cached credentials aren't available. The underlying issue seems to be that if another process, such as one invoked through credential_process, is awaiting input, Pulumi simply hangs instead of passing the prompt through to the user. In our case, it was sitting and waiting for the user's Okta credentials before the AWS credentials for the profile could be produced.

It might be worth having Pulumi also check, while running, that nothing is waiting on terminal input when it calls out for credentials.

It's probably worth calling this special use case out in its own issue, but I'll let you all decide. Ultimately Pulumi should just not hang forever on this provider. I'd even rather the tool spit back an error right away if terminal input is blocking the completion of credential retrieval. Ideally, it would pass the input through to the interactive terminal so I could enter my Okta password and move on. For now, I have to "prime" the profile used by the provider with something like aws --profile pulumi-test sts get-caller-identity.
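
A minimal sketch of that setup and the "priming" workaround, assuming the credentials come from a credential_process entry (the profile name and the processor invocation are illustrative):

# ~/.aws/config
[profile pulumi-test]
credential_process = aws-okta-processor authenticate ...   # your existing processor command

# Prime the cached credentials interactively before running Pulumi:
$ aws --profile pulumi-test sts get-caller-identity
# Then run Pulumi against the same profile:
$ AWS_PROFILE=pulumi-test pulumi preview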

@cowlabs-xyz

cowlabs-xyz commented Feb 15, 2023

I'm currently getting a hang when doing pulumi up that seems similar to this thread. With debug on, the last event in the log during the preview phase is
I0215 20:02:54.584016 17040 log.go:71] Marshaling property for RPC[ResourceMonitor.Invoke(aws:index/getCallerIdentity:getCallerIdentity)]:....

It seems to be related to an S3 bucket. When I remove it from the configuration, the preview is able to run to completion.

The steps I have taken to try to get past this:

  1. Update pulumi to latest
  2. Update aws cli to latest
  3. Create new aws access token and aws configure
  4. pulumi config set aws:skipRequestingAccountId true
  5. pulumi config set aws:skipMetadataApiCheck true and pulumi config set aws:skipCredentialsValidation true (see the stack-file sketch below)
  6. pulumi refresh <- runs to completion
  7. aws sts get-caller-identity <- returns as expected OK
  8. rm ~/.aws/credentials
  9. checked I have no conflicting env vars for AWS tokens
  10. pulumi logout and login
  11. Able to create s3 bucket directly using aws s3api through cli
  12. deleted and reinstalled pulumi aws plugin
  13. created a completely new AWS user
  14. export and re-import stack

Are there any tips for further debug or actions to get past this stuck deployment?
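
For reference, after steps 4-5 the stack's config file ends up looking roughly like this (stack name omitted from the filename):

# Pulumi.<stack>.yaml
config:
  aws:skipRequestingAccountId: "true"
  aws:skipMetadataApiCheck: "true"
  aws:skipCredentialsValidation: "true"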

@lukehoban lukehoban added the resolution/fixed label Jun 24, 2023
@lukehoban

This is fixed now, via hashicorp/aws-sdk-go-base#362.

@lukehoban lukehoban added this to the 0.85 milestone Jun 27, 2023