
CLI hangs at sts:GetCallerIdentity when temporary AWS credentials exist but are expired #814

Closed
austinbutler opened this issue Dec 3, 2019 · 12 comments
Labels: kind/bug (Some behavior is incorrect or out of spec), resolution/fixed (This issue was fixed)

@austinbutler

When there are no AWS access keys in my shell environment, running pulumi up errors out almost immediately with error: unable to discover AWS AccessKeyID and/or SecretAccessKey. But when AWS access keys exist but have expired, it just hangs (for at least several minutes) while trying sts:GetCallerIdentity, as far as I can tell from the debug output (which I have since lost 😨); without debug on it just looks like it's doing normal planning forever. Ideally it would quickly determine that the credentials are invalid and inform the user.

Pulumi: v1.6.1
Pulumi AWS: v1.13.0
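
For comparison, the fast-failure behaviour being asked for is roughly what the AWS CLI itself does when a session token has expired (a sketch; the exact error wording can vary by CLI/SDK version):

$ aws sts get-caller-identity

An error occurred (ExpiredToken) when calling the GetCallerIdentity operation: The security token included in the request is expired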

@austinbutler

I let the token expire overnight to capture the output:

> $ pulumi preview -d
Enter your passphrase to unlock config/secrets
    (set PULUMI_CONFIG_PASSPHRASE to remember):
Previewing update (staging):

     Type                 Name                Plan       Info
 +   pulumi:pulumi:Stack  pulumi-staging  create     17 debugs
     └─ aws:s3:Bucket     newbucket              1 error

System Messages
  ^C received; cancelling. If you would like to terminate immediately, press ^C again.
  ^C received; terminating

Diagnostics:
  aws:s3:Bucket (newbucket):
    error: transport is closing

  pulumi:pulumi:Stack (pulumi-staging):
    debug: Registering resource: t=pulumi:pulumi:Stack, name=pulumi-staging, custom=false
    debug: RegisterResource RPC prepared: t=pulumi:pulumi:Stack, name=pulumi-staging
    debug: RegisterResource RPC finished: resource:pulumi-staging[pulumi:pulumi:Stack]; err: null, resp: urn:pulumi:staging::pulumi::pulumi:pulumi:Stack::pulumi-staging,,,,
    debug: Running program '/Users/abutler/Documents/pulumi' in pwd '/Users/abutler/Documents/pulumi' w/ args:
    debug: Registering resource: t=aws:s3/bucket:Bucket, name=newbucket, custom=true
    debug: RegisterResourceOutputs RPC prepared: urn=urn:pulumi:staging::pulumi::pulumi:pulumi:Stack::pulumi-staging
    debug: RegisterResource RPC prepared: t=aws:s3/bucket:Bucket, name=newbucket
    debug: RegisterResourceOutputs RPC finished: urn=urn:pulumi:staging::pulumi::pulumi:pulumi:Stack::pulumi-staging; err: null, resp:
    debug: Setting AWS metadata API timeout to 100ms
    debug: Ignoring AWS metadata API endpoint at default location as it doesn't return any instance-id
    debug: AWS Auth provider used: "EnvProvider"
    debug: No assume_role block read from configuration
    debug: Building AWS auth structure
    debug: Setting AWS metadata API timeout to 100ms
    debug: Ignoring AWS metadata API endpoint at default location as it doesn't return any instance-id
    debug: AWS Auth provider used: "EnvProvider"
    debug: Trying to get account information via sts:GetCallerIdentity

error: an error occurred while advancing the preview
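
For anyone trying to reproduce: the "EnvProvider" lines above mean the credentials were picked up from environment variables. In my case they were temporary session credentials exported along these lines (values illustrative) and left to expire overnight:

export AWS_ACCESS_KEY_ID=ASIAXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=exampleSecretAccessKey
export AWS_SESSION_TOKEN=exampleSessionToken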

@pgavlin

pgavlin commented Dec 4, 2019

Yikes! Thanks for the report. @stack72 can you take a look?

@lukehoban

I just hit something very similar myself. A stack that I have deployed many times before, and came back to after a while, can no longer be updated - pulumi up hangs at this:

I0121 21:15:04.384759   56334 eventsink.go:60] AWS Auth provider used: "SharedCredentialsProvider"
I0121 21:15:04.385134   56334 eventsink.go:60] Attempting to AssumeRole arn:aws:iam::058607598222:role/OrganizationAccountAccessRole (SessionName: "", ExternalId: "", Policy: "")
I0121 21:15:04.972779   56334 eventsink.go:60] Trying to get account information via sts:GetCallerIdentity

I am using:

const awsProvider = new aws.Provider("testing", {
    region: region,
    assumeRole: {
        roleArn: "arn:aws:iam::058607598222:role/OrganizationAccountAccessRole",
    },
});

From outside Pulumi, the following runs and returns successfully right away:

aws sts assume-role --role-arn arn:aws:iam::058607598222:role/OrganizationAccountAccessRole --role-session-name something
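
For completeness, here is roughly how that provider gets attached to a resource; any resource created through it triggers the AssumeRole and sts:GetCallerIdentity calls seen in the log above (the region value and the bucket are illustrative, not from my actual stack):

import * as aws from "@pulumi/aws";

const awsProvider = new aws.Provider("testing", {
    region: "us-west-2", // illustrative region
    assumeRole: {
        roleArn: "arn:aws:iam::058607598222:role/OrganizationAccountAccessRole",
    },
});

// The provider is configured (and the hang occurs) when resources that use it are previewed.
const bucket = new aws.s3.Bucket("repro-bucket", {}, { provider: awsProvider });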

@borisbsu

Is there any workaround for this issue?

@lukehoban

@borisbsu Do you have details on your repro case for this? I believe some issues which have symptoms similar to this may actually be unrelated to sts:GetCallerIdentity itself. What is the case where you are seeing an issue?

@julienvincent

julienvincent commented Mar 10, 2020

@borisbsu After playing around a bit the workaround I found was to:

  • downgrade @pulumi/aws to 1.21.0
  • export current stack to file
  • remove all secrets (access key, secret key, token) from aws provider states
  • import updated stack file

This allowed me to continue (rough commands sketched below).
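
A rough sketch of the export/edit/import steps, assuming a local stack.json file name (I stripped the secrets by hand in an editor):

$ pulumi stack export --file stack.json
# edit stack.json: delete the accessKey / secretKey / token entries from the
# aws provider resources' inputs and outputs
$ pulumi stack import --file stack.json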

Makes me think this is related to #890

@lukehoban lukehoban added the kind/bug and priority/P1 labels Mar 10, 2020
@lukehoban

lukehoban commented Mar 11, 2020

Across the notes here, in pulumi/pulumi#3604, and in #873, it appears there are a few related but independent things going on:

  1. The upstream AWS provider hangs (retrying rather than failing fast) when credentials are present but expired - upstream treats this as "by design".
  2. pulumi/pulumi#3604: stale provider configuration could be reused when the provider diff returned DiffUnknown.
  3. #890: settings and credentials picked up from environment variables were being baked into the checkpoint as aws.Provider defaults (a side effect of #874).

Together, these three compound. We can fix the latter two, but the core initial issue is an upstream bug that appears to be considered "by design". We will need to look into whether we can/should change the default behaviour for that in the Pulumi provider.

lukehoban added a commit to pulumi/pulumi that referenced this issue Mar 11, 2020
The changes in #4004 caused old provider configuration to be used even when a provider was different between inputs and outputs, in the case that the diff returned DiffUnknown.

To better handle that case, we compute a more accurate (but still conservative) DiffNone or DiffSome so that we can ensure we conservatively update to a new provider when needed, but retain the performance benefit of not creating and configuring a new provider as much as possible.

Part of pulumi/pulumi-aws#814.
lukehoban added a commit that referenced this issue Mar 11, 2020
In #874 we added config defaults from environment variables for four new configuration settings. These config defaults are used in two places: (1) `aws.config` and (2) the defaults for `aws.Provider`. For (1) these changes were a good thing, but for (2) they led to values from the environment getting baked into checkpoints that should not be.

It's not clear to me that we should be doing (2) at all - that is - I don't think `region` or `profile` should be picked up from the environment and baked into the checkpoint file either.  But for now we'll just revert the more recent change here which has led to the more significant immediate issue.

Part of #814.

Fixes #890.
@lukehoban

The latter two issues above have been fixed, which will reduce the likelihood of hitting this for unexpected reasons.

The core behaviour of hanging on expired credentials is due to upstream provider behavior - as tracked in hashicorp/terraform-provider-aws#1351, hashicorp/terraform-provider-aws#4502, hashicorp/terraform-provider-aws#9601 and hashicorp/terraform-provider-aws#12023.

We are considering diverging on some defaults in #873, which may ultimately impact this. I'll close this issue out for now; further improvements will be tracked in the upstream provider issues and in #873.

@lukehoban lukehoban reopened this Apr 15, 2021
@lukehoban

Re-opening as this is still an issue that Pulumi users hit somewhat regularly, and we will likely want to find a way to work around the upstream issues here.

@farvour

farvour commented May 26, 2021

FWIW, we are using the aws-okta-processor tool, and this issue occurs when cached credentials aren't available. The underlying issue seems to be that if another process, such as one invoked through credential_process, is awaiting input, Pulumi simply hangs instead of passing the prompt through to the user. In our case, it was sitting and waiting for the user's Okta credentials before the AWS credentials for the profile could be produced.

It might be worth having Pulumi also check, while running, that nothing is waiting on terminal input when it calls out for credentials.

It's probably worth calling this special use case out in its own issue, but I'll let you all decide. Ultimately Pulumi should just not hang forever on this provider. I'd even rather the tool spit back an error right away if terminal input is blocking the completion of credential retrieval. Ideally, it would pass the input through to the interactive terminal so I could enter my Okta password and move on. For now, I have to "prime" the profile used by the provider with something like aws --profile pulumi-test sts get-caller-identity.
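
A minimal sketch of that setup and the "priming" workaround, assuming the credentials come from a credential_process entry (the profile name and the processor invocation are illustrative):

# ~/.aws/config
[profile pulumi-test]
credential_process = aws-okta-processor authenticate ...   # your existing processor command

# Prime the cached credentials interactively before running Pulumi:
$ aws --profile pulumi-test sts get-caller-identity
# Then run Pulumi against the same profile:
$ AWS_PROFILE=pulumi-test pulumi preview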

@cowlabs-xyz

cowlabs-xyz commented Feb 15, 2023

I'm currently getting a hang when doing pulumi up that seems similar to this thread. With debug on, the last event in the log during the preview phase is
I0215 20:02:54.584016 17040 log.go:71] Marshaling property for RPC[ResourceMonitor.Invoke(aws:index/getCallerIdentity:getCallerIdentity)]:....

It seems to be related to an S3 bucket. When I remove it from the configuration, the preview is able to run to completion.

The steps I have taken to try to get past this:

  1. Update pulumi to latest
  2. Update aws cli to latest
  3. Create new aws access token and aws configure
  4. pulumi config set aws:skipRequestingAccountId true
  5. pulumi config set aws:skipMetadataApiCheck true and pulumi config set aws:skipCredentialsValidation true (see the stack-file sketch below)
  6. pulumi refresh <- runs to completion
  7. aws sts get-caller-identity <- returns as expected OK
  8. rm ~/.aws/credentials
  9. checked I have no conflicting env vars for AWS tokens
  10. pulumi logout and login
  11. Able to create s3 bucket directly using aws s3api through cli
  12. deleted and reinstalled pulumi aws plugin
  13. created a completely new AWS user
  14. export and re-import stack

Are there any tips for further debug or actions to get past this stuck deployment?
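
For reference, after steps 4-5 the stack's config file ends up looking roughly like this (stack name omitted from the filename):

# Pulumi.<stack>.yaml
config:
  aws:skipRequestingAccountId: "true"
  aws:skipMetadataApiCheck: "true"
  aws:skipCredentialsValidation: "true"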

@lukehoban lukehoban added the resolution/fixed label Jun 24, 2023
@lukehoban

This is fixed now, via hashicorp/aws-sdk-go-base#362.

@lukehoban lukehoban added this to the 0.85 milestone Jun 27, 2023